CSV Files In Python — PBA Institute Tutorial
Chapter 21 · Python Programming Series
12 min read Beginner

Reading & Writing CSV Files

CSV (Comma-Separated Values) is the simplest, most universal data format on the planet — every spreadsheet tool, database, and analytics platform speaks it. Python's built-in csv module makes reading and writing CSVs trivial, while pandas takes it to industrial strength.

Overview

📊

Universal Format

Compatible with Excel, databases, and every data tool.

📝

Human-Readable

Plain text — easy to inspect with any editor.

🔁

Streaming

Read row-by-row — handle huge CSVs without memory issues.

🗂️

Dict Style

DictReader gives column-name access — clean and self-documenting.

🐼

Pandas Power

pandas turns CSV into a queryable, plottable DataFrame instantly.

Syntax

  • Import the module: import csv.
  • Read with csv.reader(file) or csv.DictReader(file) for dictionaries.
  • Write with csv.writer(file) or csv.DictWriter(file, fieldnames=...).
  • Always open CSV files with newline="" to avoid blank lines on Windows.
CSV Files — Syntax
import csv

# write
rows = [["name", "age"], ["Riya", 22], ["Sam", 19]]
with open("people.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# read as lists
with open("people.csv", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)

# read as dicts
with open("people.csv", encoding="utf-8") as f:
    for rec in csv.DictReader(f):
        print(rec["name"], rec["age"])

Detailed Explanation

  • Why CSV?: CSV is plain text — human-readable, easy to debug, and supported by Excel, Google Sheets, SQLite, pandas, and virtually every analytics tool.
  • csv.reader: Reads rows as lists of strings. Quoted fields, embedded commas, and escaped quotes are handled for you.
  • csv.writer: Writes a list (or list of lists) to a file. Use writerow for one row and writerows for many.
  • DictReader & DictWriter: Treat each row as a dictionary keyed by header names. Much more readable than tracking column indices.
  • Delimiters & quoting: Use delimiter='\t' for TSV, delimiter=';' for European CSVs. quoting=csv.QUOTE_ALL quotes every field.
  • pandas for big data: pd.read_csv("file.csv") reads megabytes in one line and returns a DataFrame with columns, dtypes, and indexing.

Code Examples

Example 1 — Write a CSV (list of lists)
import csv
rows = [["name", "marks"], ["Riya", 92], ["Sam", 87], ["Anu", 78]]
with open("marks.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
print("Done")
Output Done
Example 2 — Read a CSV with csv.reader
import csv
with open("marks.csv", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)
Output ['name', 'marks']
['Riya', '92']
['Sam', '87']
['Anu', '78']
Example 3 — Read CSV as Dictionaries
import csv
with open("marks.csv", encoding="utf-8") as f:
    for rec in csv.DictReader(f):
        print(rec["name"], "->", rec["marks"])
Output Riya -> 92
Sam -> 87
Anu -> 78
Example 4 — Write with DictWriter
import csv
data = [{"name":"Riya","age":22}, {"name":"Sam","age":19}]
with open("dw.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["name", "age"])
    w.writeheader()
    w.writerows(data)
print("Saved")
Output Saved
Example 5 — Compute Average Marks from CSV
import csv
total = count = 0
with open("marks.csv", encoding="utf-8") as f:
    for rec in csv.DictReader(f):
        total += int(rec["marks"])
        count += 1
print("Average:", total / count)
Output Average: 85.67
Example 6 — Read CSV with Pandas
import pandas as pd
df = pd.read_csv("marks.csv")
print(df)
print("Mean marks:", df["marks"].mean())
Output name marks
0 Riya 92
1 Sam 87
2 Anu 78
Mean marks: 85.67

Real-World Use Cases

Analytics

Data scientists live in CSV → pandas → plot pipelines.

Finance

Banks export statements as CSV for reconciliation.

E-commerce

Bulk product uploads and download reports use CSV.

HR Systems

Employee data exports for payroll processing.

Open Data

Government datasets are distributed as CSV worldwide.

Scientific Data

Lab equipment exports measurements as time-stamped CSVs.

Notes & Pro Tips

  • Always pass newline="" when opening CSVs to avoid extra blank lines on Windows.
  • Use encoding="utf-8" consistently — non-ASCII names corrupt without it.
  • Use DictReader/DictWriter for self-documenting code.
  • For European data use delimiter=';' — many spreadsheets use semicolons.
  • Large CSVs: stream row by row instead of list(reader).
  • For complex analytics, pandas is far more powerful than the built-in module.

Common Mistakes

  • Wrong newline mode: Windows users get blank rows without newline="".
  • Manual splitting on commas: fails when fields contain commas inside quotes. Use the csv module.
  • Integer vs string: CSV cells are strings — convert with int() / float().
  • Loading huge files into memory: use streaming or pandas with chunksize.
  • Forgetting header: DictReader needs a header row to map columns.
  • Encoding mismatch: opening a UTF-8 CSV as ASCII crashes on non-English text.

Practice Problems

  • Problem 1: Create a CSV file of 10 students with name, age, marks.
  • Problem 2: Read the CSV and print the topper based on marks.
  • Problem 3: Append a new student record to an existing CSV.
  • Problem 4: Convert a list of dicts (each representing a student) into a CSV using DictWriter.
  • Problem 5: Read a CSV and write a new CSV containing only students with marks above 80.
  • Problem 6: Use pandas to load a CSV and plot a bar chart of marks by name.

Interview Questions

  • Q1. Why prefer the csv module over manual string splitting?
  • Q2. Difference between csv.reader and csv.DictReader?
  • Q3. What is the role of newline="" in open()?
  • Q4. How do you handle CSV files where fields contain commas or quotes?
  • Q5. How would you process a 5 GB CSV file in Python efficiently?
  • Q6. When would you choose pandas over the built-in csv module?

Frequently Asked Questions

  • Q1: Why use the csv module instead of splitting on commas?
    The csv module correctly handles quoted fields, escaped quotes, and embedded commas — manual splitting cannot.
  • Q2: Why do I see blank lines on Windows?
    Open the file with newline=''. Without it, Windows adds extra carriage returns.
  • Q3: What is DictReader?
    An iterator that returns each row as a dictionary keyed by the header row, making column access by name simple.
  • Q4: How do I read a CSV with a custom delimiter?
    Pass delimiter='\t', delimiter=';', or any character via the delimiter argument to csv.reader/writer.
  • Q5: Is csv module faster than pandas?
    For pure I/O, comparable. Pandas is faster for analytics because operations are vectorised in C.
  • Q6: How do I handle CSVs with millions of rows?
    Stream row by row, use generators, or use pandas with chunksize=10000.

Summary

CSV is the universal language of tabular data. Python's built-in csv module handles the format flawlessly — and DictReader/DictWriter make your code self-documenting. For analytics workloads, switch to pandas and your CSV reading becomes part of a much larger toolkit. Either way, you can move data between Python and the rest of the world effortlessly.

Continue Learning

Previous

Go to Previous Chapter

Next

Go to Next Chapter