Reading & Writing CSV Files
CSV (Comma-Separated Values) is the simplest, most universal data format on the planet — every spreadsheet tool, database, and analytics platform speaks it. Python's built-in csv module makes reading and writing CSVs trivial, while pandas takes it to industrial strength.
Overview
Universal Format
Compatible with Excel, databases, and every data tool.
Human-Readable
Plain text — easy to inspect with any editor.
Streaming
Read row-by-row — handle huge CSVs without memory issues.
Dict Style
DictReader gives column-name access — clean and self-documenting.
Pandas Power
pandas turns CSV into a queryable, plottable DataFrame instantly.
Syntax
- Import the module:
import csv. - Read with
csv.reader(file)orcsv.DictReader(file)for dictionaries. - Write with
csv.writer(file)orcsv.DictWriter(file, fieldnames=...). - Always open CSV files with
newline=""to avoid blank lines on Windows.
import csv
# write
rows = [["name", "age"], ["Riya", 22], ["Sam", 19]]
with open("people.csv", "w", newline="", encoding="utf-8") as f:
csv.writer(f).writerows(rows)
# read as lists
with open("people.csv", encoding="utf-8") as f:
for row in csv.reader(f):
print(row)
# read as dicts
with open("people.csv", encoding="utf-8") as f:
for rec in csv.DictReader(f):
print(rec["name"], rec["age"])
Detailed Explanation
- Why CSV?: CSV is plain text — human-readable, easy to debug, and supported by Excel, Google Sheets, SQLite, pandas, and virtually every analytics tool.
- csv.reader: Reads rows as lists of strings. Quoted fields, embedded commas, and escaped quotes are handled for you.
- csv.writer: Writes a list (or list of lists) to a file. Use
writerowfor one row andwriterowsfor many. - DictReader & DictWriter: Treat each row as a dictionary keyed by header names. Much more readable than tracking column indices.
- Delimiters & quoting: Use
delimiter='\t'for TSV,delimiter=';'for European CSVs.quoting=csv.QUOTE_ALLquotes every field. - pandas for big data:
pd.read_csv("file.csv")reads megabytes in one line and returns a DataFrame with columns, dtypes, and indexing.
Code Examples
import csv
rows = [["name", "marks"], ["Riya", 92], ["Sam", 87], ["Anu", 78]]
with open("marks.csv", "w", newline="", encoding="utf-8") as f:
csv.writer(f).writerows(rows)
print("Done")
import csv
with open("marks.csv", encoding="utf-8") as f:
for row in csv.reader(f):
print(row)
['Riya', '92']
['Sam', '87']
['Anu', '78']
import csv
with open("marks.csv", encoding="utf-8") as f:
for rec in csv.DictReader(f):
print(rec["name"], "->", rec["marks"])
Sam -> 87
Anu -> 78
import csv
data = [{"name":"Riya","age":22}, {"name":"Sam","age":19}]
with open("dw.csv", "w", newline="", encoding="utf-8") as f:
w = csv.DictWriter(f, fieldnames=["name", "age"])
w.writeheader()
w.writerows(data)
print("Saved")
import csv
total = count = 0
with open("marks.csv", encoding="utf-8") as f:
for rec in csv.DictReader(f):
total += int(rec["marks"])
count += 1
print("Average:", total / count)
import pandas as pd
df = pd.read_csv("marks.csv")
print(df)
print("Mean marks:", df["marks"].mean())
0 Riya 92
1 Sam 87
2 Anu 78
Mean marks: 85.67
Real-World Use Cases
Analytics
Data scientists live in CSV → pandas → plot pipelines.
Finance
Banks export statements as CSV for reconciliation.
E-commerce
Bulk product uploads and download reports use CSV.
HR Systems
Employee data exports for payroll processing.
Open Data
Government datasets are distributed as CSV worldwide.
Scientific Data
Lab equipment exports measurements as time-stamped CSVs.
Notes & Pro Tips
- Always pass
newline=""when opening CSVs to avoid extra blank lines on Windows. - Use
encoding="utf-8"consistently — non-ASCII names corrupt without it. - Use
DictReader/DictWriterfor self-documenting code. - For European data use
delimiter=';'— many spreadsheets use semicolons. - Large CSVs: stream row by row instead of
list(reader). - For complex analytics,
pandasis far more powerful than the built-in module.
Common Mistakes
- Wrong newline mode: Windows users get blank rows without
newline="". - Manual splitting on commas: fails when fields contain commas inside quotes. Use the csv module.
- Integer vs string: CSV cells are strings — convert with
int()/float(). - Loading huge files into memory: use streaming or pandas with
chunksize. - Forgetting header:
DictReaderneeds a header row to map columns. - Encoding mismatch: opening a UTF-8 CSV as ASCII crashes on non-English text.
Practice Problems
- Problem 1: Create a CSV file of 10 students with name, age, marks.
- Problem 2: Read the CSV and print the topper based on marks.
- Problem 3: Append a new student record to an existing CSV.
- Problem 4: Convert a list of dicts (each representing a student) into a CSV using DictWriter.
- Problem 5: Read a CSV and write a new CSV containing only students with marks above 80.
- Problem 6: Use pandas to load a CSV and plot a bar chart of marks by name.
Interview Questions
- Q1. Why prefer the
csvmodule over manual string splitting? - Q2. Difference between
csv.readerandcsv.DictReader? - Q3. What is the role of
newline=""inopen()? - Q4. How do you handle CSV files where fields contain commas or quotes?
- Q5. How would you process a 5 GB CSV file in Python efficiently?
- Q6. When would you choose pandas over the built-in csv module?
Frequently Asked Questions
- Q1: Why use the csv module instead of splitting on commas?
The csv module correctly handles quoted fields, escaped quotes, and embedded commas — manual splitting cannot. - Q2: Why do I see blank lines on Windows?
Open the file with newline=''. Without it, Windows adds extra carriage returns. - Q3: What is DictReader?
An iterator that returns each row as a dictionary keyed by the header row, making column access by name simple. - Q4: How do I read a CSV with a custom delimiter?
Pass delimiter='\t', delimiter=';', or any character via the delimiter argument to csv.reader/writer. - Q5: Is csv module faster than pandas?
For pure I/O, comparable. Pandas is faster for analytics because operations are vectorised in C. - Q6: How do I handle CSVs with millions of rows?
Stream row by row, use generators, or use pandas with chunksize=10000.
Summary
CSV is the universal language of tabular data. Python's built-in csv module handles the format flawlessly — and DictReader/DictWriter make your code self-documenting. For analytics workloads, switch to pandas and your CSV reading becomes part of a much larger toolkit. Either way, you can move data between Python and the rest of the world effortlessly.
Continue Learning
Previous
Go to Previous Chapter