A Tour Of Python Data Structures
Choosing the right data structure can make your code 10× faster or 10× clearer. Python ships with a remarkable variety — lists, tuples, sets, dictionaries, deques, heaps, named tuples, frozensets, and more. In this final chapter we tour each one, when to use it, and the performance trade-offs that matter.
Overview
Built-In Variety
Lists, tuples, sets, dicts, deques, heaps, named tuples — all without imports… mostly.
Performance Aware
Each structure has clear Big-O guarantees.
Hashable Keys
Tuples and frozensets serve as composite dictionary keys.
Specialised Tools
Counter, defaultdict, deque for common patterns.
Composable
Mix and match — dict of lists, list of namedtuples, set of tuples.
Syntax
- Sequences:
list,tuple,range,str,bytes. - Mappings:
dict,defaultdict,OrderedDict,Counter. - Sets:
set,frozenset. - Specialised:
deque,heapq,namedtuple,array.array,numpy.ndarray.
# Quick overview
lst = [1, 2, 3] # mutable, ordered
tup = (1, 2, 3) # immutable, ordered
st = {1, 2, 3} # unique, unordered
dct = {"a": 1, "b": 2} # key→value
from collections import deque, defaultdict, Counter, namedtuple
import heapq
dq = deque([1, 2, 3]) # O(1) at both ends
dd = defaultdict(list) # auto-creates default
cnt = Counter("banana") # frequencies
Pt = namedtuple("Pt", "x y")
heap = []; heapq.heappush(heap, 5)
Detailed Explanation
- list: Ordered, mutable, allows duplicates. Random access in O(1), append in amortised O(1). Best general-purpose sequence.
- tuple: Ordered, immutable, hashable. Slightly faster and safer than lists. Use for fixed-shape records and dict keys.
- set / frozenset: Unordered collections of unique items. Membership tests in O(1) average.
frozensetis the immutable variant — usable as dict key. - dict: Hash table mapping keys to values. O(1) lookup. Backbone of JSON, namespaces, and many caches.
- collections.defaultdict / Counter / OrderedDict: Specialised dicts:
defaultdictauto-creates missing entries;Countercounts hashables;OrderedDicttracks insertion order (largely redundant after 3.7). - deque: Double-ended queue with O(1) appends and pops on either side. Perfect for queues and circular buffers.
- heapq: A binary min-heap implemented over a list. Useful for priority queues — O(log n) push/pop.
- namedtuple: A tuple subclass with named fields — readable, lightweight, immutable.
Code Examples
lst, tup, st, dct = [1,2,3], (1,2,3), {1,2,3}, {"a":1,"b":2}
print(type(lst).__name__, lst)
print(type(tup).__name__, tup)
print(type(st).__name__, st)
print(type(dct).__name__, dct)
tuple (1, 2, 3)
set {1, 2, 3}
dict {'a': 1, 'b': 2}
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print("union :", a | b)
print("intersect :", a & b)
print("diff :", a - b)
intersect : {3, 4}
diff : {1, 2}
from collections import defaultdict
groups = defaultdict(list)
for word in ["apple", "ant", "bear", "bat", "cat"]:
groups[word[0]].append(word)
print(dict(groups))
from collections import Counter
c = Counter("mississippi")
print(c)
print(c.most_common(2))
[('i', 4), ('s', 4)]
import heapq
h = []
for x in [5, 2, 8, 1, 7]:
heapq.heappush(h, x)
ordered = [heapq.heappop(h) for _ in range(len(h))]
print(ordered)
from collections import namedtuple
Point = namedtuple("Point", "x y")
p = Point(3, 4)
print(p)
print(p.x + p.y)
7
Real-World Use Cases
Indexes
Dictionaries power in-memory indexes for fast lookup.
User Sets
Find common interests with set intersections.
Top-K Problems
heapq.nlargest finds the top-K items efficiently.
LRU Caches
OrderedDict or functools.lru_cache implement caches.
Frequency Analysis
Counter turns log/word/event analysis into a one-liner.
Pathfinding
Heaps + dicts implement Dijkstra and A* algorithms.
Notes & Pro Tips
- Choose by access pattern, not familiarity — the right structure can change runtime from minutes to milliseconds.
- Use sets for unique-membership tests; dicts for key-value lookups; lists for ordered collections.
collectionshas free upgrades:defaultdict,Counter,deque.- Heaps are min-heaps in Python — for a max-heap, push negated values.
frozensetmakes a set hashable — usable as a dict key.- When in doubt, profile with
timeitand choose data.
Common Mistakes
- Using list for membership tests:
x in listis O(n); use a set for O(1). - Mutable in a set: lists/dicts can't be set members — use tuples or frozensets.
- Confusing dict and Counter: Counter returns 0 for missing keys; dict raises KeyError.
- Wrong heap direction: heapq is a min-heap; negate values for max-heap behaviour.
- Iterating set order: sets are unordered — don't rely on iteration order.
- Not using
namedtuple: magic indices likerow[3]hide intent.
Practice Problems
- Problem 1: Find the most common element in a list using
Counter. - Problem 2: Group a list of words by length using
defaultdict(list). - Problem 3: Implement an LRU cache of capacity 3 using
OrderedDict. - Problem 4: Use
heapqto find the top 3 largest numbers in a list of 1000 random integers. - Problem 5: Given two lists, find common elements without using loops (use set intersection).
- Problem 6: Use
namedtupleto represent a 3D point and compute distance between two points.
Interview Questions
- Q1. Compare list, tuple, set, and dict in terms of mutability and ordering.
- Q2. What is the time complexity of common operations on each structure?
- Q3. When would you use a
dequeover a list? - Q4. What does
Counter.most_common(k)return? - Q5. Explain how
heapqimplements a priority queue. - Q6. Why use a
frozenset?
Frequently Asked Questions
- Q1: Which is faster — list or set — for membership tests?
Set. 'x in set' is average O(1); 'x in list' is O(n). - Q2: When should I use a tuple over a list?
When the data is fixed and you want immutability, hashability, or slightly better performance. - Q3: What is the difference between defaultdict and dict?
defaultdict auto-creates a default value (e.g. an empty list) when a missing key is accessed, while dict raises KeyError. - Q4: What does Counter do?
It counts occurrences of hashable items and supports .most_common(k) plus arithmetic between counters. - Q5: Is OrderedDict still needed?
Rarely — since Python 3.7 regular dicts preserve insertion order. OrderedDict still offers move_to_end and equality based on order. - Q6: How do I implement a max-heap?
Push negated values: heapq.heappush(h, -x). Negate again on pop.
Summary
Picking the right data structure is one of the highest-leverage decisions you make as a developer. Python's built-ins — list, tuple, set, dict — cover most cases, and the collections and heapq modules add power-ups for common patterns. Match the structure to your access pattern, measure when in doubt, and your code will be both elegant and efficient. Congratulations — you've completed the PBA Institute Python tutorial series!
Continue Learning
Previous
Go to Previous Chapter