Data Structures In Python — PBA Institute Tutorial
Chapter 24 · Python Programming Series
12 min read Beginner

A Tour Of Python Data Structures

Choosing the right data structure can make your code 10× faster or 10× clearer. Python ships with a remarkable variety — lists, tuples, sets, dictionaries, deques, heaps, named tuples, frozensets, and more. In this final chapter we tour each one, when to use it, and the performance trade-offs that matter.

Overview

📚

Built-In Variety

Lists, tuples, sets, dicts, deques, heaps, named tuples — all without imports… mostly.

Performance Aware

Each structure has clear Big-O guarantees.

🧱

Hashable Keys

Tuples and frozensets serve as composite dictionary keys.

🛠️

Specialised Tools

Counter, defaultdict, deque for common patterns.

📐

Composable

Mix and match — dict of lists, list of namedtuples, set of tuples.

Syntax

  • Sequences: list, tuple, range, str, bytes.
  • Mappings: dict, defaultdict, OrderedDict, Counter.
  • Sets: set, frozenset.
  • Specialised: deque, heapq, namedtuple, array.array, numpy.ndarray.
Data Structures — Syntax
# Quick overview
lst   = [1, 2, 3]                # mutable, ordered
tup   = (1, 2, 3)                # immutable, ordered
st    = {1, 2, 3}                # unique, unordered
dct   = {"a": 1, "b": 2}         # key→value

from collections import deque, defaultdict, Counter, namedtuple
import heapq

dq    = deque([1, 2, 3])         # O(1) at both ends
dd    = defaultdict(list)        # auto-creates default
cnt   = Counter("banana")        # frequencies
Pt    = namedtuple("Pt", "x y")
heap  = []; heapq.heappush(heap, 5)

Detailed Explanation

  • list: Ordered, mutable, allows duplicates. Random access in O(1), append in amortised O(1). Best general-purpose sequence.
  • tuple: Ordered, immutable, hashable. Slightly faster and safer than lists. Use for fixed-shape records and dict keys.
  • set / frozenset: Unordered collections of unique items. Membership tests in O(1) average. frozenset is the immutable variant — usable as dict key.
  • dict: Hash table mapping keys to values. O(1) lookup. Backbone of JSON, namespaces, and many caches.
  • collections.defaultdict / Counter / OrderedDict: Specialised dicts: defaultdict auto-creates missing entries; Counter counts hashables; OrderedDict tracks insertion order (largely redundant after 3.7).
  • deque: Double-ended queue with O(1) appends and pops on either side. Perfect for queues and circular buffers.
  • heapq: A binary min-heap implemented over a list. Useful for priority queues — O(log n) push/pop.
  • namedtuple: A tuple subclass with named fields — readable, lightweight, immutable.

Code Examples

Example 1 — Comparing Structures
lst, tup, st, dct = [1,2,3], (1,2,3), {1,2,3}, {"a":1,"b":2}
print(type(lst).__name__, lst)
print(type(tup).__name__, tup)
print(type(st).__name__,  st)
print(type(dct).__name__, dct)
Output list [1, 2, 3]
tuple (1, 2, 3)
set {1, 2, 3}
dict {'a': 1, 'b': 2}
Example 2 — Set Operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print("union     :", a | b)
print("intersect :", a & b)
print("diff      :", a - b)
Output union : {1, 2, 3, 4, 5, 6}
intersect : {3, 4}
diff : {1, 2}
Example 3 — defaultdict
from collections import defaultdict
groups = defaultdict(list)
for word in ["apple", "ant", "bear", "bat", "cat"]:
    groups[word[0]].append(word)
print(dict(groups))
Output {'a': ['apple', 'ant'], 'b': ['bear', 'bat'], 'c': ['cat']}
Example 4 — Counter
from collections import Counter
c = Counter("mississippi")
print(c)
print(c.most_common(2))
Output Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
[('i', 4), ('s', 4)]
Example 5 — heapq Priority Queue
import heapq
h = []
for x in [5, 2, 8, 1, 7]:
    heapq.heappush(h, x)
ordered = [heapq.heappop(h) for _ in range(len(h))]
print(ordered)
Output [1, 2, 5, 7, 8]
Example 6 — namedtuple
from collections import namedtuple
Point = namedtuple("Point", "x y")
p = Point(3, 4)
print(p)
print(p.x + p.y)
Output Point(x=3, y=4)
7

Real-World Use Cases

Indexes

Dictionaries power in-memory indexes for fast lookup.

User Sets

Find common interests with set intersections.

Top-K Problems

heapq.nlargest finds the top-K items efficiently.

LRU Caches

OrderedDict or functools.lru_cache implement caches.

Frequency Analysis

Counter turns log/word/event analysis into a one-liner.

Pathfinding

Heaps + dicts implement Dijkstra and A* algorithms.

Notes & Pro Tips

  • Choose by access pattern, not familiarity — the right structure can change runtime from minutes to milliseconds.
  • Use sets for unique-membership tests; dicts for key-value lookups; lists for ordered collections.
  • collections has free upgrades: defaultdict, Counter, deque.
  • Heaps are min-heaps in Python — for a max-heap, push negated values.
  • frozenset makes a set hashable — usable as a dict key.
  • When in doubt, profile with timeit and choose data.

Common Mistakes

  • Using list for membership tests: x in list is O(n); use a set for O(1).
  • Mutable in a set: lists/dicts can't be set members — use tuples or frozensets.
  • Confusing dict and Counter: Counter returns 0 for missing keys; dict raises KeyError.
  • Wrong heap direction: heapq is a min-heap; negate values for max-heap behaviour.
  • Iterating set order: sets are unordered — don't rely on iteration order.
  • Not using namedtuple: magic indices like row[3] hide intent.

Practice Problems

  • Problem 1: Find the most common element in a list using Counter.
  • Problem 2: Group a list of words by length using defaultdict(list).
  • Problem 3: Implement an LRU cache of capacity 3 using OrderedDict.
  • Problem 4: Use heapq to find the top 3 largest numbers in a list of 1000 random integers.
  • Problem 5: Given two lists, find common elements without using loops (use set intersection).
  • Problem 6: Use namedtuple to represent a 3D point and compute distance between two points.

Interview Questions

  • Q1. Compare list, tuple, set, and dict in terms of mutability and ordering.
  • Q2. What is the time complexity of common operations on each structure?
  • Q3. When would you use a deque over a list?
  • Q4. What does Counter.most_common(k) return?
  • Q5. Explain how heapq implements a priority queue.
  • Q6. Why use a frozenset?

Frequently Asked Questions

  • Q1: Which is faster — list or set — for membership tests?
    Set. 'x in set' is average O(1); 'x in list' is O(n).
  • Q2: When should I use a tuple over a list?
    When the data is fixed and you want immutability, hashability, or slightly better performance.
  • Q3: What is the difference between defaultdict and dict?
    defaultdict auto-creates a default value (e.g. an empty list) when a missing key is accessed, while dict raises KeyError.
  • Q4: What does Counter do?
    It counts occurrences of hashable items and supports .most_common(k) plus arithmetic between counters.
  • Q5: Is OrderedDict still needed?
    Rarely — since Python 3.7 regular dicts preserve insertion order. OrderedDict still offers move_to_end and equality based on order.
  • Q6: How do I implement a max-heap?
    Push negated values: heapq.heappush(h, -x). Negate again on pop.

Summary

Picking the right data structure is one of the highest-leverage decisions you make as a developer. Python's built-ins — list, tuple, set, dict — cover most cases, and the collections and heapq modules add power-ups for common patterns. Match the structure to your access pattern, measure when in doubt, and your code will be both elegant and efficient. Congratulations — you've completed the PBA Institute Python tutorial series!

Continue Learning

Previous

Go to Previous Chapter