basic.py
)We started with a straightforward single-threaded Python script:
File I/O:
Used plain open()
line by line with UTF-8 decoding.
Parsing:
Used float()
for each temperature measurement, multiplied by 10 for fixed-point.
Aggregation:
A standard Python dictionary with get()
and __setitem__
to maintain [count, sum, min, max]
.
Garbage collection disabled to avoid pauses.
Performance:
~10.5 seconds on 10 million lines.
This was expected: most time spent was in
prime.py
)We tackled the biggest bottleneck first — file reading.
Used Python’s mmap
to memory-map the entire file.
Now file contents live in virtual memory, and no Python strings are created per line. The OS handles paging, and seeking or slicing costs almost nothing.