mmap
How it works:
mmap
(memory map) lets you map a file’s bytes directly into memory, creating a view on the file as if it were a byte array.read()
or string allocations for each line are needed.Why it’s used here:
mm[start:end]
) without extra memory overhead.fork
How it works:
multiprocessing
defaults to using fork
, which means the child processes inherit memory from the parent.mmap
object in mm
is directly accessible to all child processes without duplication.Why it’s used here:
The big advantage is that the large memory-mapped file is not copied.
All processes can read from the same mmap
without extra RAM.
This is why the script explicitly does:
mp.set_start_method('fork', force=True)
to ensure fork
(important on macOS or future-proofing on systems that might default to spawn
).
multiprocessing