I have a Linux application that reads 150-200 files (4-10 GB each) in parallel. Each file is read in turn in small, variably sized blocks, typically less than 2 KB each.
I currently need to maintain a combined read rate of over 200 MB/s across the set of files. The disks handle this just fine. There is a projected requirement of over 1 GB/s (which is out of the disks' reach at the moment).
We have implemented two different read systems; both make heavy use of posix_fadvise()-style advisory calls. The first is an mmap()-based read in which we map the entirety of the data set and read on demand; the second is a read()/seek()-based system.
Both work well, but only for moderate cases: the read() method manages our overall file cache much better and can deal well with hundreds of GB of files, but is badly rate limited; mmap() is able to pre-cache data, making the sustained data rate of over 200 MB/s easy to maintain, but it cannot deal with large total data-set sizes.
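For reference, here is a minimal sketch of what the mmap()-based path might look like, assuming the advisory call on the mapped region is posix_madvise(); the file path, offset, and block length are hypothetical placeholders, not taken from our actual code:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/data/file000.bin", O_RDONLY);   /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only; the kernel pages data in on demand. */
    char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Hint the expected access pattern so the kernel reads ahead. */
    posix_madvise(base, st.st_size, POSIX_MADV_SEQUENTIAL);

    /* Consume one small, variably sized block at some offset. */
    char block[2048];
    size_t off = 0, len = 1536;          /* example values only */
    memcpy(block, base + off, len);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}
```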
So my questions come down to these:
A: Can read()-type file I/O be further optimized beyond the posix_fadvise() calls on Linux, or, having tuned the disk scheduler, VMM, and posix_fadvise() calls, is that as good as we can expect?
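One direction I have considered measuring, sketched below under assumptions (the file path, prefetch window size, and iteration count are all made up): replace lseek()+read() pairs with pread(), and use the Linux-specific readahead(2) call to pull the next window into the page cache while the current one is being consumed.

```c
#define _GNU_SOURCE          /* for readahead(2), Linux-specific */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define PREFETCH_WINDOW (4L * 1024 * 1024)   /* assumed 4 MB window */

int main(void)
{
    int fd = open("/data/file000.bin", O_RDONLY);   /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    off_t off = 0, next_prefetch = 0;
    char buf[2048];

    for (int i = 0; i < 1024; i++) {
        /* Stay one window ahead: ask the kernel to start reading the
         * next region into the page cache before we need it. */
        if (off + PREFETCH_WINDOW >= next_prefetch) {
            readahead(fd, next_prefetch, PREFETCH_WINDOW);
            next_prefetch += PREFETCH_WINDOW;
        }

        /* pread() avoids the separate lseek() call per block. */
        ssize_t n = pread(fd, buf, sizeof buf, off);
        if (n <= 0) break;
        off += n;
    }

    close(fd);
    return 0;
}
```

Whether this actually helps beyond a well-tuned posix_fadvise(POSIX_FADV_WILLNEED) is exactly what I am unsure about.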
B: Are there systematic ways for mmap() to better deal with very large mapped data?
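The only systematic approach I can think of is a sliding window per file, mapping one region at a time and dropping it before moving on, so the resident set stays bounded. A sketch under assumptions (window size and file path are hypothetical; the consume step is elided):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define WINDOW (64L * 1024 * 1024)   /* assumed 64 MB window per file */

int main(void)
{
    int fd = open("/data/file000.bin", O_RDONLY);   /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    for (off_t off = 0; off < st.st_size; off += WINDOW) {
        size_t len = (size_t)((st.st_size - off < WINDOW)
                              ? st.st_size - off : WINDOW);

        /* Map only the current window; the offset is page-aligned
         * because WINDOW is a multiple of the page size. */
        char *win = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (win == MAP_FAILED) { perror("mmap"); return 1; }

        posix_madvise(win, len, POSIX_MADV_WILLNEED);  /* pre-cache */

        /* ... consume small blocks out of win[0..len) here ... */

        /* Drop the pages and the mapping before moving on, keeping
         * the resident set at roughly one WINDOW per file. */
        posix_madvise(win, len, POSIX_MADV_DONTNEED);
        munmap(win, len);
    }

    close(fd);
    return 0;
}
```

I do not know whether this keeps the pre-caching benefit that makes mmap() attractive in the first place, which is part of the question.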
Mmap-vs-reading-blocks is a similar problem to the one I am working on, and it provided a good starting point on this problem, along with the discussions in mmap-vs-read.