Your question got a bit deeper with your follow-up, so I'll try to cover all of your options...
Reading One File: How many threads?
Use one thread.
If you read straight through a file front-to-back from a single thread, the operating system will not fetch the file in small chunks like you're thinking. Rather, it will prefetch the file ahead of you in huge (exponentially growing) chunks, so you almost never pay a penalty for going to disk. You might wait for the disk a handful of times, but in general it will be as if the file were already in memory, and this is true regardless of whether you use `mmap`.
The OS is very good at this kind of sequential file reading, because it's predictable. When you read a file from multiple threads, you're essentially reading randomly, which is (obviously) less predictable. Prefetchers tend to be much less effective with random reads, in this case probably making the whole application slower instead of faster.
Notice: This is even before you add the cost of setting up the threads and all the rest of it. That costs something, too, but it's basically nothing compared with the cost of more blocking disk accesses.
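To make the single-threaded case concrete, here is a minimal sketch of a plain sequential read on a POSIX system; the file name and the 1 MiB buffer size are placeholder choices, not anything from your question:

```cpp
// Minimal sketch: one thread reading one file front-to-back (POSIX).
// "input.dat" and the 1 MiB buffer size are illustrative, not prescriptive.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    int fd = open("input.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    std::vector<char> buf(1 << 20);  // 1 MiB per read(); the kernel prefetches ahead of us
    size_t total = 0;
    ssize_t n;
    while ((n = read(fd, buf.data(), buf.size())) > 0) {
        total += static_cast<size_t>(n);  // ...process buf[0..n) here...
    }
    if (n < 0) perror("read");

    close(fd);
    std::printf("read %zu bytes\n", total);
    return 0;
}
```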
Reading Multiple Files: How many threads?
Use as many threads as you have files (or some reasonable number).
File prefetching is done separately for each open file, so once you start reading multiple files, you should read from several of them in parallel. This works because the disk I/O scheduler will try to figure out the fastest order in which to read all of them. Often there's a disk scheduler both in the OS and on the hard drive itself. Meanwhile, the prefetcher can still do its job.
Reading several files in parallel is almost always better than reading the files one-by-one. If you did read them one at a time, your disk would idle between prefetches; that's valuable time to read more data into memory! The only way you can really go wrong is if you have too little RAM to support many open files; that's not common anymore.
A word of caution: If you're overzealous with your parallel file reads, reading one file will start kicking bits of other files out of memory, and you're back to a random-read situation.
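Here is a rough sketch of the one-thread-per-file approach; the file names, the 1 MiB chunk size, and the choice to just count bytes are my own illustrative assumptions:

```cpp
// Minimal sketch: one reader thread per file, so the OS can prefetch and the
// I/O scheduler can order the reads across all of them at once.
// The file names and 1 MiB chunk size are placeholders.
#include <cstdio>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

static void read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 20);   // read sequentially in 1 MiB chunks
    size_t total = 0;
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
        total += static_cast<size_t>(in.gcount());  // ...process the chunk here...
    }
    std::printf("%s: %zu bytes\n", path.c_str(), total);
}

int main() {
    const std::vector<std::string> files = {"a.dat", "b.dat", "c.dat"};
    std::vector<std::thread> readers;
    readers.reserve(files.size());
    for (const auto& f : files)
        readers.emplace_back(read_whole_file, f);  // one thread per file
    for (auto& t : readers)
        t.join();
    return 0;
}
```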
Combining n Files into One
Processing and producing output from multiple threads might work, but it depends on how you need to combine them. You'll have to be careful about how you synchronize the threads, in any case, though there are surely some relatively easy lock-less ways to do that.
One thing to watch out for, though: don't bother writing the file in small (< 4K) blocks. Collect at least 4K of data at a time before you call `write()`. Also, since the kernel will lock the file when you write to it, don't call `write()` from all of your threads at once; they'll all wait for each other instead of processing more data.
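As a sketch of one way to arrange that (the mutex-protected queue, 1 KiB worker chunks, three workers, and the output name `combined.out` are all assumptions for the example, not part of your setup), the workers only hand off data, and a single writer thread batches it into blocks of at least 4K before each `write()`:

```cpp
// Minimal sketch: worker threads hand finished chunks to one writer thread,
// which batches them into blocks of at least 4 KiB before calling write().
// The queue design, chunk sizes, worker count, and "combined.out" are all
// assumptions made for the example.
#include <fcntl.h>
#include <unistd.h>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

static std::mutex mtx;
static std::condition_variable cv;
static std::queue<std::string> chunks;
static bool done = false;

static void worker(int id) {
    // Stand-in for real processing: each worker produces a few 1 KiB chunks.
    for (int i = 0; i < 8; ++i) {
        std::string chunk(1024, static_cast<char>('A' + id));
        {
            std::lock_guard<std::mutex> lk(mtx);
            chunks.push(std::move(chunk));
        }
        cv.notify_one();
    }
}

static void writer(int fd) {
    std::string buf;                   // accumulate here; only flush in >= 4K blocks
    const size_t kFlushAt = 4096;
    for (;;) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [] { return !chunks.empty() || done; });
        while (!chunks.empty()) {
            buf += chunks.front();
            chunks.pop();
        }
        const bool finished = done;
        lk.unlock();

        // Flush full 4K blocks; on shutdown, flush whatever is left.
        while (buf.size() >= kFlushAt || (finished && !buf.empty())) {
            const size_t n = finished ? buf.size() : kFlushAt;
            if (write(fd, buf.data(), n) < 0) { perror("write"); return; }
            buf.erase(0, n);
        }
        if (finished) break;
    }
}

int main() {
    int fd = open("combined.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    std::vector<std::thread> workers;
    for (int id = 0; id < 3; ++id)
        workers.emplace_back(worker, id);
    std::thread w(writer, fd);

    for (auto& t : workers) t.join();
    {
        std::lock_guard<std::mutex> lk(mtx);
        done = true;
    }
    cv.notify_one();
    w.join();
    close(fd);
    return 0;
}
```

The point of the single writer is that only one thread ever touches the output file, so the worker threads never block on it; they just keep producing data.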