Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have two processes, one of which is writing (appending) to a file, the other is reading from it. Both processes are running concurrently, but do not communicate. Another reader process may start before the writer process has finished.

This approach works, but read() often returns zero bytes with no error. The ratio of zero-length reads to non-zero-length reads is high, which is inefficient.

Is there any way around this? This is on POSIX filesystems.


1 Answer

Without a communication channel, there's no guaranteed way to prevent zero-byte reads, or even long stretches in which the reader gets no data at all, when reading a file that is actively being written. GNU tail on Linux uses inotify to effectively create such a channel: the kernel notifies it about write activity on the file.
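
To make the inotify idea concrete, here is a minimal, Linux-only sketch of a reader that sleeps until the kernel reports a write instead of spinning on read(). It is not how tail itself is written. The function name follow(), the timeout parameter, and the use of ctypes to reach libc's inotify_init() and inotify_add_watch() are all choices made for this illustration.

```python
import ctypes
import ctypes.util
import os
import select

IN_MODIFY = 0x00000002  # event mask value from <sys/inotify.h>

# Load libc; CDLL(None) is the fallback if find_library returns None.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]


def follow(path, timeout=1.0, chunk_size=65536):
    """Yield data appended to path; stop once no write is seen for `timeout` seconds.

    Hypothetical helper for illustration only.
    """
    ifd = _libc.inotify_init()
    if ifd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        # Add the watch before the first read so no write can slip past unnoticed.
        wd = _libc.inotify_add_watch(ifd, os.fsencode(path), IN_MODIFY)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if chunk:
                    yield chunk
                    continue
                # At end of file: block until the kernel reports a modification.
                ready, _, _ = select.select([ifd], [], [], timeout)
                if not ready:
                    return  # assumption: a quiet period means the writer is done
                os.read(ifd, 4096)  # discard queued events, then read again
    finally:
        os.close(ifd)
```

Treating a quiet period as the end of the stream is an assumption of this sketch. A reader that must know for certain when the writer has finished still needs some signal from the writer.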

It's an interesting enough problem that IBM has even published a Redbook describing an implementation that was able to do such "read-behind-write" at about 15 GB/sec:

Read-behind-write is a technique used by some high-end customers to lower latency and improve performance. The read-behind-write technique means that once the writer starts to write, the reader will immediately trail behind to read; the idea is to overlap the write time with read time. This concept is beneficial on machines with slow I/O performance. For a high I/O throughput machine such as pSeries 690, it may be worth considering first writing the entire file out in parallel and then reading the data back in parallel.

There are many ways that read-behind-write can be implemented. In the scheme implemented by Xdd, after the writer writes one record, it will wait for the reader to read that record before the writer can proceed. Although this scheme keeps the writer and reader in sync just one record apart, it takes system time to do the locking and synchronization between writer and reader.

If one does not care about how many records that a reader lags behind the writer, then one can implement a scheme for the writer to stream down the writes as fast as possible. The writer can update a global variable after a certain number of records are written. The reader can then pull the global variable to find out how many records it has to read.
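
The last scheme in the quote might look roughly like the sketch below. This is not the Redbook's code. It assumes fixed-size records, uses a multiprocessing.Value as a stand-in for the "global variable", and assumes the reader knows the total record count in advance. RECORD_SIZE, PUBLISH_EVERY, and the function names are hypothetical.

```python
import multiprocessing
import time

RECORD_SIZE = 4096  # hypothetical fixed record length in bytes
PUBLISH_EVERY = 8   # hypothetical number of records between counter updates


def new_counter():
    """Shared unsigned 64-bit counter, usable across processes or threads."""
    return multiprocessing.Value("Q", 0)


def write_records(path, records, written):
    """Append records, publishing the running count every PUBLISH_EVERY records."""
    count = 0
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
            count += 1
            if count % PUBLISH_EVERY == 0:
                f.flush()              # data must reach the file before the count
                written.value = count
        f.flush()
        written.value = count          # publish the final total


def read_records(path, written, total, poll_interval=0.001):
    """Yield `total` records, never reading past the published count."""
    done = 0
    with open(path, "rb", buffering=0) as f:
        while done < total:
            available = written.value
            if done == available:
                time.sleep(poll_interval)  # nothing new published yet
                continue
            while done < available:
                yield f.read(RECORD_SIZE)
                done += 1
```

Because the reader only reads what the counter says has been flushed, it never issues a read() that comes back empty. The cost is that it can lag the writer by up to PUBLISH_EVERY records.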

Without a communication channel, you're pretty much left having to keep trying, perhaps calling sleep() or something similar after a number of zero-byte read() results.
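
A minimal sketch of that retry loop follows. It sleeps after every zero-byte read and doubles the delay each time, which is one possible variation on "sleep after a number of zero-byte results". The function name read_appended() and all of its parameters are made up for this example, and giving up after max_idle seconds without data is an assumption, since the reader has no real way to know the writer is done.

```python
import os
import time


def read_appended(fd, max_idle=2.0, chunk_size=65536,
                  first_delay=0.01, max_delay=0.5):
    """Yield chunks from fd as they appear; give up after about max_idle seconds idle.

    Hypothetical helper for illustration only.
    """
    delay = first_delay
    idle = 0.0
    while True:
        chunk = os.read(fd, chunk_size)
        if chunk:
            # Data arrived: reset the backoff and hand the chunk to the caller.
            delay = first_delay
            idle = 0.0
            yield chunk
            continue
        if idle >= max_idle:
            return
        # Zero-byte read: wait, then retry with a longer delay (capped).
        time.sleep(delay)
        idle += delay
        delay = min(delay * 2, max_delay)
```

The backoff keeps the number of wasted read() calls low while the writer is idle, at the price of up to max_delay seconds of extra latency when it resumes.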

