TextIO.read() and AvroIO.read() (as well as some other Beam IOs) by default don't perform very well in current Apache Beam runners when reading a filepattern that expands into a very large number of files, for example 1M files.
How can I read such a large number of files efficiently?