I have a recurring task of splitting a set of large (about 1-2 GiB each) gzipped Apache logfiles into several parts (say, chunks of 500K lines each). The final files should be gzipped again to limit disk usage.
On Linux I would normally do:
zcat biglogfile.gz | split -l500000
The resulting files will be named xaa, xab, xac, etc., so I then run:
gzip x*
The drawback of this method is that the huge uncompressed chunks are temporarily stored on disk as an intermediate result. Is there a way to avoid this intermediate disk usage?
Can I (similar to what xargs does) have split pipe each output file through a command (like gzip), so that the output is recompressed on the fly? Or am I looking in the wrong direction, and is there a much better way to do this?
Thanks.
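One possible direction, offered only as a sketch: the GNU coreutils version of split has a --filter option that runs a shell command once per chunk and sets $FILE to the chunk name, so each chunk can be compressed as it is written. This assumes a reasonably recent GNU split (other implementations may lack --filter). The temporary directory, the 25-line sample and the chunk size of 10 are made up so the demo runs quickly; they are not taken from the real logs.

```shell
#!/bin/sh
# Demo under assumptions: GNU split with --filter support is installed.
set -eu

# Hypothetical stand-in for a real 1-2 GiB log: 25 numbered lines, gzipped.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 25 | gzip > biglogfile.gz

# split starts the filter command once per chunk and exports FILE
# (xaa, xab, ...) to it. The single quotes stop the calling shell from
# expanding $FILE before split sees it.
zcat biglogfile.gz | split -l 10 --filter='gzip > "$FILE.gz"'

# Only compressed chunks exist; no uncompressed xaa, xab, ... were written.
ls x*.gz
```

For the logfiles described above, the same pipeline would use -l500000 in place of -l 10. Because split feeds each chunk straight into gzip, only the compressed chunks are written to disk.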