The usage here is the same as in Using read() directly into a C++ std::vector, but with reallocation taken into account.
The size of the input file is unknown, so the buffer is grown by doubling its size whenever the file turns out to be larger than the buffer. Here's my code:
#include <vector>
#include <fstream>
#include <iostream>

int main()
{
    const size_t initSize = 1;
    std::vector<char> buf(initSize); // sizes buf to initSize, so &buf[0] below is valid
    std::ifstream ifile("D:\\Pictures\\input.jpg", std::ios_base::in | std::ios_base::binary);
    if (ifile)
    {
        size_t bufLen = 0;
        for (buf.reserve(1024); !ifile.eof(); buf.reserve(buf.capacity() << 1))
        {
            std::cout << buf.capacity() << std::endl;
            ifile.read(&buf[0] + bufLen, buf.capacity() - bufLen);
            bufLen += ifile.gcount();
        }
        std::ofstream ofile("rebuild.jpg", std::ios_base::out | std::ios_base::binary);
        if (ofile)
        {
            ofile.write(&buf[0], bufLen);
        }
    }
}
The program prints the vector capacity just as expected, and writes an output file of exactly the same size as the input, BUT only the bytes before offset initSize match the input; everything after that is zeros...
Using &buf[bufLen] in read() is definitely undefined behavior, but &buf[0] + bufLen gets the right position to write to, because contiguous allocation is guaranteed, isn't it? (Provided initSize != 0. Note that std::vector<char> buf(initSize); sizes buf to initSize. And yes, if initSize == 0, a fatal runtime error occurs in my environment.) Am I missing something? Is this also UB? Does the standard say anything about this usage of std::vector?
Yes, I know we could compute the file size first and allocate a buffer of exactly that size, but in my project the input files can be expected to nearly ALWAYS be smaller than a certain SIZE, so I can set initSize to SIZE, expect no overhead (such as the file size calculation), and fall back on reallocation only as a kind of "exception handling". And yes, I know I could replace reserve() with resize() and capacity() with size() and get things working with little overhead (zeroing the buffer on every resize), but I still want to get rid of any redundant operation; call it a kind of paranoia...
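For reference, a minimal sketch of that resize()/size() variant (the doubling policy and file names follow the code above; the starting size of 1024 is an arbitrary assumption). All writes stay inside [0, size()), so none of the contested operations on reserved storage occur:

#include <fstream>
#include <vector>

int main()
{
    const size_t initSize = 1024;                 // assumed starting size
    std::vector<char> buf(initSize);              // size() == initSize, elements zero-initialized
    std::ifstream ifile("D:\\Pictures\\input.jpg", std::ios_base::in | std::ios_base::binary);
    if (ifile)
    {
        size_t bufLen = 0;
        while (!ifile.eof())
        {
            // Write only inside [0, size()), so &buf[bufLen] is unquestionably valid.
            ifile.read(&buf[bufLen], static_cast<std::streamsize>(buf.size() - bufLen));
            bufLen += static_cast<size_t>(ifile.gcount());
            if (bufLen == buf.size())
                buf.resize(buf.size() * 2);       // grow by doubling; new elements are zeroed
        }
        std::ofstream ofile("rebuild.jpg", std::ios_base::out | std::ios_base::binary);
        if (ofile)
            ofile.write(&buf[0], static_cast<std::streamsize>(bufLen));
    }
}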
Update 1:
In fact, we can logically deduce from the standard that &buf[0] + bufLen gets the right position. Consider:
std::vector<char> buf(128);
buf.reserve(512);
char* bufPtr0 = &buf[0], *bufPtrOutofRange = &buf[0] + 200;
buf.resize(256); std::cout << "standard guarantees no reallocation" << std::endl;
char* bufPtr1 = &buf[0], *bufInRange = &buf[200];
if (bufPtr0 == bufPtr1)
    std::cout << "so bufPtr0 == bufPtr1" << std::endl;
std::cout << "and 200 < buf.size(), standard guarantees bufInRange == bufPtr1 + 200" << std::endl;
if (bufInRange == bufPtrOutofRange)
    std::cout << "finally we have: bufInRange == bufPtrOutofRange" << std::endl;
output:
standard guarantees no reallocation
so bufPtr0 == bufPtr1
and 200 < buf.size(), standard guarantees bufInRange == bufPtr1 + 200
finally we have: bufInRange == bufPtrOutofRange
And here 200 can be replaced with any i satisfying buf.size() <= i < buf.capacity(), and a similar deduction holds.
Update 2:
Yes, I did miss something... But the problem is not contiguity (see Update 1), and not even a failure to write the memory (see my answer). Today I got some time to look into the problem: the program got the right address and wrote the right data into the reserved memory, but at the next reserve(), buf is reallocated and ONLY the elements in the range [0, buf.size()) are copied to the new memory. So that's the answer to the whole riddle...
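To illustrate that with only well-defined operations, here is a minimal sketch (sizes and contents are arbitrary): across a reallocation the vector moves exactly its size() elements, so bytes that lived only in the old spare capacity have no guarantee of surviving.

#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<char> buf(4, 'x');   // size() == 4, contents "xxxx"
    buf.reserve(8);                  // some spare capacity beyond size()

    buf.reserve(1024);               // request exceeds capacity(), so the buffer is reallocated

    // Only the size() == 4 elements are carried over to the new storage;
    // data that had merely been written into the old spare capacity would not be,
    // which is what the reading loop in the question ran into.
    std::cout << std::string(buf.begin(), buf.end()) << std::endl;  // prints "xxxx"
}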
Final note: If you don't need any reallocation after your buffer has been filled with data, you can definitely use reserve()/capacity() instead of resize()/size(), but if you do, use the latter. Also, under all implementations available here (VC++, g++, ICC), the following example works as expected:
const size_t initSize = 1;
std::vector<char> buf(initSize);
buf.reserve(1024*100); // assume the reserved space is enough for reading the file
std::ifstream ifile("D:\\Pictures\\input.jpg", std::ios_base::in | std::ios_base::binary);
if (ifile)
{
    ifile.read(&buf[0], buf.capacity()); // ok: the whole file is read into buf
    std::ofstream ofile("rebuild.jpg", std::ios_base::out | std::ios_base::binary);
    if (ofile)
    {
        ofile.write(&buf[0], ifile.gcount()); // rebuild.jpg is identical to input.jpg
    }
}
buf.reserve(1024*200); // horror! probably always loses all data in buf after offset initSize
And here's another example, quoted from 'TC++PL, 4e', p. 1041; note that the first line in the function uses reserve() rather than resize():
void fill(istream& in, string& s, int max)
// use s as target for low-level input (simplified)
{
    s.reserve(max); // make sure there is enough allocated space
    in.read(&s[0], max);
    const int n = in.gcount(); // number of characters read
    s.resize(n);
    s.shrink_to_fit(); // discard excess capacity
}
Update 3 (after 8 years): Many things happened during these years: I did not use C++ as my working language for nearly 6 years, and now I am a PhD student! Also, although many think there is UB here, the reasons they give are quite different (and some have already been shown not to be UB), indicating this is a complex case. So, before casting votes and writing answers, it is highly recommended to read and get involved in the comments.
Another thing is that, with the PhD training, I can now dive into the C++ standard with relative ease, which I dared not do years ago. I believe I showed in my own answer that, based on the standard, the above two code blocks should work. (The string example requires C++11.) Since my answer is still contentious (but not falsified, I believe), I do not accept it, but rather remain open to critical reviews and other answers.