I'm trying to save bottleneck values to a newly created hdf5 file.
The bottleneck values come in batches of shape (120,10,10, 2048)
.
Saving one alone batch is taking up more than 16 gigs and python seems to be freezing at that one batch. Based on recent findings (see update, it seems hdf5 taking up large memory is okay, but the freezing part seems to be a glitch.
I'm only trying to save the first 2 batches for test purposes and only the training data set (once again,this is a test run), but I can't even get past the first batch. It just stalls at the first batch and doesn't loop to the next iteration. If I try to check the hdf5, explorer will get sluggish, and Python will freeze. If I try to kill Python (even with out checking hdf5 file), Python doesn't close properly and it forces a restart.
Here is the relevant code and data:
Total data points are about 90,000 ish, released in batches of 120.
Bottleneck shape is (120,10,10,2048)
So the first batch I'm trying to save is (120,10,10,2048)
Here is how I tried to save the dataset:
with h5py.File(hdf5_path, mode='w') as hdf5:
hdf5.create_dataset("train_bottle", train_shape, np.float32)
hdf5.create_dataset("train_labels", (len(train.filenames), params['bottle_labels']),np.uint8)
hdf5.create_dataset("validation_bottle", validation_shape, np.float32)
hdf5.create_dataset("validation_labels",
(len(valid.filenames),params['bottle_labels']),np.uint8)
#this first part above works fine
current_iteration = 0
print('created_datasets')
for x, y in train:
number_of_examples = len(train.filenames) # number of images
prediction = model.predict(x)
labels = y
print(prediction.shape) # (120,10,10,2048)
print(y.shape) # (120, 12)
print('start',current_iteration*params['batch_size']) # 0
print('end',(current_iteration+1) * params['batch_size']) # 120
hdf5["train_bottle"][current_iteration*params['batch_size']: (current_iteration+1) * params['batch_size'],...] = prediction
hdf5["train_labels"][current_iteration*params['batch_size']: (current_iteration+1) * params['batch_size'],...] = labels
current_iteration += 1
print(current_iteration)
if current_iteration == 3:
break
This is the output of the print statements:
(90827, 10, 10, 2048) # print(train_shape)
(6831, 10, 10, 2048) # print(validation_shape)
created_datasets
(120, 10, 10, 2048) # print(prediction.shape)
(120, 12) #label.shape
start 0 #start of batch
end 120 #end of batch
# Just stalls here instead of printing `print(current_iteration)`
It just stalls here for while (20 mins +), and the hdf5 file slowly grows in size (around 20 gigs now, before I force kill). Actually I can't even force kill with task manager, I have to restart the OS, to actually kill Python in this case.
Update
After playing around with my code for a bit, there seems to be a strange bug/behavior.
The relevant part is here:
hdf5["train_bottle"][current_iteration*params['batch_size']: (current_iteration+1) * params['batch_size'],...] = prediction
hdf5["train_labels"][current_iteration*params['batch_size']: (current_iteration+1) * params['batch_size'],...] = labels
If I run either of these lines, my script will go through the iterations, and automatically break as expected. So there is no freeze if I run either-or. It happens fairly quickly as well -- less than one min.
If I run the first line ('train_bottle')
, my memory is taking up about 69-72 gigs, even if it's only a couple of batches. If I try more batches, the memory is the same. So I'm assuming the train_bottle
decided storage based on the size parameters I'm assigning the dataset, and not actually when it gets filled.
So despite the 72 gigs, it's running fairly quickly (one min).
If I run the second line, train_labels
, my memory takes up a few megabytes.
There is no problem with the iterations, and break statement is executed.
However, now here is the problem, If I try to run both lines (which in my case is necessary as I need to save both 'train_bottle' and 'train_labels'), I'm experiencing a freeze on the first iteration, and it doesn't continue to the second iteration, even after 20 mins. The Hdf5 file is slowly growing, but if I try to access it, Windows Explorer slows down to a snail and I can't close Python -- I have to restart the OS.
So I'm not sure what the problem is when trying to running both lines -- as if I run the memory hungry train_data
line, if works perfectly and ends within a min.