Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I trained a Sklearn RandomForestRegressor model on 19GB of training data. I would like to save it to disk in order to use it later for inference. As have been recomended in another stackoverflow questions, I tried the following:

  • Pickle
pickle.dump(model, open(filename, 'wb'))

Model was saved successfully. It's size on disk was 1.9 GB.

loaded_model = pickle.load(open(filename, 'rb'))

Loading of the model resulted in MemorError (despite 16 GB RAM)

  • cPickle - the same result as Pickle
  • Joblib

joblib.dump(est, 'random_forest.joblib' compress=3)

It also ends with the MemoryError while loading the file.

  • Klepto
d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d['sklearn_random_forest'] = est
d.dump()

Arhcive is created, but when I want to load it using the following code, I get the KeyError: 'sklearn_random_forest'

d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d.load(model_params)
est = d[model_params]

I tried saving dictionary object using the same code, and it worked, so the code is correct. Apparently Klepto cannot persist sklearn models. I played with cached and serialized parameters and it didn't help.

Any hints on how to handle this would be very appreciated. Is it possible to save the model in JSON, XML, maybe HDFS, or maybe other formats?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
3.3k views
Welcome To Ask or Share your Answers For Others

1 Answer

Try using joblib.dump()

In this method, you can use the param "compress". This param takes in Integer values between 0 and 9, the higher the value the more compressed your file gets. Ideally, a compress value of 3 would suffice.

The only downside is that the higher the compress value slower the write/read speed!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...