The official TensorFlow documentation on SavedModel Warmup says:
The TensorFlow runtime has components that are lazily initialized, which can cause high latency for the first request/s sent to a model after it is loaded. This latency can be several orders of magnitude higher than that of a single inference request.
In my opinion, since an ordinary prediction request is enough to warm up a model, the components that are lazily initialized can't be the init_op of the graph: init_op depends only on the parameters saved in the SavedModel, and TFS (TensorFlow Serving) already calls the restore_op to do that initialization when the model is loaded.
If I'm right about this, then what are the components that are lazily initialized?
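
To make the distinction I'm asking about concrete, here is a toy sketch in plain Python. It is not TensorFlow or TFS code: ToyModel, _build_kernel and timed_predict are hypothetical names, and the sleep only simulates an expensive one-time setup. It shows eager initialization at load time (what I assume restore_op/init_op correspond to) versus state that is only built when the first request arrives.

```python
import time


class ToyModel:
    """Toy stand-in for a loaded model; an analogy, not TensorFlow internals."""

    def __init__(self, weights):
        # Eager step, analogous to restore_op/init_op: runs at load time
        # and depends only on the saved parameters.
        self.weights = list(weights)
        # Lazy state: not built until the first request arrives.
        self._kernel = None
        self.lazy_inits = 0

    def _build_kernel(self):
        # Hypothetical expensive one-time setup, simulated with a short sleep.
        time.sleep(0.05)
        self.lazy_inits += 1
        return lambda xs: sum(w * x for w, x in zip(self.weights, xs))

    def predict(self, xs):
        if self._kernel is None:  # only true on the first request
            self._kernel = self._build_kernel()
        return self._kernel(xs)


def timed_predict(model, xs):
    # Return the prediction together with its wall-clock latency in seconds.
    start = time.perf_counter()
    result = model.predict(xs)
    return result, time.perf_counter() - start


model = ToyModel([1.0, 2.0, 3.0])  # "load": weights are restored eagerly
first, first_latency = timed_predict(model, [1.0, 1.0, 1.0])    # pays the lazy cost
second, second_latency = timed_predict(model, [1.0, 1.0, 1.0])  # already warm
print(f"first: {first_latency:.4f}s, second: {second_latency:.4f}s")
```

In this sketch the first call is slow even though the weights were fully initialized at load time, which is the behaviour the warmup document describes. My question is what plays the role of _build_kernel in the real TensorFlow runtime.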