I am currently working a lot with Google Colab and wanted to measure the inference time of a small MLP. However, when I run the code below in Google Colab, the reported runtime decreases every time I re-execute the notebook (i.e., every time I press the play button again after the previous run has finished).
I am getting values of around:
1st execution: 0.03 s
2nd execution: 0.005 s
3rd execution: 0.0007 s
I tested this on different machines and with different browsers. Note that I'm aware that time.time() has a precision limit of 1 ms on Unix systems; however, this does not explain the behaviour.
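The precision assumption about time.time() can be checked directly on the machine in question. The following is a minimal sketch using only the standard library; `smallest_tick` is a hypothetical helper introduced here for illustration and is not part of the original code. It prints the resolution Python reports for several clocks and measures the smallest step actually observed between two reads.

```python
import time

# Resolution advertised by the platform for each clock.
for name in ("time", "perf_counter", "monotonic"):
    info = time.get_clock_info(name)
    print(f"{name:>12}: resolution={info.resolution:.1e} s, "
          f"monotonic={info.monotonic}, implementation={info.implementation}")


def smallest_tick(clock, samples=50):
    """Return the smallest positive step observed between two clock reads.

    Hypothetical helper for illustration only.
    """
    best = float("inf")
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        # Spin until the clock visibly advances.
        while t1 <= t0:
            t1 = clock()
        best = min(best, t1 - t0)
    return best


print("observed tick, time.time():        ", smallest_tick(time.time))
print("observed tick, time.perf_counter():", smallest_tick(time.perf_counter))
```

If the observed tick is far below the measured runtimes, clock precision can be ruled out as an explanation; time.perf_counter() is generally the safer choice for short intervals because it is monotonic.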
Is there some sort of caching going on on the GPU / in PyTorch? If so, why, and can I expect such a speed increase in the final, deployed application as well?
Code to replicate the behaviour:
import time
import torch
import torch.nn.functional as F

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = torch.nn.Linear(in_features=3, out_features=512)
        self.fc2 = torch.nn.Linear(in_features=512, out_features=512)
        self.fc3 = torch.nn.Linear(in_features=512, out_features=512)
        self.fc4 = torch.nn.Linear(in_features=512, out_features=512)
        self.fc5 = torch.nn.Linear(in_features=512, out_features=512)
        self.fc6 = torch.nn.Linear(in_features=512, out_features=512)
        self.fc7 = torch.nn.Linear(in_features=512, out_features=1)

    def forward(self, x):
        in_dim = x.shape[0]
        x = F.relu(self.fc1(x.reshape(in_dim**2, 3)))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x)).reshape((in_dim, in_dim, 1))
        return x

device = 'cuda'
mlp = Model().to(device)
dim = 1024
input_tensor = torch.rand((dim, dim, 3), device=device)

with torch.no_grad():
    start_time = time.time()
    out = mlp(input_tensor)
    end_time = time.time()

print("Model FW pass {}p: {} seconds".format(input_tensor.shape[0],
                                             end_time-start_time))
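To separate one-time start-up cost from steady-state runtime, a measurement could time the first call on its own, discard a few warm-up calls, and take the median of the remaining runs. Because CUDA kernels are launched asynchronously, the clock should only be read after the device has finished its work. The sketch below is an assumption about how such a harness might look, not the method used above: `benchmark` and its parameters `sync`, `warmup`, and `repeats` are hypothetical names, and the demo uses a CPU-only stand-in workload so that it runs without a GPU.

```python
import statistics
import time


def benchmark(fn, sync=None, warmup=3, repeats=10):
    """Return (first_call_seconds, median_steady_state_seconds) for fn().

    Hypothetical helper for illustration. `sync` is an optional callable
    that blocks until all queued asynchronous work has finished.
    """
    def timed_call():
        if sync is not None:
            sync()  # drain work queued before the measurement starts
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work fn() queued before reading the clock
        return time.perf_counter() - start

    first = timed_call()            # includes any one-time initialisation
    for _ in range(warmup):
        timed_call()                # warm-up runs, results discarded
    steady = [timed_call() for _ in range(repeats)]
    return first, statistics.median(steady)


# CPU-only stand-in workload so the sketch runs anywhere.
first, steady = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"first call: {first:.6f} s, steady-state median: {steady:.6f} s")
```

With the model from the code above, the call would presumably be `benchmark(lambda: mlp(input_tensor), sync=torch.cuda.synchronize)` inside the `torch.no_grad()` block. If the steady-state median stays constant across notebook re-executions, the decreasing numbers would point to the measurement rather than to the model getting faster.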
Question from: https://stackoverflow.com/questions/66049240/google-colab-mlp-execution-time-not-constant-but-decreasing