I am testing some well-known computer vision models (UNet, FC-DenseNet103, this implementation). I train them on 224x224 randomly cropped patches and do the same on the validation set. Now, when I run inference on some videos, I pass the frames in directly (1280x640) and it works: the model runs the same operations on a different image size and never raises an error. It actually produces nice output, although the quality of the output depends on the image size.

It's been a long time since I've worked with neural nets, but back when I used TensorFlow I remember having to crop the input images to the training crop size.
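Here is a minimal sketch of what I mean. This is not my actual network, just a small fully convolutional stand-in, but the same instance accepts both the training crop size and the full frame size without complaint:

```python
import torch
import torch.nn as nn

# Tiny fully convolutional stand-in for the real model (UNet / FC-DenseNet103),
# only meant to illustrate the behaviour: same weights, different input sizes.
class TinySegNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet().eval()

with torch.no_grad():
    out_crop = model(torch.randn(1, 3, 224, 224))    # training crop size
    out_frame = model(torch.randn(1, 3, 640, 1280))  # full video frame (H=640, W=1280)

print(out_crop.shape)   # torch.Size([1, 2, 224, 224])
print(out_frame.shape)  # torch.Size([1, 2, 640, 1280])
```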
Why don't I need to do this anymore? What's happening under the hood?
Question from: https://stackoverflow.com/questions/65933454/torch-model-forward-with-a-diferent-image-size