Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I am testing some well-known computer vision models: UNet and FC-DenseNet103 (this implementation). I train them on 224x224 randomly cropped patches, and do the same on the validation set. Now, when I run inference on some videos, I pass the frames in directly (1280x640) and it works: the same operations run on a different image size and it never gives an error. It actually gives nice output, but the quality of the output depends on the image size. It has been a long time since I worked with neural nets, but when I was using TensorFlow I remember having to crop the input images to the training crop size.

Why don't I need to do this anymore? What's happening under the hood?

Question from: https://stackoverflow.com/questions/65933454/torch-model-forward-with-a-diferent-image-size

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back into you…

1 Answer

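It seems that the models you are using have no linear (fully connected) layers. A convolutional layer's weights depend only on the kernel size and on the number of input and output channels, not on the height and width of the image, so the same kernel is simply slid over however many positions the input has; pooling and upsampling layers behave the same way. The output of the last convolutional layer then goes straight into the softmax, which is applied across the class channels at every pixel and so doesn't require a specific spatial shape either. A linear layer is different: its weight matrix has a fixed number of input features, so flattening a larger feature map into it causes a shape mismatch error, which is likely why you had to crop your inputs to the training size in your earlier TensorFlow models. Because of this, your model will accept images of any size, but its accuracy will often be worse on image sizes that differ from the one it was trained on.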
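To make this concrete, here is a minimal, dependency-free sketch in plain Python (not PyTorch, so it runs anywhere) contrasting a fully convolutional pipeline with a dense layer. The kernel values, the two-class setup, and the helper names `conv2d_same`, `softmax_per_pixel`, `fully_conv_model` and `linear` are hypothetical and chosen only for illustration; real UNet and FC-DenseNet103 models stack many such layers together with pooling, upsampling and skip connections. The point it shows is that a 3x3 kernel keeps the same 9 weights whatever the image size, while a dense layer's weight matrix fixes the length of its input. PyTorch follows the same pattern: an `nn.Conv2d` weight has shape `(out_channels, in_channels, kH, kW)`, whereas an `nn.Linear` weight has shape `(out_features, in_features)`.

```python
import math

# Hypothetical, untrained 3x3 kernels: one per output class (2 classes here).
# A real UNet / FC-DenseNet103 learns many such kernels across many layers.
KERNELS = [
    [[0.1, 0.2, 0.1], [0.0, 0.5, 0.0], [-0.1, 0.2, -0.1]],
    [[-0.2, 0.0, 0.2], [0.3, -0.5, 0.3], [0.1, 0.0, 0.1]],
]


def conv2d_same(image, kernel):
    """Slide a 3x3 kernel over an H x W image, zero-padding the border.

    The kernel always has 9 weights, whatever H and W are, so the same
    parameters apply to a 224x224 crop or a 1280x640 frame.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # pixels outside count as 0
                        acc += kernel[di + 1][dj + 1] * image[y][x]
            out[i][j] = acc
    return out


def softmax_per_pixel(channels):
    """Softmax across the class channels, computed independently at each pixel."""
    h, w = len(channels[0]), len(channels[0][0])
    probs = [[[0.0] * w for _ in range(h)] for _ in channels]
    for i in range(h):
        for j in range(w):
            logits = [c[i][j] for c in channels]
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in logits]
            total = sum(exps)
            for k, e in enumerate(exps):
                probs[k][i][j] = e / total
    return probs


def fully_conv_model(image):
    """Conv -> per-pixel softmax: no layer depends on the input's H or W."""
    return softmax_per_pixel([conv2d_same(image, k) for k in KERNELS])


def linear(flat_input, weights):
    """Dense layer: one weight per input element, so the input length is fixed."""
    in_features = len(weights[0])
    if len(flat_input) != in_features:
        raise ValueError(f"expected {in_features} inputs, got {len(flat_input)}")
    return [sum(w * x for w, x in zip(row, flat_input)) for row in weights]


if __name__ == "__main__":
    small = [[float((i + j) % 3) for j in range(4)] for i in range(4)]   # "training crop"
    large = [[float((i * j) % 5) for j in range(10)] for i in range(6)]  # "video frame"

    for img in (small, large):
        probs = fully_conv_model(img)
        print(len(probs), len(probs[0]), len(probs[0][0]))  # classes, H, W
    # prints "2 4 4", then "2 6 10"

    dense_weights = [[0.01] * 16 for _ in range(2)]  # sized for a flattened 4x4 input
    print(len(linear([v for row in small for v in row], dense_weights)))  # 2
    try:
        linear([v for row in large for v in row], dense_weights)
    except ValueError as err:
        print(err)  # expected 16 inputs, got 60
```

The convolutional model returns one probability map per class with the same height and width as its input, for both sizes, while the dense layer only accepts the flattened size it was built for. The zero padding here plays the same role as `padding=1` on a 3x3 `nn.Conv2d`; it also means pixels near the border see partly artificial context, which may be one of several reasons the output quality shifts with image size.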

