Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm working on a VQA model, and I need some help as I'm new to this.

I want to use transfer learning from the VGG19 network before running the train, so when I start the train, I will have the image features ahead (trying to solve performance issue).

Does it possible to do so? If so, can someone please share an example with pytorch?

below is the relevant code:

class img_CNN(nn.Module):
  def __init__(self, img_size):

        super(img_CNN, self).__init__()
        self.model = models.vgg19(pretrained=True)
        self.in_features = self.model.classifier[-1].in_features
        self.model.classifier = nn.Sequential(*list(self.model.classifier.children())[:-1]) # remove vgg19 last layer
        self.fc = nn.Linear(in_features, img_size)

  def forward(self, image):
    #with torch.no_grad():
    img_feature = self.model(image) # (batch, channel, height, width)
    img_feature = self.fc(img_feature)   
    return img_feature

class vqamodel(nn.Module):
  def __init__(self, output_dim,input_dim, emb_dim, hid_dim, n_layers, dropout, answer_len, que_size, img_size,model_vgg,in_features):
    super(vqamodel,self).__init__()
    self.image=img_CNN(img_size)
    self.question=question_lstm(input_dim, emb_dim, hid_dim, n_layers, dropout,output_dim,que_size)
    self.tanh=nn.Tanh()
    self.relu=nn.ReLU()
    self.dropout=nn.Dropout(dropout)
    self.fc1=nn.Linear(que_size,answer_len) #the input to the linear network is equal to the combain vector
    self.softmax=nn.Softmax(dim=1)


  def forward(self, image, question):
    image_emb=self.image(image)
    question_emb=self.question(question) 
    combine =question_emb*image_emb
    out_feature=self.fc1(combine)
    out_feature=self.relu(out_feature)
      
    return (out_feature)

How can I take out the models.vgg19(pretrained=True),run it before the train on the image dataloader and save the image representation in NumPy array?

thank you!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
4.6k views
Welcome To Ask or Share your Answers For Others

1 Answer

Yes, you can use a pretrained VGG model to extract embedding vectors from images. Here is a possible implementation, using torchvision.models.vgg*.

  1. First retrieve the pretrained model

    model = torchvision.models.vgg19(pretrained=True)
    

    Its classifier is:

    >>> model.classifier
    (classifier): Sequential(
        (0): Linear(in_features=25088, out_features=4096, bias=True)
        (1): ReLU(inplace=True)
        (2): Dropout(p=0.5, inplace=False)
        (3): Linear(in_features=4096, out_features=4096, bias=True)
        (4): ReLU(inplace=True)
        (5): Dropout(p=0.5, inplace=False)
        (6): Linear(in_features=4096, out_features=1000, bias=True)
    )
    
  2. Depending on your finetuning strategy, you can either truncate it to keep some of the trained dense layers:

    model.classifier = nn.Sequential(*[model.classifier[i] for i in range(4)])
    

    Or replace it altogether with a different set of dense layers wrapped in a nn.Sequential:

    model.classifier = nn.Sequential(
        nn.Linear(25088, 4096),
        nn.ReLU(True),
        nn.Dropout(0.5),
        nn.Linear(4096, 2048))
    
  3. Additionally, you can freeze the entire head of the model (the feature extractor):

    for param in model.features.parameters():
        param.requires_grad = False
    
  4. Then you will be able to use that model to extract image embeddings and perform back propagation to finetune your classifier:

    >>> model(img) # shape (batchs_size, 2048) 
    

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...