Classification with pretrained PyTorch VGG16 model and its classes

大兔子大兔子 submitted on 2021-02-11 15:54:33

Question


I wrote an image classification model using PyTorch's pretrained VGG16 model.

import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
import urllib.request
from skimage.transform import resize
from skimage import io
import yaml

# Downloading imagenet 1000 classes list
file = urllib.request.urlopen("https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt")
classes = ''
for f in file:
    classes = classes + f.decode("utf-8")
# safe_load: calling yaml.load without a Loader is deprecated (an error in PyYAML 6)
classes = yaml.safe_load(classes)

# Downloading pretrained vgg16 model
model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg16', pretrained=True)

print(model)

for param in model.parameters():
    param.requires_grad = False


url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/dog.jpg", "dog.jpg")

image=io.imread(url)

plt.imshow(image)
plt.show()

# resize to 224x224x3
img = resize(image,(224,224,3))

plt.imshow(img)
plt.show()
# Normalizing input for vgg16
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
img1 = mean*img+std
img1 = np.clip(img1,0,1)

img1 = torch.from_numpy(img1).unsqueeze(0)
img1 = img1.permute(0,3,2,1) # batch_size x channels x height x width

model.eval()
pred = model(img1.float())
print(classes[torch.argmax(pred).numpy().tolist()])

The code runs fine, but it's outputting the wrong classes. I am not sure where I went wrong, but if I had to guess, it might be the ImageNet YAML class list or the normalization of the input image. Can anyone tell me where I am making mistakes?
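A quick way to rule out the class list: the linked gist appears to contain a Python dict literal mapping class indices to labels, so under that assumption it can be parsed with the standard library's `ast.literal_eval` and sanity-checked. The helpers `parse_imagenet_labels` and `top_label` are hypothetical names for this sketch, and the `sample` string is an abbreviated stand-in for the downloaded text.

```python
import ast

import numpy as np


def parse_imagenet_labels(text):
    """Parse a dict literal such as "{0: 'tench, Tinca tinca', ...}" into {int: str}.

    Hypothetical helper; assumes the file is a Python dict literal.
    """
    labels = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    if not isinstance(labels, dict):
        raise ValueError("expected a dict literal")
    # Class indices must be exactly 0..N-1 so they line up with the model's outputs.
    if sorted(labels) != list(range(len(labels))):
        raise ValueError("class indices are not 0..N-1")
    return labels


def top_label(logits, labels):
    """Return the label of the highest-scoring class in a 1-D array of scores."""
    return labels[int(np.argmax(logits))]


# Abbreviated stand-in for the downloaded file (the real list has 1000 entries).
sample = "{0: 'tench, Tinca tinca',\n 1: 'goldfish, Carassius auratus'}"
labels = parse_imagenet_labels(sample)
print(len(labels))                              # 2
print(top_label(np.array([0.1, 2.5]), labels))  # goldfish, Carassius auratus
```

With the real file, `len(labels)` should be 1000; a class list that passes these checks while predictions stay wrong points back at the input preprocessing.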


Answer 1:


There are some issues with the image preprocessing. Firstly, the normalisation is calculated as (value - mean) / std, not value * mean + std. Secondly, the values should not be clipped to [0, 1]; the normalisation purposely shifts the values away from [0, 1]. Thirdly, the image as a NumPy array has shape [height, width, 3], so when you permute the dimensions with (0, 3, 2, 1) you swap the height and width dimensions, creating a tensor with shape [batch_size, channels, width, height].
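The first two points can be checked numerically with a small NumPy sketch. The two-pixel image is made up for illustration, and `normalize` and `question_version` are hypothetical helper names.

```python
import numpy as np

# ImageNet per-channel statistics, as used in the question.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def normalize(img):
    """Correct per-channel normalisation of an H x W x 3 image with values in [0, 1]."""
    return (img - MEAN) / STD


def question_version(img):
    """The question's transform: mean * value + std, clipped to [0, 1]."""
    return np.clip(MEAN * img + STD, 0, 1)


# A made-up 1 x 2 image: one black pixel and one white pixel.
img = np.array([[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])

correct = normalize(img)          # ranges from about -2.1 to 2.6, well outside [0, 1]
wrong = question_version(img)     # black -> std, white -> mean + std: contrast squashed
clipped = np.clip(correct, 0, 1)  # clipping flattens black to all 0 and white to all 1

print(correct.round(2))
print(wrong.round(2))
print(clipped)
```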

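img = resize(image, (224, 224, 3))  # float64 values in [0, 1]

# Normalizing input for vgg16: (value - mean) / std per channel, no clipping
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
img1 = (img - mean) / std

img1 = torch.from_numpy(img1).unsqueeze(0)  # batch_size x height x width x channels
img1 = img1.permute(0, 3, 1, 2).float()  # batch_size x channels x height x width, float32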
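The axis-order point is easiest to see with a non-square array. This sketch uses NumPy's `np.transpose`, which takes the same axis-order tuple as `torch.Tensor.permute`, on a hypothetical image 2 pixels high and 4 pixels wide.

```python
import numpy as np

# Hypothetical batch of one RGB image, 2 pixels high and 4 pixels wide,
# laid out as (batch, height, width, channels) like the NumPy image after unsqueeze(0).
batch = np.arange(1 * 2 * 4 * 3).reshape(1, 2, 4, 3)

# np.transpose takes the same axis-order tuple as torch.Tensor.permute.
wrong = np.transpose(batch, (0, 3, 2, 1))  # (batch, channels, width, height)
right = np.transpose(batch, (0, 3, 1, 2))  # (batch, channels, height, width)

print(wrong.shape)  # (1, 3, 4, 2)
print(right.shape)  # (1, 3, 2, 4)
```

With a square 224 x 224 image both orders give the same shape, so the mistake raises no error: the image is silently transposed (mirrored across its diagonal) before it reaches the model.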

Instead of doing that manually, you can use torchvision.transforms.

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

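img = resize(image, (224, 224, 3))  # float64 in [0, 1], so ToTensor does not rescale it
img1 = preprocess(img)  # 3 x 224 x 224, still float64
img1 = img1.unsqueeze(0).float()  # add the batch dimension and cast to float32 for the model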

If you use PIL to load the images, you could also resize them by adding transforms.Resize((224, 224)) to the preprocessing pipeline, or you could add transforms.ToPILImage() to first convert the uint8 array returned by io.imread to a PIL image (in older torchvision versions, transforms.Resize requires a PIL image; newer releases also accept tensors).



Source: https://stackoverflow.com/questions/62482336/classification-with-pretrained-pytorch-vgg16-model-and-its-classes
