Question
I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error.
ValueError: Expected input batch_size (128) to match target batch_size (32).
The stack trace starts at the line loss = loss_fn(outputs, labels). The outputs.shape is [128, 500] and the labels.shape is [32].
The code is listed here for completeness.
class Unit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Unit, self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self, input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output
class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.unit1 = Unit(in_channels=3, out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)
        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)
        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)
        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)
        self.avgpool = nn.AvgPool2d(kernel_size=4)

        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)
        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, 128)
        output = self.fc(output)
        return output
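For reference, here is a quick shape trace that reproduces the mismatch with a 64 x 64 input (a hypothetical check using the SimpleNet above, batch size 32):

import torch

model = SimpleNet(num_classes=500)
x = torch.randn(32, 3, 64, 64)   # batch of 32 RGB 64x64 images
feats = model.net(x)             # (32, 128, 2, 2): the 4x4 avg-pool leaves 2x2, not 1x1
flat = feats.view(-1, 128)       # (128, 128): the batch dimension is silently inflated
out = model.fc(flat)             # (128, 500), which no longer matches the 32 labels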
Any ideas on how to modify this CNN so that it accepts 64 x 64 images and returns correctly shaped outputs?
Answer 1:
The problem is an incompatible reshape (view) at the end.
You're using a sort of "flattening" at the end, which is different from "global pooling". Both are valid for CNNs, but only global pooling is compatible with arbitrary image sizes.
The flattened net (your case)
In your case, with a flatten, you need to keep track of all image dimensions in order to know how to reshape at the end.
So:
- Enter with 64x64
- Pool1 to 32x32
- Pool2 to 16x16
- Pool3 to 8x8
- AvgPool to 2x2
Then, at the end you've got a shape of (batch, 128, 2, 2): four times as many values per sample as you'd have with a 32x32 input, which ends at (batch, 128, 1, 1).
Then, your final reshape should be output = output.view(-1,128*2*2).
This is a different net with a different classification layer, though, because the classifier now needs in_features=512.
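A minimal sketch of that fix (only the classifier and the reshape change; 128 * 2 * 2 = 512 assumes the 64 x 64 inputs traced above):

# in __init__, the classifier must now expect 512 input features
self.fc = nn.Linear(in_features=128 * 2 * 2, out_features=num_classes)

def forward(self, input):
    output = self.net(input)               # (batch, 128, 2, 2) for 64x64 inputs
    output = output.view(-1, 128 * 2 * 2)  # (batch, 512)
    output = self.fc(output)               # (batch, num_classes)
    return output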
The global pooling net
On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:
def flatChannels(x):
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max(dim=-1) returns a (values, indices) tuple; keep only the values
    return flatChannels(x).max(dim=-1)[0]
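As an aside, PyTorch also ships nn.AdaptiveAvgPool2d, a built-in that pools each channel down to a fixed output size regardless of input resolution; a sketch of the same idea using it:

import torch.nn as nn

global_avg = nn.AdaptiveAvgPool2d(1)  # (batch, C, H, W) -> (batch, C, 1, 1) for any H, W

def globalAvgPool2D_builtin(x):
    return global_avg(x).view(x.size(0), -1)  # flatten to (batch, C)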
The ending of the model:
# removed the avgpool from here; global pooling happens in forward instead
self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                         self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                         self.unit9, self.unit10, self.unit11, self.pool3,
                         self.unit12, self.unit13, self.unit14)
self.fc = nn.Linear(in_features=128, out_features=num_classes)

def forward(self, input):
    output = self.net(input)
    output = globalAvgPool2D(output)  # or globalMaxPool2D
    output = self.fc(output)
    return output
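With those changes in place, a quick sanity check should confirm that the batch dimension survives (assuming the modified SimpleNet above):

import torch

model = SimpleNet(num_classes=500)
x = torch.randn(32, 3, 64, 64)
print(model(x).shape)  # torch.Size([32, 500]) -- matches the 32 labels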
Answer 2:
You need to use the transforms module before training the neural network (here is the link: https://pytorch.org/docs/stable/torchvision/transforms.html).
You have a few options:
transforms.Resize(32),
transforms.RandomResizedCrop(32) - most preferable, because it also augments your data and helps prevent overfitting.
transforms.CenterCrop(32), etc.
Moreover, you can compose several transforms into a single object via transforms.Compose().
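For example, a minimal sketch of such a pipeline (the transform choices are illustrative, not tuned for your dataset):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(32),  # random crop resized to 32x32; also augments the data
    transforms.ToTensor(),             # PIL image -> float tensor in [0, 1]
])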
Enjoy.
PS: Of course, you can also refactor your neural network architecture so that it accepts 64 x 64 images directly.
Source: https://stackoverflow.com/questions/53875372/how-do-i-modify-this-pytorch-convolutional-neural-network-to-accept-a-64-x-64-im