Question
I'm attempting to get this PyTorch person detection example:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
running locally with a GPU, either in a Jupyter Notebook or a regular python file. I get the error in the title either way.
I'm using Ubuntu 18.04. Here is a summary of the steps I've performed:
1) Stock Ubuntu 18.04 install on a Lenovo ThinkPad X1 Extreme Gen 2 with a GTX 1650 GPU.
2) Perform a standard CUDA 10.0 / cuDNN 7.4 install. I'd rather not restate all the steps as this post is going to be more than long enough already. This is a standard procedure, pretty much any link found via googling is what I followed.
3) Install torch and torchvision:
pip3 install torch torchvision
4) From this link on the PyTorch site:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
I've both saved the linked notebook:
https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb
and also tried the link at the bottom that has the regular Python file:
https://pytorch.org/tutorials/_static/tv-training-code.py
5) Before running either the notebook or the regular Python way, I did the following (found at the top of the above linked notebook):
Install the CoCo API into Python:
cd ~
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
open Makefile in gedit, change the two instances of "python" to "python3", then:
python3 setup.py build_ext --inplace
sudo python3 setup.py install
Get the necessary files the above linked files need to run:
cd ~
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.5.0
From ~/vision/references/detection, copy coco_eval.py, coco_utils.py, engine.py, transforms.py, and utils.py to whichever directory the above linked notebook or tv-training-code.py file is being run from.
6) Download the Penn Fudan Pedestrian dataset from the link on the above page:
https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip
then unzip it and put it in the same directory as the notebook or tv-training-code.py.
In case the above link ever breaks or just for easier reference, here is tv-training-code.py as I have downloaded it at this time:
# Sample code from the TorchVision 0.3 Object Detection Finetuning Tutorial
# http://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
import os
import numpy as np
import torch
from PIL import Image
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from engine import train_one_epoch, evaluate
import utils
import transforms as T
class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)

        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)


def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model


def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)


def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset in train and test set
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)

    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    print("That's it!")


if __name__ == "__main__":
    main()
Here is an example run of tv-training-code.py:
$ python3 tv-training-code.py
Epoch: [0] [ 0/60] eta: 0:01:17 lr: 0.000090 loss: 4.1717 (4.1717) loss_classifier: 0.8903 (0.8903) loss_box_reg: 0.1379 (0.1379) loss_mask: 3.0632 (3.0632) loss_objectness: 0.0700 (0.0700) loss_rpn_box_reg: 0.0104 (0.0104) time: 1.2864 data: 0.1173 max mem: 1865
Traceback (most recent call last):
File "tv-training-code.py", line 165, in <module>
main()
File "tv-training-code.py", line 156, in main
train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
File "/xxx/PennFudanExample/engine.py", line 46, in train_one_epoch
losses.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 77, in apply
return self._forward_cls.backward(self, *args)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 189, in wrapper
outputs = fn(ctx, *args)
File "/usr/local/lib/python3.6/dist-packages/torchvision/ops/roi_align.py", line 38, in backward
output_size[0], output_size[1], bs, ch, h, w, sampling_ratio)
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fdfb6c9b813 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1ce68 (0x7fdfb6edce68 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1de6e (0x7fdfb6edde6e in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #3: at::native::empty_cuda(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x279 (0x7fdf59472789 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[many more frame lines omitted]
Clearly the line:
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
is the critical error.
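To make the numbers in that message easier to compare, here is a minimal sketch that parses it and converts everything to MiB. It assumes that "cached" counts memory reserved by PyTorch's caching allocator on top of "already allocated"; that reading, and the helper names `parse_oom_message` and `held_elsewhere_mib`, are assumptions for illustration and not part of PyTorch.

```python
import re

# The error text as it appears in the traceback above (trailing source location omitted).
MESSAGE = (
    "CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total "
    "capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached)"
)

MIB_PER_UNIT = {"MiB": 1.0, "GiB": 1024.0}
# Order in which the sizes appear in the message.
FIELDS = ("requested", "total", "allocated", "free", "cached")


def parse_oom_message(message):
    """Return the five sizes quoted in the message, converted to MiB."""
    found = re.findall(r"(\d+(?:\.\d+)?) (MiB|GiB)", message)
    if len(found) != len(FIELDS):
        raise ValueError("unexpected message format")
    return {name: float(value) * MIB_PER_UNIT[unit]
            for name, (value, unit) in zip(FIELDS, found)}


def held_elsewhere_mib(sizes):
    """Memory neither held by this PyTorch process nor reported as free.

    Assumption: 'cached' is counted in addition to 'allocated'.
    """
    return sizes["total"] - sizes["allocated"] - sizes["cached"] - sizes["free"]


sizes = parse_oom_message(MESSAGE)
for name in FIELDS:
    print("%-10s %8.2f MiB" % (name, sizes[name]))
print("held elsewhere: %.2f MiB" % held_elsewhere_mib(sizes))
```

Under that assumption, roughly 1 GiB of the card is used by something other than the tensors PyTorch is tracking (desktop processes, the CUDA context, and so on), and the 132 MiB request arrives when only about 132.69 MiB is reported free.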
If I run nvidia-smi before a run:
$ nvidia-smi
Tue Dec 24 14:32:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   47C    P8     5W /  N/A |    296MiB /  3903MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1190      G   /usr/lib/xorg/Xorg                           142MiB |
|    0      1830      G   /usr/bin/gnome-shell                          72MiB |
|    0      3711      G   ...uest-channel-token=14371934934688572948    78MiB |
+-----------------------------------------------------------------------------+
It seems pretty clear there is plenty of GPU memory available (this GPU is 4GB).
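The same figures can be read programmatically, which makes it easier to log them right before and during a training run. This is a sketch only: `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names, and the query relies on nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options being available on the machine.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (in MiB) into (used, total, free) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total, total - used))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage (needs the NVIDIA driver tools)."""
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_memory_csv(output)


# The figures from the table above: 296 MiB used out of 3903 MiB.
print(parse_memory_csv("296, 3903"))  # [(296, 3903, 3607)]
```

Calling `query_gpu_memory()` from a second terminal while the script is training would show how close the run gets to the 3903 MiB limit before the error is raised.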
Moreover, I'm confident my CUDA/cuDNN install and GPU hardware are good because I frequently train and run inference with the TensorFlow object detection API on this computer, and as long as I use the allow_growth option I never have GPU-related errors.
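One caveat is that PyTorch's pip wheels bundle their own CUDA runtime, so a working system-wide CUDA install does not by itself show which CUDA build PyTorch is using. A minimal sketch for checking what PyTorch itself reports follows; `describe_torch_env` is a hypothetical helper name, and it falls back gracefully if torch is not importable.

```python
import importlib


def describe_torch_env():
    """Collect the versions PyTorch reports, or {'installed': False} without torch."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"installed": False}
    info = {
        "installed": True,
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # These calls need a visible GPU, so they are only made when one exists.
        info["device"] = torch.cuda.get_device_name(0)
        info["cudnn"] = torch.backends.cudnn.version()
    return info


for key, value in describe_torch_env().items():
    print(key, value)
```

If `cuda_available` is true and the device name matches the GTX 1650, the install itself is probably fine and the error is more likely about how much memory the model needs than about configuration.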
From Googling on this error it seems to be relatively common. The most common solutions are:
1) Try a smaller batch size (not really applicable in this case since the training and testing batch sizes are 2 and 1 respectively, and I tried with 1 and 1 and still got the same error)
2) Update to the latest version of PyTorch (but I'm already at the latest version).
Some other suggestions involve reworking the training script. I'm very familiar with TensorFlow but I'm new to PyTorch so I'm not sure how to go about that. Also, most of the rework suggestions I can find for this error do not pertain to object detection and therefore I'm not able to relate them to this training script specifically.
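One rework that does apply to this detection script is reducing the image size the model works on. As far as I understand torchvision's detection models, each image is rescaled so that its shorter side is 800 pixels (capped so the longer side stays within 1333) before it reaches the backbone, so even small dataset images are processed at a large resolution. The sketch below reproduces that scaling rule in plain Python to show the effect. The function names and the 420x560 example image are hypothetical, and the rule is my reading of the library's behavior, not code taken from it.

```python
def rcnn_resize_scale(height, width, min_size=800, max_size=1333):
    """Scale factor applied to an image before it enters the model.

    Assumed rule: scale the shorter side to min_size, unless that would push
    the longer side past max_size, in which case fit the longer side instead.
    """
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return scale


def pixels_after_resize(height, width, **size_limits):
    """Approximate pixel count of the resized image (ignores rounding)."""
    scale = rcnn_resize_scale(height, width, **size_limits)
    return (height * scale) * (width * scale)


# Hypothetical 420x560 input image, default limits versus halved limits.
default_px = pixels_after_resize(420, 560)
smaller_px = pixels_after_resize(420, 560, min_size=400, max_size=667)
print("pixel ratio: %.2f" % (default_px / smaller_px))
```

Halving the limits cuts the pixel count per image to about a quarter, and activation memory should shrink roughly in proportion. In the training script this would mean passing the limits to the model constructor, for example `maskrcnn_resnet50_fpn(pretrained=True, min_size=400, max_size=667)`. I believe those keyword arguments are accepted, but I have not run that call against this setup, and smaller inputs may reduce accuracy.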
Has anybody else gotten this script to run locally with an NVIDIA GPU? Do you suspect an OS/CUDA/PyTorch configuration concern, or is there some way the script can be reworked to prevent this error? Any assistance would be greatly appreciated.
Answer 1:
Very strange, after changing both the training and testing batch size to 1, it now does not crash with a GPU error. Very strange since I'm certain I tried this before.
Perhaps it had something to do with changing the batch size to 1 for both training and testing, and then rebooting or somehow refreshing something else? I'm not really sure. Very odd.
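One thing a reboot would clear is a leftover Python process or notebook kernel that is still holding GPU memory from an earlier attempt. That is only a guess at what changed here. The sketch below lists the compute processes nvidia-smi reports so this can be checked without rebooting. `parse_compute_apps` and `query_compute_apps` are hypothetical helper names, and the sample line in the usage is made up.

```python
import subprocess


def parse_compute_apps(text):
    """Parse 'pid, process_name, used_memory' CSV lines into dictionaries."""
    apps = []
    for line in text.strip().splitlines():
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        apps.append({
            "pid": int(pid),
            "name": name.strip(),
            # nvidia-smi may print a placeholder instead of a number.
            "used_mib": int(used) if used.isdigit() else None,
        })
    return apps


def query_compute_apps():
    """List processes with a CUDA context (needs the NVIDIA driver tools)."""
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_compute_apps(output)


# Made-up sample line, only to show the parsed shape.
print(parse_compute_apps("4242, python3, 2417"))
```

If `query_compute_apps()` shows a python3 or notebook process before training starts, stopping that process should free its memory.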
Now the evaluate function call is crashing with the error:
object of type <class 'numpy.float64'> cannot be safely interpreted as an integer.
But it seems this is completely unrelated so I'll make a separate post for that.
Source: https://stackoverflow.com/questions/59473949/pytorch-object-detection-with-gpu-on-ubuntu-18-04-runtimeerror-cuda-out-of-me