A Summary of Tips for the PyTorch Deep Learning Framework

Posted by 邮差的信 on 2021-02-02 10:44:02

1. Specifying GPU IDs when training a model

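  1. To make only GPU 0 visible to the process, set os.environ["CUDA_VISIBLE_DEVICES"]="0"; inside PyTorch the device is then addressed as "cuda:0";
  2. To make GPUs 0 and 1 visible, set os.environ["CUDA_VISIBLE_DEVICES"]="0,1"; they are addressed as "cuda:0" and "cuda:1" in the order listed, so device 0 is used first and device 1 second;
  3. The same can be done outside the training script: CUDA_VISIBLE_DEVICES=0,1 python train.py. Note that if you use cards 6 and 7 of an 8-card machine, CUDA_VISIBLE_DEVICES=6,7 python train.py, the visible devices are renumbered from 0, so wrapping the model for parallel execution must still refer to 0 and 1: model=nn.DataParallel(model, device_ids=[0,1]);
    Note that the GPU setting must come first, before any operation that involves the network model;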

2. Inspecting the inputs and outputs of each model layer

  • 1. Install torchsummary or torchsummaryX (pip install torchsummary);
  • 2. Usage examples follow:
import torch
from torchvision import models

vgg16 = models.vgg16()
vgg16 = vgg16.cuda()

# 1. Using torchsummary
from torchsummary import summary
summary(vgg16, (3, 224, 224))    # (3, 224, 224) is the input size of the network, without the batch dimension

# 2. Using torchsummaryX
from torchsummaryX import summary as summaryX

inputx = torch.randn(1, 3, 224, 224).cuda()    # the input must be on the same device as the model
summaryX(vgg16, inputx)

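The result is shown in the figure below (the output shape of each layer and the computational cost of the model):
[Figure: output result]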

3. Gradient clipping: preventing exploding gradients during optimization

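import torch
import torch.nn as nn

...
outputx = model(inputx)
loss = criterion(outputx, target)    # criterion and target are assumed to be defined in the elided code above
optimizer.zero_grad()
loss.backward()
# clip after backward() and before step(), so the update uses the clipped gradients
nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()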

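Parameters of nn.utils.clip_grad_norm_:

  1. parameters: an iterable of tensors (or a single tensor) whose gradients will be normalized;
  2. max_norm: the maximum norm allowed for the gradients;
  3. norm_type: the type of norm to use, L2 by default;
  4. Note that on some tasks gradient clipping adds a considerable amount of computation time.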
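
As a rough illustration of these parameters, the sketch below clips the gradients of a tiny model and compares the total gradient norm before and after. The model, the scaled-up input, and max_norm=1.0 are assumptions chosen only to provoke large gradients; they are not from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny hypothetical model and batch, used only to produce some gradients.
model = nn.Linear(4, 2)
inputx = torch.randn(8, 4) * 100.0   # large inputs to provoke large gradients
target = torch.randn(8, 2)

loss = nn.functional.mse_loss(model(inputx), target)
loss.backward()

max_norm = 1.0
# clip_grad_norm_ rescales the gradients in place and returns the total norm
# measured before clipping.
norm_before = nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm, norm_type=2)

# Recompute the total L2 norm over all parameter gradients after clipping.
norm_after = torch.norm(torch.stack([p.grad.norm(2) for p in model.parameters()]), 2)

print(float(norm_before), float(norm_after))
```

The gradients are scaled jointly, as if they were concatenated into one vector, so their direction is preserved and only the overall magnitude is capped.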

4. Adding a batch dimension to a single image

During training the input has shape (batch_size, c, h, w), while at test time a single image has only three dimensions, (c, h, w), so a batch dimension has to be added:

import cv2
import torch
import numpy as np

# imgpath is assumed to hold the path of an image file

####### numpy-based method #########
# Method 1.
image = cv2.imread(imgpath)    # cv2 returns an array of shape (h, w, c)
print(image.shape)
image = image[np.newaxis, :, :, :]
print(image.shape)

####### pytorch-based methods #########
# Method 2.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.view(1, *image.shape)
print(image.shape)

# Method 3.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.unsqueeze(dim=0)
print(image.shape)

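tensor.unsqueeze(dim) inserts a new dimension of size 1 at position dim. tensor.squeeze(dim) removes dimension dim only if its size is 1; if that dimension is larger than 1, squeeze() has no effect. When dim is not given, all dimensions of size 1 are removed.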
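
A small sketch of these rules, using an all-zero tensor as a stand-in for a real image (the shapes are chosen only for illustration):

```python
import torch

image = torch.zeros(3, 224, 224)      # stand-in for a (c, h, w) image

batched = image.unsqueeze(0)          # insert a new axis at position 0
print(batched.shape)                  # torch.Size([1, 3, 224, 224])

restored = batched.squeeze(0)         # dim 0 has size 1, so it is removed
print(restored.shape)                 # torch.Size([3, 224, 224])

unchanged = batched.squeeze(1)        # dim 1 has size 3, so nothing happens
print(unchanged.shape)                # torch.Size([1, 3, 224, 224])

t = torch.zeros(1, 3, 1, 5)
all_squeezed = t.squeeze()            # no dim given: every size-1 axis is removed
print(all_squeezed.shape)             # torch.Size([3, 5])
```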

5. One-hot encoding

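PyTorch's cross-entropy loss takes class indices as labels directly, so there is no need to convert them to one-hot by hand. With MSE the labels must be converted to one-hot manually; an example conversion follows: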

import torch
class_num = 8
batch_size = 4

def one_hot(label):
    """
    Convert class-index labels to one-hot
    Argument:
        label: (type, tensor), the gt label, shape: (batch_size,)
    Return:
        one_hot_out: (type, tensor), the one-hot label, shape: (batch_size, class_num)
    """
    label = label.view(batch_size, 1)    # scatter_ needs an index of shape (batch_size, 1)
    m_zeros = torch.zeros(batch_size, class_num)
    one_hot_out = m_zeros.scatter_(1, label, 1)    # (dim, index, value)
    return one_hot_out

label = torch.randint(0, class_num, (batch_size,))    # random int64 labels in [0, class_num)
print(one_hot(label))

Since PyTorch 1.1, one-hot encoding is available directly as torch.nn.functional.one_hot:

import torch
import torch.nn.functional as F

tensor = torch.arange(0, 5) % 3
one_hot = F.one_hot(tensor)

# F.one_hot infers the number of classes as the largest label plus one; the number of classes can also be given explicitly
one_hot = F.one_hot(tensor, num_classes=10)

6. Preventing GPU out-of-memory errors during validation

Validation needs no differentiation, i.e. no gradient computation. Disabling autograd speeds things up and saves memory; leaving it on may exhaust GPU memory:

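model.eval()    # switch Dropout and BatchNorm to evaluation behavior
with torch.no_grad():
    outputx = model(inputx)    # no autograd graph is built inside this block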
7. Learning rate decay strategies

Adjusting the learning rate dynamically during training helps avoid getting stuck in a local optimum.

import torch
import torch.optim as optim
from torch.optim import lr_scheduler

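# init optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, 10, 0.1)     # multiply the learning rate by 0.1 every 10 epochs

# train process
for epoch in range(n_epoch):    # n_epoch: total number of epochs
    ...    # forward pass, backward pass and optimizer.step() for every batch
    scheduler.step()    # since PyTorch 1.1, call this after the optimizer updates of the epoch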
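
The schedule can be checked by recording the learning rate at the start of each epoch. This is a sketch under assumptions: the linear model and the single dummy update per epoch are placeholders for a real training loop.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

model = nn.Linear(4, 2)   # hypothetical stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(25):
    lrs.append(optimizer.param_groups[0]['lr'])   # learning rate used in this epoch
    # one dummy update stands in for a full epoch of training
    optimizer.zero_grad()
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()
    scheduler.step()      # advance the schedule once per epoch

print(lrs[0], lrs[10], lrs[20])
```

Epochs 0 to 9 run at the initial rate, epochs 10 to 19 at one tenth of it, and so on.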

8. Freezing the parameters of some layers during training

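When loading a pretrained model, or when adapting a classification model in transfer learning, the first few layers often need to be frozen so that their features stay fixed and do not change during training.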

import torch.optim as optim
from torchvision import models

net = models.vgg16()
for name, value in net.named_parameters():
    print('name: {0}, \t grad: {1}'.format(name, value.requires_grad))

# names of the parameters to freeze; these belong to the first conv layer of torchvision's vgg16
no_grad = ['features.0.weight',
           'features.0.bias'
          ]

for name, value in net.named_parameters():
    if name in no_grad:
        value.requires_grad = False
    else:
        value.requires_grad = True

# define the optimizer over the trainable parameters only
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)
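
A quick way to confirm that freezing works is to take one optimization step and compare the frozen weights before and after. The two-layer network below is a hypothetical stand-in for a pretrained model, chosen so the sketch runs without downloading anything.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical two-layer network standing in for a pretrained model
net = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze the first layer; in an nn.Sequential its parameter names start with '0.'
for name, param in net.named_parameters():
    param.requires_grad = not name.startswith('0.')

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
optimizer = optim.Adam((p for p in net.parameters() if p.requires_grad), lr=0.01)

frozen_before = net[0].weight.clone()
trained_before = net[2].weight.detach().clone()

net(torch.randn(5, 8)).sum().backward()
optimizer.step()

print(trainable)                                   # ['2.weight', '2.bias']
print(torch.equal(frozen_before, net[0].weight))   # frozen layer is untouched
print(net[0].weight.grad)                          # no gradient was computed for it
```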

9. Setting different learning rates for different layers

Depending on what the optimization needs, different layers can be given different learning rates:

import torch.optim as optim
from torchvision import models

net = models.vgg16()
for name, value in net.named_parameters():
    print('name: {}'.format(name))

# split the layers according to key words in the parameter names,
# feature layers: finetune, classifier layers: from scratch
conv_params = []
fc_params = []
for name, params in net.named_parameters():
    if 'features' in name:    # conv layers of torchvision's vgg16 are named 'features.N.*'
        conv_params += [params]
    else:
        fc_params += [params]

# define the optimizer
optimizer = optim.Adam([
    {'params': conv_params, 'lr': 1e-4},
    {'params': fc_params, 'lr': 1e-2}], weight_decay=1e-3)

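The model's layers are split into two parts stored in a list; each part corresponds to one of the dictionaries above, and each dictionary sets its own learning rate. An option shared by both parts is placed outside the list as a global option, like 'weight_decay' above. A global learning rate can also be set outside the list: a part whose dictionary sets a local learning rate uses that rate, and any other part falls back to the global one: optimizer = optim.Adam([{'params': conv_params, 'lr': 1e-4}], lr=1e-2, weight_decay=1e-3)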
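
The fallback rule can be verified by inspecting optimizer.param_groups. In this sketch the small network is hypothetical and only its parameters are used; no forward pass is run.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical layers; only their parameters matter here
conv = nn.Conv2d(3, 8, kernel_size=3)
fc = nn.Linear(8, 2)

optimizer = optim.Adam(
    [
        {'params': conv.parameters(), 'lr': 1e-4},  # local lr overrides the global one
        {'params': fc.parameters()},                # no local lr: uses the global lr
    ],
    lr=1e-2,
    weight_decay=1e-3,                              # shared by both groups
)

group_lrs = [group['lr'] for group in optimizer.param_groups]
group_wds = [group['weight_decay'] for group in optimizer.param_groups]
print(group_lrs)
print(group_wds)
```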

10. Saving and loading models

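Models need to be saved during training and loaded back when they are used. PyTorch offers two main ways of saving and loading a model: 1. save and load the whole model; 2. save and load only the model parameters.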

1. Basic usage of saving and loading

  1. Save and load the whole model (network structure + parameters; relatively slow)
# save model
torch.save(model, 'net.pkl')

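# load model
# recent PyTorch releases default to weights_only=True; a fully pickled model then needs weights_only=False
model = torch.load('net.pkl')     # the model class must be defined or importable when loading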
  2. Save and load only the model parameters (fast, uses less space, the recommended method)
# save model parameters
torch.save(model.state_dict(), 'net_params.pkl')

# load model parameters, must build model firstly, load parameters secondly
model = Net()
state_dict = torch.load('net_params.pkl')
model.load_state_dict(state_dict)

2. Saving and loading a custom checkpoint

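A saved checkpoint file is typically a dictionary that contains the following:
  a. Network structure: input size, output size and hidden-layer information, so that the model can be rebuilt at load time;
  b. Model weights: the learnable parameters of every layer after training, obtained by calling state_dict() on the model instance, as in the model.state_dict() call used above when saving only the weights;
  c. Optimizer state: to resume training after saving, the optimizer's state and hyperparameters must also be saved, obtained by calling state_dict() on the optimizer instance;
  d. Other information: anything else worth saving, such as epoch, batch_size and other hyperparameters.
The content to save can therefore be customized, as shown below.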

# saving a checkpoint assuming the network class named Net
checkpoint = {
    'model': Net(),
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': epoch
}

torch.save(checkpoint, 'checkpoint.pkl')

# load the model info
def load_checkpoint(filepath):
    checkpoint = torch.load(filepath)
    model = checkpoint['model']     # network structure
    model.load_state_dict(checkpoint['model_state_dict'])    # load the model parameters
    # only needed to resume training; lr here is a placeholder that the loaded state overwrites
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])    # load the optimizer state

    for params in model.parameters():
        params.requires_grad = False

    model.eval()
    return model

model = load_checkpoint('checkpoint.pkl')

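If the model is loaded for testing, set requires_grad of every layer to False to fix the parameters, and call model.eval() to put the model in evaluation mode. This mainly fixes the behavior of Dropout and BatchNormalization; otherwise the predictions would differ from run to run. To continue training instead, call model.train() to make sure the model is in training mode.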
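
For the resume-training case, the following sketch shows a full round trip. It is a variant of the approach above, not the same code: it stores only state dicts (no pickled Net instance) and rebuilds the model from its class. The Net class, the temporary file path and the epoch value are hypothetical and exist only for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    """Hypothetical small network used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step so that the optimizer has state worth saving
model(torch.randn(3, 4)).sum().backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pkl')
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': 5,
}, path)

# Rebuild the objects first, then restore their state
restored = Net()
restored_optimizer = optim.SGD(restored.parameters(), lr=0.01)  # lr is overwritten on load
checkpoint = torch.load(path)
restored.load_state_dict(checkpoint['model_state_dict'])
restored_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1

restored.train()   # back to training mode before resuming
print(start_epoch, restored_optimizer.param_groups[0]['lr'])
```

Storing only state dicts keeps the checkpoint independent of how the class was pickled, at the cost of needing the class definition available when loading.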

3. Saving and loading across devices

  1. Trained on GPU, loaded on CPU (Save on GPU, Load on CPU):

    device = torch.device('cpu')
    model = Net()
    # load all tensors onto the CPU device
    model.load_state_dict(torch.load('net_params.pkl', map_location=device))
    # <===> model.load_state_dict(torch.load('net_params.pkl', map_location='cpu'))
    
  2. Trained on GPU, loaded on GPU (Save on GPU, Load on GPU):

    device = torch.device('cuda')
    model = Net()
    model.load_state_dict(torch.load('net_params.pkl'))
    model.to(device)
    

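Passing map_location is not enough here, because load_state_dict copies the loaded values into the model's existing parameters, which stay on the CPU. Call model.to(torch.device("cuda")) to move the model to the GPU.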

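The data fed to the model must also be moved with data=data.to(device), i.e. from the CPU to the GPU. Note that my_tensor.to(device) returns a copy of my_tensor on the GPU and does not overwrite my_tensor, so the tensor has to be reassigned by hand: my_tensor = my_tensor.to(device)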

  3. Trained on CPU, loaded on GPU (Save on CPU, Load on GPU):

    device = torch.device('cuda')
    model = Net()
    model.load_state_dict(torch.load('net_params.pkl', map_location='cuda:0'))
    model.to(device)
    

11. Some GPU-related functions

# check whether cuda is available
print(torch.cuda.is_available())

# get the number of gpus
print(torch.cuda.device_count())

# get the gpu name
print(torch.cuda.get_device_name(0))

# get the index of the current gpu device, starting from 0 by default
print(torch.cuda.current_device())

# move the model and data from cpu to gpu
use_cuda = torch.cuda.is_available()

# method 1
if use_cuda:
    data = data.cuda()
    model.cuda()

# method 2
device = torch.device('cuda' if use_cuda else 'cpu')
data = data.to(device)
model.to(device)

12. Outputting the feature maps of a model during inference

  1. Wrap the model (output the feature maps while running the forward pass);
import os
import cv2
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class FeatureVisualization:
    input_size = 256

    def __init__(self, imgpath='', layers_idx=[1, 2], save_features_dir='/'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx
        self.save_features_dir = save_features_dir
        self.net = models.vgg16()

    @classmethod
    def preprocess_image(cls, imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        img = cv2.resize(img, (cls.input_size, cls.input_size))
        # normalize to [0, 1]
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]   # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img

    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net.eval()
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()

        # run the conv part of vgg16 layer by layer and keep the requested outputs
        x = inputx
        with torch.no_grad():
            for index, module in enumerate(model.features):
                x = module(x)
                if index in self.layers_idx:
                    features[str(index)] = x
        return features

    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature)

    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        """
        feature = feature.cpu().detach().numpy()

        # use sigmoid to [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        # average over the channels of the first sample to get one (h, w) map that cv2.imwrite can save
        feature = np.round(feature[0].mean(axis=0) * 255).astype('uint8')
        return feature

if __name__ == '__main__':
    # pass the path of an existing image via imgpath; the default '' fails the assertion in preprocess_image
    featurevisualization = FeatureVisualization()
    featurevisualization.save_features()
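  2. Use hooks: with PyTorch hooks you can conveniently read or modify the values and gradients of intermediate layers without changing the network structure between input and output (the order in which the various hooks run relative to forward and backward is easiest to see in the __call__ function of nn.Module). There you can see that a hook registered with register_forward_hook runs after forward is called.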
import os
import cv2
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class FeatureVisualization:
    input_size = 256

    def __init__(self, imgpath='', layers_idx=[1, 2], save_features_dir='/'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx
        self.save_features_dir = save_features_dir
        self.net = models.vgg16()

    @classmethod
    def preprocess_image(cls, imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        img = cv2.resize(img, (cls.input_size, cls.input_size))
        # normalize to [0, 1]
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]   # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img

    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net.eval()
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()

        # closure
        def get_activation(name):
            def hook(module, input, output):
                features[name] = output.detach()
            return hook

        # register one hook per requested layer of the conv part of vgg16
        handles = []
        for layer_idx in self.layers_idx:
            handles.append(model.features[layer_idx].register_forward_hook(get_activation(str(layer_idx))))

        with torch.no_grad():
            outputx = model(inputx)
        # remove every hook, not only the last one registered
        for handle in handles:
            handle.remove()

        return features

    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature)

    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        """
        feature = feature.cpu().detach().numpy()

        # use sigmoid to [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        # average over the channels of the first sample to get one (h, w) map that cv2.imwrite can save
        feature = np.round(feature[0].mean(axis=0) * 255).astype('uint8')
        return feature

if __name__ == '__main__':
    # pass the path of an existing image via imgpath; the default '' fails the assertion in preprocess_image
    featurevisualization = FeatureVisualization()
    featurevisualization.save_features()
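
The hook mechanism itself can be tried without an image or a large model. The sketch below registers forward hooks on a small hypothetical nn.Sequential (the layer sizes and the random input are assumptions made only for illustration), captures two intermediate outputs, and then removes the hooks.

```python
import torch
import torch.nn as nn

# Hypothetical small network standing in for a real model
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
)
model.eval()

features = {}

def get_activation(name):
    # The closure remembers `name`, so each hook writes to its own key
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# Keep every handle so that all hooks can be removed afterwards
handles = [model[idx].register_forward_hook(get_activation(str(idx))) for idx in (0, 2)]

with torch.no_grad():
    model(torch.randn(1, 3, 16, 16))   # hooks fire right after each layer's forward

for handle in handles:
    handle.remove()

captured_shapes = {name: tuple(feature.shape) for name, feature in features.items()}
print(captured_shapes)
```

Once the handles are removed, later forward passes no longer touch the features dictionary.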

13. Converting between Tensor types (three ways)

  1. Using the dedicated conversion methods:

    import torch
    import torch.nn as nn
        
    x = torch.randn(3, 5)
    print(x)
    # convert x as long
    x_long = x.long()
    # convert x as half
    x_half = x.half()
    # convert x as int 
    x_int = x.int()
    # convert x as double
    x_double = x.double()
    # convert x as float
    x_float = x.float()
    # convert x as char
    x_char = x.char()
    # convert x as byte
    x_byte = x.byte()
    # convert x as short
    x_short = x.short()
    
  2. Using the **Tensor.type()** method:

    import torch
    import torch.nn as nn
        
    x = torch.randn(3, 5)
    x_int = x.type(torch.IntTensor)
    print(x_int)
    
  3. Using **type_as(ano_tensor)** to convert a tensor to the type of a given tensor:

    import torch
    import torch.nn as nn
        
    x = torch.FloatTensor(5)    
    y = torch.IntTensor([10, 20])
        
    x_int = x.type_as(y)
    assert isinstance(x_int, torch.IntTensor)
    

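This article collects small tips I have accumulated while using PyTorch and will be updated over time. If anything is wrong or inappropriate, corrections are welcome!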
