PyTorch: in-place operation RuntimeError

Published: 2023-01-08 21:30

Problem description:

While building and training a simple network in PyTorch, the following error was raised:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [100, 300]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

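This error occurs when a tensor that is needed to compute the gradients of the loss is modified by an in-place operation before the backward pass runs. In-place operations include x += 1, x[0] = 1, and tensor methods whose names end in an underscore, such as x.add_().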
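To make the distinction concrete, here is a minimal sketch that contrasts in-place and out-of-place updates. It relies on the tensor attribute `_version`, the counter that the "is at version 2; expected version 1" part of the error message refers to. The tensor `x` and its values are made up for illustration.

```python
import torch

x = torch.ones(3)
version_start = x._version

x += 1        # in-place: the same tensor is overwritten
x[0] = 5.0    # in-place indexed assignment
x.add_(1)     # in-place method (trailing underscore)
version_after_inplace = x._version

y = x + 1     # out-of-place: a new tensor is allocated, x is left untouched
version_after_out_of_place = x._version

print(version_start, version_after_inplace, version_after_out_of_place)
```

Every in-place write bumps the counter, while the out-of-place `x + 1` leaves it unchanged. Autograd records the counter when it saves a tensor for the backward pass and raises the error above if the value has changed by the time the tensor is needed.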

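If you hit this error, check the following:

1. In forward(), check whether a variable that the output depends on is modified after the output has been computed.

The output feeds into the loss, so once it has been computed from certain variables, those variables must not be modified in place afterwards. (To change them, work on a copy made with .clone(), or assign the result to a new variable.)

For example: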

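Code example from: https://github.com/pytorch/pytorch/issues/15803

    import torch

    # a and c are not defined in the original snippet; scalar values are assumed here
    a = torch.tensor(0.5, requires_grad=True)
    c = torch.tensor(0.2, requires_grad=True)

    b = a ** 2 * c ** 2
    b += 1
    b *= c + a

    d = b.exp_()  # in-place exp; its output is saved for the backward pass
    d *= 5        # d is the same tensor as b, so this overwrites the saved output

    b.backward()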

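The code above fails. The RuntimeError is raised when b.backward() runs, and the operation that cannot compute its gradient is b.exp_(): its output was saved for the backward pass and then overwritten in place by d *= 5.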
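One possible way to repair the example is to replace the in-place operations with out-of-place ones, as in the sketch below. The values of `a` and `c` and the helper names `failing` and `fixed` are assumptions made for illustration; they do not come from the original issue.

```python
import torch

# Assumed inputs: the original snippet does not define a and c.
a = torch.tensor(0.5, requires_grad=True)
c = torch.tensor(0.2, requires_grad=True)

def failing():
    b = a ** 2 * c ** 2
    b += 1
    b *= c + a
    d = b.exp_()   # exp_ saves its output (b itself) for the backward pass
    d *= 5         # d is b, so the saved output is overwritten
    b.backward()   # the RuntimeError surfaces here

def fixed():
    b = a ** 2 * c ** 2
    b = b + 1          # out-of-place: each step creates a new tensor
    b = b * (c + a)
    d = b.exp()        # d is a new tensor; b keeps its value
    d = d * 5
    d.backward()       # in the original, b and d were the same tensor

try:
    failing()
    error_message = None
except RuntimeError as err:
    error_message = str(err)

a.grad = None  # clear any gradient state before the second run
c.grad = None
fixed()
print(error_message)
print(a.grad, c.grad)
```

The out-of-place version allocates a few extra tensors, which is usually an acceptable price for a graph that autograd can differentiate.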

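2. In the train() function, check whether backpropagation and the parameter update are in the wrong order.

Backpropagate the loss first, then update the parameters. Otherwise the same error is raised, because optimizer.step() modifies the parameters in place.

The correct order is: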

            loss.backward()   # backpropagate first, then update the parameters
            optimizer.step()
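For context, a minimal training loop with this ordering might look like the following sketch. The model, data, loss function, and learning rate are hypothetical placeholders chosen only to make the example self-contained.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup, for illustration only.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.MSELoss()
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

losses = []
for step in range(5):
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass
    loss.backward()                           # backpropagate first
    optimizer.step()                          # then update the parameters in place
    losses.append(loss.item())

print(losses)
```

Each iteration runs a fresh forward pass after the update, so the tensors saved for the backward pass always match the current parameter values.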

References:
https://github.com/pytorch/pytorch/issues/15803

https://discuss.pytorch.org/t/solved-pytorch1-5-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/90256

https://zhuanlan.zhihu.com/p/38475183
