Below is a draft machine-learning question framed for Stack Overflow.
It focuses on a subtle issue in PyTorch: a custom autograd.Function that works for first-order training but breaks when computing second-order derivatives (needed for techniques such as the gradient penalty in WGAN-GP, Hessian-vector products, and meta-learning/MAML).
The question is hard to answer from a quick search but solvable by an expert, which tends to attract good engagement.
Stack Overflow Question Draft
Title: RuntimeError: "element 0 of tensors does not require grad" when using custom autograd function with create_graph=True
Tags: python pytorch autograd deep-learning gradient-descent
Body:
I am implementing a custom activation function (a variant of Swish) in PyTorch to reduce memory usage. It is written as a torch.autograd.Function with both the forward and backward static methods defined.
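A stripped-down sketch of the class (simplified here, but the structure matches my real code):

```python
import torch


class CustomSwish(torch.autograd.Function):
    """Swish variant f(x) = x * sigmoid(x); beta is fixed to 1 in this sketch."""

    @staticmethod
    def forward(ctx, x):
        # Save only the raw input to keep memory low; the sigmoid is
        # recomputed in backward instead of being stored.
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * sig * (1 + x * (1 - sig))


def custom_swish(x):
    return CustomSwish.apply(x)
```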
Standard training (first-order derivatives) works perfectly. However, I am now trying to use this custom layer in a WGAN-GP (Wasserstein GAN with Gradient Penalty) setup. This requires computing the gradient of the gradients (double backprop) using torch.autograd.grad(..., create_graph=True).
As soon as I enable create_graph=True, I get the following error: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
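The gradient-penalty step is the standard WGAN-GP recipe; the snippet below (reusing custom_swish from the sketch above, with a toy critic and illustrative sizes) shows the code path where the error appears for me:

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Toy critic using the custom activation; layer sizes are illustrative."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x):
        return self.fc2(custom_swish(self.fc1(x)))


critic = Critic()
real = torch.randn(8, 64)
fake = torch.randn(8, 64)

# Random interpolation between real and fake samples.
alpha = torch.rand(real.size(0), 1)
interpolates = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
d_interpolates = critic(interpolates)

# Gradient of the critic output w.r.t. the interpolated inputs;
# create_graph=True keeps this gradient differentiable for the penalty term.
gradients = torch.autograd.grad(
    outputs=d_interpolates,
    inputs=interpolates,
    grad_outputs=torch.ones_like(d_interpolates),
    create_graph=True,
    retain_graph=True,
)[0]

gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
gradient_penalty.backward()  # second backward pass through the first-order gradients
```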
What I have tried:
I verified that the input tensor has requires_grad=True.
I tried replacing my custom function with torch.nn.SiLU() (the built-in Swish), and double backward works perfectly, so the issue is definitely in my CustomSwish class (a minimal version of this check is shown just below).
I removed ctx.save_for_backward and recalculated the needed values inside backward instead, but the error persists.
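For reference, the built-in vs. custom comparison mentioned above was done with a small isolated test along these lines (toy tensor, no GAN code involved):

```python
import torch
import torch.nn.functional as F

x = torch.randn(16, requires_grad=True)

# Flip this to compare the built-in Swish (F.silu) against the custom op.
use_builtin = False
y = F.silu(x) if use_builtin else custom_swish(x)

# First-order gradient, kept differentiable via create_graph=True.
grad_x, = torch.autograd.grad(y.sum(), x, create_graph=True)

# Second-order step: backpropagate through the first-order gradient.
grad_x.sum().backward()
print(x.grad)
```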
Question: Why does my custom autograd.Function break the computation graph during the second backward pass, even though I am using differentiable PyTorch operations inside backward? How do I properly implement a custom function that supports higher-order derivatives?
