Advanced Automatic Differentiation#

Overview#

In this section, we review some advanced AD tricks, especially their application to circuit simulation. With these tricks, some quantum quantities can be evaluated more efficiently.

Advanced AD is possible in TensorCircuit because we have implemented several AD-related APIs in a backend-agnostic way; their implementation closely follows the design philosophy of JAX's AD.

Setup#

[1]:
import numpy as np
import tensorcircuit as tc

K = tc.set_backend("tensorflow")
[2]:
n = 6
nlayers = 3

The backend-agnostic AD-related APIs include the following:

[3]:
help(K.grad)
help(K.value_and_grad)
help(K.vectorized_value_and_grad)
help(K.vjp)
help(K.jvp)
help(K.jacfwd)
help(K.jacrev)
help(K.stop_gradient)
help(K.hessian)
Help on method grad in module tensorcircuit.backends.tensorflow_backend:

grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Any] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Return the function which is the grad function of input ``f``.

    :Example:

    >>> f = lambda x,y: x**2+2*y
    >>> g = tc.backend.grad(f)
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    2
    >>> g = tc.backend.grad(f, argnums=(0,1))
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    [2, 2]

    :param f: the function to be differentiated
    :type f: Callable[..., Any]
    :param argnums: the position of args in ``f`` that are to be differentiated, defaults to be 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: the grad function of ``f`` with the same set of arguments as ``f``
    :rtype: Callable[..., Any]

Help on method value_and_grad in module tensorcircuit.backends.tensorflow_backend:

value_and_grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Tuple[Any, Any]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Return the function which returns the value and grad of ``f``.

    :Example:

    >>> f = lambda x,y: x**2+2*y
    >>> g = tc.backend.value_and_grad(f)
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    5, 2
    >>> g = tc.backend.value_and_grad(f, argnums=(0,1))
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    5, [2, 2]

    :param f: the function to be differentiated
    :type f: Callable[..., Any]
    :param argnums: the position of args in ``f`` that are to be differentiated, defaults to be 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: the value and grad function of ``f`` with the same set of arguments as ``f``
    :rtype: Callable[..., Tuple[Any, Any]]

Help on method vectorized_value_and_grad in module tensorcircuit.backends.tensorflow_backend:

vectorized_value_and_grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, vectorized_argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Tuple[Any, Any]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Return the VVAG function of ``f``. The inputs for ``f`` is (args[0], args[1], args[2], ...),
    and the output of ``f`` is a scalar. Suppose VVAG(f) is a function with inputs in the form
(vargs[0], args[1], args[2], ...), where vargs[0] has one extra dimension than args[0] in the first axis
    and consistent with args[0] in shape for remaining dimensions, i.e. shape(vargs[0]) = [batch] + shape(args[0]).
    (We only cover cases where ``vectorized_argnums`` defaults to 0 here for demonstration).
    VVAG(f) returns a tuple as a value tensor with shape [batch, 1] and a gradient tuple with shape:
    ([batch]+shape(args[argnum]) for argnum in argnums). The gradient for argnums=k is defined as

    .. math::

        g^k = \frac{\partial \sum_{i\in batch} f(vargs[0][i], args[1], ...)}{\partial args[k]}

    Therefore, if argnums=0, the gradient is reduced to

    .. math::

        g^0_i = \frac{\partial f(vargs[0][i])}{\partial vargs[0][i]}

    , which is specifically suitable for batched VQE optimization, where args[0] is the circuit parameters.

    And if argnums=1, the gradient is like

    .. math::
        g^1_i = \frac{\partial \sum_j f(vargs[0][j], args[1])}{\partial args[1][i]}

    , which is suitable for quantum machine learning scenarios, where ``f`` is the loss function,
    args[0] corresponds to the input data and args[1] corresponds to the weights in the QML model.

    :param f: [description]
    :type f: Callable[..., Any]
    :param argnums: [description], defaults to 0
    :type argnums: Union[int, Sequence[int]], optional
    :param vectorized_argnums: the args to be vectorized, these arguments should share the same batch shape
in the first dimension
    :type vectorized_argnums: Union[int, Sequence[int]], defaults to 0
    :return: [description]
    :rtype: Callable[..., Tuple[Any, Any]]

Help on method vjp in module tensorcircuit.backends.tensorflow_backend:

vjp(f: Callable[..., Any], inputs: Union[Any, Sequence[Any]], v: Union[Any, Sequence[Any]]) -> Tuple[Union[Any, Sequence[Any]], Union[Any, Sequence[Any]]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Function that computes the dot product between a vector v and the Jacobian
    of the given function at the point given by the inputs. (reverse mode AD relevant)
    Strictly speaking, this function is value_and_vjp.

    :param f: the function to carry out vjp calculation
    :type f: Callable[..., Any]
    :param inputs: input for ``f``
    :type inputs: Union[Tensor, Sequence[Tensor]]
    :param v: value vector or gradient from downstream in reverse mode AD
        the same shape as return of function ``f``
    :type v: Union[Tensor, Sequence[Tensor]]
    :return: (``f(*inputs)``, vjp_tensor), where vjp_tensor is the same shape as inputs
    :rtype: Tuple[Union[Tensor, Sequence[Tensor]], Union[Tensor, Sequence[Tensor]]]

Help on method jvp in module tensorcircuit.backends.tensorflow_backend:

jvp(f: Callable[..., Any], inputs: Union[Any, Sequence[Any]], v: Union[Any, Sequence[Any]]) -> Tuple[Union[Any, Sequence[Any]], Union[Any, Sequence[Any]]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Function that computes a (forward-mode) Jacobian-vector product of ``f``.
    Strictly speaking, this function is value_and_jvp.

    :param f: The function to compute jvp
    :type f: Callable[..., Any]
    :param inputs: input for ``f``
    :type inputs: Union[Tensor, Sequence[Tensor]]
    :param v: tangents
    :type v: Union[Tensor, Sequence[Tensor]]
    :return: (``f(*inputs)``, jvp_tensor), where jvp_tensor is the same shape as the output of ``f``
    :rtype: Tuple[Union[Tensor, Sequence[Tensor]], Union[Tensor, Sequence[Tensor]]]

Help on method jacfwd in module tensorcircuit.backends.abstract_backend:

jacfwd(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Compute the Jacobian of ``f`` using the forward mode AD.

    :param f: the function whose Jacobian is required
    :type f: Callable[..., Any]
    :param argnums: the position of the arg as Jacobian input, defaults to 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: outer tuple for input args, inner tuple for outputs
    :rtype: Tensor

Help on method jacrev in module tensorcircuit.backends.abstract_backend:

jacrev(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Compute the Jacobian of ``f`` using reverse mode AD.

    :param f: The function whose Jacobian is required
    :type f: Callable[..., Any]
    :param argnums: the position of the arg as Jacobian input, defaults to 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: outer tuple for output, inner tuple for input args
    :rtype: Tensor

Help on method stop_gradient in module tensorcircuit.backends.tensorflow_backend:

stop_gradient(a: Any) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Stop backpropagation from ``a``.

    :param a: [description]
    :type a: Tensor
    :return: [description]
    :rtype: Tensor

Help on method hessian in module tensorcircuit.backends.abstract_backend:

hessian(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance

Forward AD#

Using the Jacobian-vector product (jvp), we can compute circuit gradients in forward-mode AD, which is more suitable when the number of output elements is much larger than the number of input parameters.

Suppose we are going to evaluate \(\partial_\theta \vert \psi(\theta) \rangle\), where \(\vert \psi(\theta)\rangle = U(\theta)\vert 0\rangle\) is the output state of some parameterized quantum circuit.

[4]:
def ansatz(thetas):
    # layered ansatz: each layer applies rx rotations with a shared angle
    # thetas[j] on every qubit, followed by a ladder of CNOTs
    c = tc.Circuit(n)
    for j in range(nlayers):
        for i in range(n):
            c.rx(i, theta=thetas[j])
        for i in range(n - 1):
            c.cnot(i, i + 1)
    return c


def psi(thetas):
    # output state |psi(theta)> = U(theta)|0> of the ansatz circuit
    c = ansatz(thetas)
    return c.state()
[5]:
state, partial_psi_partial_theta0 = K.jvp(
    psi,
    K.implicit_randn([nlayers]),
    tc.array_to_tensor(np.array([1.0, 0, 0]), dtype="float32"),
)

We thus obtain \(\frac{\partial \psi}{\partial \theta_0}\), since the tangent vector is one at the first position and zero elsewhere.

[6]:
state.shape, partial_psi_partial_theta0.shape
[6]:
(TensorShape([64]), TensorShape([64]))
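
Before assembling full Jacobians with the dedicated APIs below, note that repeating the jvp with a one-hot tangent for each parameter already yields the Jacobian column by column. Here is a plain-loop sketch of this idea (our own illustration; the built-in jacfwd used in the next section automates and vectorizes the same computation):

# column-by-column Jacobian from repeated jvp calls with one-hot tangents
thetas0 = K.implicit_randn([nlayers])
columns = []
for j in range(nlayers):
    tangent = tc.array_to_tensor(np.eye(nlayers)[j], dtype="float32")  # e_j
    _, dpsi_dtheta_j = K.jvp(psi, thetas0, tangent)
    columns.append(K.numpy(dpsi_dtheta_j))
jacobian_by_hand = np.stack(columns, axis=-1)  # shape [2**n, nlayers]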

Jacobian#

We can compute the Jacobian row by row (reverse-mode AD) or column by column (forward-mode AD) using vmap.

We still use the above example and calculate the Jacobian \(J_{ij}=\frac{\partial \psi_i}{\partial \theta_j}\).

[7]:
thetas = K.implicit_randn([nlayers])

jac_fun = K.jit(K.jacfwd(psi))

jac_value = jac_fun(thetas)
[8]:
%timeit jac_fun(thetas)
601 µs ± 36.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[9]:
jac_value.shape
[9]:
TensorShape([64, 3])

We can also use reverse-mode AD to obtain the Jacobian.

[10]:
jac_fun2 = K.jit(K.jacrev(psi))

jac_value2 = jac_fun2(thetas)
[11]:
%timeit jac_fun2(thetas)
843 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[12]:
jac_value2.shape
[12]:
TensorShape([64, 3])
[13]:
np.testing.assert_allclose(np.real(jac_value), jac_value2, atol=1e-5)

It is worth noting that the forward-mode AD Jacobian is faster here, since the resulting Jacobian is a tall matrix (64 rows vs. 3 columns).

Quantum Fisher Information#

The quantum Fisher information (QFI) is an important quantity in quantum computation, which can be utilized in so-called quantum natural gradient descent optimization as well as in variational quantum dynamics. See the reference paper for more details.

There are several variants of QFI-like objects, and the core quantity to evaluate is always related to \(\langle \partial \psi \vert \partial \psi\rangle - \langle \partial \psi \vert \psi\rangle\langle \psi \vert \partial \psi\rangle\). Such a quantity is easily obtained in an advanced AD framework by first computing the Jacobian of the state and then vmapping the inner product over the Jacobian rows. The detailed implementation can be found in tensorcircuit/experimental.py; in this note we directly call the corresponding API.
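
To make the formula concrete, here is a minimal sketch of our own (not the experimental implementation, which uses vmap) that assembles this matrix with dense matrix products, reusing thetas and jac_value from the Jacobian section; the qng API below may additionally take the real part and differ by convention-dependent prefactors:

# illustration only: QFI-like matrix built from the state Jacobian above
w = K.numpy(psi(thetas))   # |psi(theta)>, shape [2**n]
jac = K.numpy(jac_value)   # J[:, j] = d|psi>/d theta_j (complex, from jacfwd)
term1 = np.conj(jac).T @ jac                 # <d_i psi | d_j psi>
overlap = np.conj(jac).T @ w                 # <d_i psi | psi>
qfi_like = term1 - np.outer(overlap, np.conj(overlap))  # shape [nlayers, nlayers]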

[14]:
from tensorcircuit.experimental import qng
[15]:
qfi_fun = K.jit(qng(psi))
qfi_value = qfi_fun(thetas)
qfi_value.shape
WARNING:tensorflow:The dtype of the watched primal must be floating (e.g. tf.float32), got tf.complex64
[15]:
TensorShape([3, 3])
[16]:
%timeit qfi_fun(thetas) # the speed is comparable with a simple Jacobian computation
609 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Hessian#

The Hessian is defined as \(\partial_{ij} \langle \psi(\theta)\vert H\vert \psi(\theta)\rangle\), where \(\partial_{ij}\) is shorthand for \(\frac{\partial^2}{\partial \theta_i \partial \theta_j}\).

In the following examples, we use \(H=Z_0\) for simplicity.

[17]:
def h(thetas):
    c = ansatz(thetas)
    return K.real(c.expectation_ps(z=[0]))


hess_f = K.jit(K.hessian(h))
[18]:
hess_value = hess_f(thetas)
hess_value.shape
[18]:
TensorShape([3, 3])
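
As a quick sanity check of what the Hessian contains, one of its columns can be compared against a central finite difference of the gradient. This is only a sketch, with step size and tolerance chosen ad hoc for single precision:

# finite-difference check of the first Hessian column: d(grad h)/d theta_0
eps = 1e-3
grad_h = K.grad(h)
e0 = tc.array_to_tensor(np.eye(nlayers)[0], dtype="float32")  # perturb theta_0 only
fd_column = (K.numpy(grad_h(thetas + eps * e0)) - K.numpy(grad_h(thetas - eps * e0))) / (2 * eps)
np.testing.assert_allclose(fd_column, K.numpy(hess_value)[:, 0], atol=1e-3)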

\(\langle \psi \vert H \vert \partial \psi \rangle\)#

This quantity commonly appears as the RHS of the variational quantum dynamics equation, and in general there is no good way to compute it besides constructing a corresponding Hadamard test circuit.

However, we can easily obtain it in the AD framework as long as a stop_gradient API exists, which is the case for TensorCircuit. Namely, the quantity is obtained as \(\partial (\langle \psi \vert H\vert \bot(\psi)\rangle)\), where the outer \(\partial\) is automatically implemented by AD and \(\bot\) denotes the stop_gradient op, which stops backpropagation.

[19]:
z0 = tc.quantum.PauliStringSum2Dense([[3, 0, 0, 0, 0, 0]])  # dense matrix for Z_0


def h(thetas):
    # <psi| Z0 |stop_gradient(psi)>: only the bra branch carries gradients
    w = psi(thetas)
    wr = K.stop_gradient(w)  # ket branch, backpropagation blocked
    wl = K.conj(w)  # bra branch, differentiated by AD
    wl = K.reshape(wl, [1, -1])
    wr = K.reshape(wr, [-1, 1])
    e = wl @ z0 @ wr
    return K.real(e)[0, 0]
[20]:
psi_h_partial_psi = K.grad(h)(thetas)
psi_h_partial_psi.shape
[20]:
TensorShape([3])
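
Since the gradient of the real-valued h equals \(\mathrm{Re}\langle \partial_j \psi \vert H \vert \psi\rangle = \mathrm{Re}\langle \psi \vert H \vert \partial_j \psi\rangle\) for Hermitian \(H\), the AD result can be cross-checked against the state Jacobian computed in the Jacobian section. This is a sketch reusing thetas, jac_value, and z0 from above, with an ad hoc single-precision tolerance:

# cross-check: Re(<psi| Z0 |d_j psi>) assembled directly from the Jacobian
w = K.numpy(psi(thetas))   # |psi(theta)>
jac = K.numpy(jac_value)   # J[:, j] = d|psi>/d theta_j (complex, from jacfwd)
direct = np.real(np.conj(w) @ K.numpy(z0) @ jac)  # shape [nlayers]
np.testing.assert_allclose(direct, K.numpy(psi_h_partial_psi), atol=1e-5)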