Advanced Automatic Differentiation#

Overview#

In this section, we review some advanced AD tricks, especially their applications in circuit simulation. With these advanced AD tricks, we can evaluate certain quantum quantities more efficiently.

Advanced AD is possible in TensorCircuit because we have implemented several AD-related APIs in a backend-agnostic fashion, and their implementation closely follows the design philosophy of JAX's AD implementation.

Setup#

[1]:
import numpy as np
import tensorcircuit as tc

K = tc.set_backend("tensorflow")
[2]:
n = 6
nlayers = 3

The backend-agnostic AD-related APIs include the following:

[3]:
help(K.grad)
help(K.value_and_grad)
help(K.vectorized_value_and_grad)
help(K.vjp)
help(K.jvp)
help(K.jacfwd)
help(K.jacrev)
help(K.stop_gradient)
help(K.hessian)
Help on method grad in module tensorcircuit.backends.tensorflow_backend:

grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Any] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Return the function which is the grad function of input ``f``.

    :Example:

    >>> f = lambda x,y: x**2+2*y
    >>> g = tc.backend.grad(f)
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    2
    >>> g = tc.backend.grad(f, argnums=(0,1))
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    [2, 2]

    :param f: the function to be differentiated
    :type f: Callable[..., Any]
    :param argnums: the position of args in ``f`` that are to be differentiated, defaults to be 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: the grad function of ``f`` with the same set of arguments as ``f``
    :rtype: Callable[..., Any]

Help on method value_and_grad in module tensorcircuit.backends.tensorflow_backend:

value_and_grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Tuple[Any, Any]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Return the function which returns the value and grad of ``f``.

    :Example:

    >>> f = lambda x,y: x**2+2*y
    >>> g = tc.backend.value_and_grad(f)
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    5, 2
    >>> g = tc.backend.value_and_grad(f, argnums=(0,1))
    >>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
    5, [2, 2]

    :param f: the function to be differentiated
    :type f: Callable[..., Any]
    :param argnums: the position of args in ``f`` that are to be differentiated, defaults to be 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: the value and grad function of ``f`` with the same set of arguments as ``f``
    :rtype: Callable[..., Tuple[Any, Any]]

Help on method vectorized_value_and_grad in module tensorcircuit.backends.tensorflow_backend:

vectorized_value_and_grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, vectorized_argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Tuple[Any, Any]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Return the VVAG function of ``f``. The inputs for ``f`` is (args[0], args[1], args[2], ...),
    and the output of ``f`` is a scalar. Suppose VVAG(f) is a function with inputs in the form
    (vargs[0], args[1], args[2], ...), where vargs[0] has one extra dimension than args[0] in the first axis
    and consistent with args[0] in shape for remaining dimensions, i.e. shape(vargs[0]) = [batch] + shape(args[0]).
    (We only cover cases where ``vectorized_argnums`` defaults to 0 here for demonstration).
    VVAG(f) returns a tuple as a value tensor with shape [batch, 1] and a gradient tuple with shape:
    ([batch]+shape(args[argnum]) for argnum in argnums). The gradient for argnums=k is defined as

    .. math::

        g^k = \frac{\partial \sum_{i\in batch} f(vargs[0][i], args[1], ...)}{\partial args[k]}

    Therefore, if argnums=0, the gradient is reduced to

    .. math::

        g^0_i = \frac{\partial f(vargs[0][i])}{\partial vargs[0][i]}

    , which is specifically suitable for batched VQE optimization, where args[0] is the circuit parameters.

    And if argnums=1, the gradient is like

    .. math::
        g^1_i = \frac{\partial \sum_j f(vargs[0][j], args[1])}{\partial args[1][i]}

    , which is suitable for quantum machine learning scenarios, where ``f`` is the loss function,
    args[0] corresponds to the input data and args[1] corresponds to the weights in the QML model.

    :param f: [description]
    :type f: Callable[..., Any]
    :param argnums: [description], defaults to 0
    :type argnums: Union[int, Sequence[int]], optional
    :param vectorized_argnums: the args to be vectorized, these arguments should share the same batch shape
        in the first dimension
    :type vectorized_argnums: Union[int, Sequence[int]], defaults to 0
    :return: [description]
    :rtype: Callable[..., Tuple[Any, Any]]

Help on method vjp in module tensorcircuit.backends.tensorflow_backend:

vjp(f: Callable[..., Any], inputs: Union[Any, Sequence[Any]], v: Union[Any, Sequence[Any]]) -> Tuple[Union[Any, Sequence[Any]], Union[Any, Sequence[Any]]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Function that computes the dot product between a vector v and the Jacobian
    of the given function at the point given by the inputs. (reverse mode AD relevant)
    Strictly speaking, this function is value_and_vjp.

    :param f: the function to carry out vjp calculation
    :type f: Callable[..., Any]
    :param inputs: input for ``f``
    :type inputs: Union[Tensor, Sequence[Tensor]]
    :param v: value vector or gradient from downstream in reverse mode AD
        the same shape as return of function ``f``
    :type v: Union[Tensor, Sequence[Tensor]]
    :return: (``f(*inputs)``, vjp_tensor), where vjp_tensor is the same shape as inputs
    :rtype: Tuple[Union[Tensor, Sequence[Tensor]], Union[Tensor, Sequence[Tensor]]]

Help on method jvp in module tensorcircuit.backends.tensorflow_backend:

jvp(f: Callable[..., Any], inputs: Union[Any, Sequence[Any]], v: Union[Any, Sequence[Any]]) -> Tuple[Union[Any, Sequence[Any]], Union[Any, Sequence[Any]]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Function that computes a (forward-mode) Jacobian-vector product of ``f``.
    Strictly speaking, this function is value_and_jvp.

    :param f: The function to compute jvp
    :type f: Callable[..., Any]
    :param inputs: input for ``f``
    :type inputs: Union[Tensor, Sequence[Tensor]]
    :param v: tangents
    :type v: Union[Tensor, Sequence[Tensor]]
    :return: (``f(*inputs)``, jvp_tensor), where jvp_tensor is the same shape as the output of ``f``
    :rtype: Tuple[Union[Tensor, Sequence[Tensor]], Union[Tensor, Sequence[Tensor]]]

Help on method jacfwd in module tensorcircuit.backends.abstract_backend:

jacfwd(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Compute the Jacobian of ``f`` using the forward mode AD.

    :param f: the function whose Jacobian is required
    :type f: Callable[..., Any]
    :param argnums: the position of the arg as Jacobian input, defaults to 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: outer tuple for input args, inner tuple for outputs
    :rtype: Tensor

Help on method jacrev in module tensorcircuit.backends.abstract_backend:

jacrev(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Compute the Jacobian of ``f`` using reverse mode AD.

    :param f: The function whose Jacobian is required
    :type f: Callable[..., Any]
    :param argnums: the position of the arg as Jacobian input, defaults to 0
    :type argnums: Union[int, Sequence[int]], optional
    :return: outer tuple for output, inner tuple for input args
    :rtype: Tensor

Help on method stop_gradient in module tensorcircuit.backends.tensorflow_backend:

stop_gradient(a: Any) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
    Stop backpropagation from ``a``.

    :param a: [description]
    :type a: Tensor
    :return: [description]
    :rtype: Tensor

Help on method hessian in module tensorcircuit.backends.abstract_backend:

hessian(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
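
As a quick illustration of the APIs listed above, here is a minimal sketch (an addition for illustration, not a cell from the original notebook); the function name expz1 and the rotation angle 0.3 are arbitrary choices.

def expz1(theta):
    # a 2-qubit toy circuit: RX rotation followed by a CNOT
    c = tc.Circuit(2)
    c.rx(0, theta=theta)
    c.cnot(0, 1)
    return K.real(c.expectation_ps(z=[1]))  # <Z_1> = cos(theta)


expz1_vag = K.jit(K.value_and_grad(expz1))
v, g = expz1_vag(tc.array_to_tensor(np.array(0.3), dtype="float32"))
# v is approximately cos(0.3) and g is approximately -sin(0.3)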

Forward AD#

Using the Jacobian-vector product (jvp), we can compute circuit gradients in forward-mode AD, which is more suitable when the number of output elements is much larger than the number of inputs.

Suppose we want to compute \(\partial \vert \psi(\theta) \rangle\), where \(\psi(\theta) = U(\theta)\vert 0\rangle\) is the output state of some parameterized quantum circuit.

[4]:
def ansatz(thetas):
    c = tc.Circuit(n)
    for j in range(nlayers):
        for i in range(n):
            c.rx(i, theta=thetas[j])
        for i in range(n - 1):
            c.cnot(i, i + 1)
    return c


def psi(thetas):
    c = ansatz(thetas)
    return c.state()
[5]:
state, partial_psi_partial_theta0 = K.jvp(
    psi,
    K.implicit_randn([nlayers]),
    tc.array_to_tensor(np.array([1.0, 0, 0]), dtype="float32"),
)

Therefore, we obtain \(\frac{\partial \psi}{\partial \theta_0}\), since the tangent vector takes the value 1 at the first position and 0 elsewhere.

[6]:
state.shape, partial_psi_partial_theta0.shape
[6]:
(TensorShape([64]), TensorShape([64]))
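
Before moving on, note that each jvp call with a one-hot tangent vector yields exactly one column of the Jacobian computed in the next section. A minimal sketch (an illustrative addition, not a cell from the original notebook; thetas0 is a fresh random parameter vector):

thetas0 = K.implicit_randn([nlayers])
for j in range(nlayers):
    # one-hot tangent: 1 at position j, 0 elsewhere
    tangent = tc.array_to_tensor(np.eye(nlayers)[j], dtype="float32")
    _, dpsi_dthetaj = K.jvp(psi, thetas0, tangent)
    # dpsi_dthetaj is the j-th column of the (2**n, nlayers) Jacobian of psi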

Jacobian#

We can compute the Jacobian matrix row by row or column by column, using vmap together with reverse-mode or forward-mode AD.

We still use the example above and compute the Jacobian \(J_{ij}=\frac{\partial \psi_i}{\partial \theta_j}\).

[7]:
thetas = K.implicit_randn([nlayers])

jac_fun = K.jit(K.jacfwd(psi))

jac_value = jac_fun(thetas)
[8]:
%timeit jac_fun(thetas)
601 µs ± 36.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[9]:
jac_value.shape
[9]:
TensorShape([64, 3])

We can also use reverse-mode AD to obtain the Jacobian.

[10]:
jac_fun2 = K.jit(K.jacrev(psi))

jac_value2 = jac_fun2(thetas)
[11]:
%timeit jac_fun2(thetas)
843 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[12]:
jac_value2.shape
[12]:
TensorShape([64, 3])
[13]:
np.testing.assert_allclose(np.real(jac_value), jac_value2, atol=1e-5)

It is worth noting that the forward-mode AD Jacobian is faster here, since the resulting Jacobian is a tall matrix (many more rows than columns).
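
As an extra sanity check (an illustrative addition, not part of the original notebook; the step size 1e-3 is an arbitrary choice), the first Jacobian column can be compared against a central finite difference; only the real parts are compared here for simplicity.

eps = 1e-3  # finite-difference step size (arbitrary choice)
delta = tc.array_to_tensor(eps * np.eye(nlayers)[0], dtype="float32")
fd_column0 = (psi(thetas + delta) - psi(thetas - delta)) / (2 * eps)
np.testing.assert_allclose(
    np.real(K.numpy(jac_value)[:, 0]), np.real(K.numpy(fd_column0)), atol=1e-3
)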

Quantum Fisher Information#

The quantum Fisher information (QFI) is a very important quantity in quantum computation, which can be utilized in so-called quantum natural gradient descent optimization as well as in variational quantum dynamics. See the reference paper for more details.

There are several variants of QFI-like objects, and the core quantity to evaluate is always related to \(\langle \partial \psi \vert \partial \psi\rangle - \langle \partial \psi \vert \psi\rangle\langle \psi \vert \partial \psi\rangle\). Such quantities are easy to obtain within the advanced AD framework: first compute the Jacobian of the state, and then vmap the inner products over the Jacobian rows. The detailed implementation can be found in tensorcircuit/experimental.py in the codebase; we directly call the corresponding API in this note.
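For concreteness, the same kind of matrix can be assembled by hand from the state Jacobian obtained above. This is a minimal sketch for illustration, assuming jac_value holds the full complex derivative of the state; the qng API called below may differ from this raw matrix by a convention such as taking the real part or an overall factor.

state0 = K.numpy(psi(thetas))  # the (2**n,) output state
jac0 = K.numpy(jac_value)      # the (2**n, nlayers) state Jacobian from jacfwd above
overlap = np.conj(jac0).T @ state0  # <partial_i psi | psi> for each parameter i
# <partial_i psi | partial_j psi> - <partial_i psi | psi><psi | partial_j psi>
qfi_raw = np.conj(jac0).T @ jac0 - np.outer(overlap, np.conj(overlap))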

[14]:
from tensorcircuit.experimental import qng
[15]:
qfi_fun = K.jit(qng(psi))
qfi_value = qfi_fun(thetas)
qfi_value.shape
WARNING:tensorflow:The dtype of the watched primal must be floating (e.g. tf.float32), got tf.complex64
[15]:
TensorShape([3, 3])
[16]:
%timeit qfi_fun(thetas)  # the speed is comparable to a plain Jacobian calculation
609 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
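
As a usage sketch (hypothetical, not part of the original notebook; the cost function, learning rate and regularizer below are arbitrary choices), the QFI matrix enters one step of quantum natural gradient descent as \(\theta \leftarrow \theta - \eta (F + \epsilon I)^{-1}\nabla E\).

def cost(thetas):
    # energy-like cost <Z_0> on the same ansatz
    c = ansatz(thetas)
    return K.real(c.expectation_ps(z=[0]))


cost_vag = K.jit(K.value_and_grad(cost))

lr, reg = 0.1, 1e-4  # learning rate and diagonal regularizer (arbitrary choices)
e, grads = cost_vag(thetas)
fisher = np.real(K.numpy(qfi_fun(thetas)))
natural_grad = np.linalg.solve(fisher + reg * np.eye(nlayers), K.numpy(grads))
thetas_updated = thetas - lr * tc.array_to_tensor(natural_grad, dtype="float32")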

Hessian#

The Hessian is defined as \(\partial_{ij} \langle \psi(\theta)\vert H\vert \psi(\theta)\rangle\), where \(ij\) is shorthand for \(\theta_i\theta_j\), i.e. second derivatives with respect to \(\theta_i\) and \(\theta_j\).

In the following example, we use \(H=Z_0\) for simplicity.

[17]:
def h(thetas):
    c = ansatz(thetas)
    return K.real(c.expectation_ps(z=[0]))


hess_f = K.jit(K.hessian(h))
[18]:
hess_value = hess_f(thetas)
hess_value.shape
[18]:
TensorShape([3, 3])
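
Two quick consistency checks (illustrative additions, not cells from the original notebook): the Hessian of a smooth real-valued function should be symmetric, and it can alternatively be assembled by composing the forward-mode Jacobian with the reverse-mode gradient (forward-over-reverse), assuming the backend supports this composition.

hess_np = K.numpy(hess_value)
# symmetry check
np.testing.assert_allclose(hess_np, hess_np.T, atol=1e-5)

# forward-over-reverse composition as an alternative construction
hess_f2 = K.jit(K.jacfwd(K.grad(h)))
np.testing.assert_allclose(K.numpy(hess_f2(thetas)), hess_np, atol=1e-5)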

\(\langle \psi \vert H \vert \partial \psi \rangle\)#

This quantity commonly shows up as the right-hand side of the variational quantum dynamics equation. Apart from constructing the corresponding Hadamard test circuit, there is no good way to compute this quantity.

However, as long as a stop_gradient API is available, which is the case in TensorCircuit, we can easily obtain this quantity within the AD framework. Namely, we evaluate \(\partial (\langle \psi \vert H\vert \bot(\psi)\rangle)\), where the outer \(\partial\) is carried out automatically by AD and \(\bot\) is the stop_gradient operation that blocks backpropagation.

[19]:
z0 = tc.quantum.PauliStringSum2Dense([[3, 0, 0, 0, 0, 0]])


def h(thetas):
    w = psi(thetas)
    wr = K.stop_gradient(w)
    wl = K.conj(w)
    wl = K.reshape(wl, [1, -1])
    wr = K.reshape(wr, [-1, 1])
    e = wl @ z0 @ wr
    return K.real(e)[0, 0]
[20]:
psi_h_partial_psi = K.grad(h)(thetas)
psi_h_partial_psi.shape
[20]:
TensorShape([3])
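
As a cross-check (an illustrative addition, not part of the original notebook), the same numbers can be assembled directly from the state Jacobian computed earlier, since the AD result equals the real part of \(\langle \psi \vert Z_0\vert \partial_j \psi \rangle\).

# <psi|Z_0|partial_j psi> assembled from the dense Z_0 and the state Jacobian
direct = np.conj(K.numpy(psi(thetas))) @ K.numpy(z0) @ K.numpy(jac_value)
np.testing.assert_allclose(np.real(direct), K.numpy(psi_h_partial_psi), atol=1e-5)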