Advanced Automatic Differentiation#
Overview#
In this section, we review some advanced AD tricks, especially their applications in circuit simulation. With these advanced AD tricks, we can evaluate certain quantum quantities more efficiently.
Advanced AD is possible in TensorCircuit because we have implemented several AD-related APIs in a backend-agnostic fashion, and their implementation closely follows the design philosophy of the JAX AD implementation.
Setup#
[1]:
import numpy as np
import tensorcircuit as tc
K = tc.set_backend("tensorflow")
[2]:
n = 6
nlayers = 3
The backend-agnostic AD-related APIs include the following:
[3]:
help(K.grad)
help(K.value_and_grad)
help(K.vectorized_value_and_grad)
help(K.vjp)
help(K.jvp)
help(K.jacfwd)
help(K.jacrev)
help(K.stop_gradient)
help(K.hessian)
Help on method grad in module tensorcircuit.backends.tensorflow_backend:
grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Any] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Return the function which is the grad function of input ``f``.
:Example:
>>> f = lambda x,y: x**2+2*y
>>> g = tc.backend.grad(f)
>>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
2
>>> g = tc.backend.grad(f, argnums=(0,1))
>>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
[2, 2]
:param f: the function to be differentiated
:type f: Callable[..., Any]
:param argnums: the position of args in ``f`` that are to be differentiated, defaults to be 0
:type argnums: Union[int, Sequence[int]], optional
:return: the grad function of ``f`` with the same set of arguments as ``f``
:rtype: Callable[..., Any]
Help on method value_and_grad in module tensorcircuit.backends.tensorflow_backend:
value_and_grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Tuple[Any, Any]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Return the function which returns the value and grad of ``f``.
:Example:
>>> f = lambda x,y: x**2+2*y
>>> g = tc.backend.value_and_grad(f)
>>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
5, 2
>>> g = tc.backend.value_and_grad(f, argnums=(0,1))
>>> g(tc.num_to_tensor(1),tc.num_to_tensor(2))
5, [2, 2]
:param f: the function to be differentiated
:type f: Callable[..., Any]
:param argnums: the position of args in ``f`` that are to be differentiated, defaults to be 0
:type argnums: Union[int, Sequence[int]], optional
:return: the value and grad function of ``f`` with the same set of arguments as ``f``
:rtype: Callable[..., Tuple[Any, Any]]
Help on method vectorized_value_and_grad in module tensorcircuit.backends.tensorflow_backend:
vectorized_value_and_grad(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0, vectorized_argnums: Union[int, Sequence[int]] = 0, has_aux: bool = False) -> Callable[..., Tuple[Any, Any]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Return the VVAG function of ``f``. The inputs for ``f`` is (args[0], args[1], args[2], ...),
and the output of ``f`` is a scalar. Suppose VVAG(f) is a function with inputs in the form
(vargs[0], args[1], args[2], ...), where vargs[0] has one extra dimension than args[0] in the first axis
and consistent with args[0] in shape for remaining dimensions, i.e. shape(vargs[0]) = [batch] + shape(args[0]).
(We only cover cases where ``vectorized_argnums`` defaults to 0 here for demonstration).
VVAG(f) returns a tuple as a value tensor with shape [batch, 1] and a gradient tuple with shape:
([batch]+shape(args[argnum]) for argnum in argnums). The gradient for argnums=k is defined as
.. math::
g^k = \frac{\partial \sum_{i\in batch} f(vargs[0][i], args[1], ...)}{\partial args[k]}
Therefore, if argnums=0, the gradient is reduced to
.. math::
g^0_i = \frac{\partial f(vargs[0][i])}{\partial vargs[0][i]}
, which is specifically suitable for batched VQE optimization, where args[0] is the circuit parameters.
And if argnums=1, the gradient is like
.. math::
g^1_i = \frac{\partial \sum_j f(vargs[0][j], args[1])}{\partial args[1][i]}
, which is suitable for quantum machine learning scenarios, where ``f`` is the loss function,
args[0] corresponds to the input data and args[1] corresponds to the weights in the QML model.
:param f: [description]
:type f: Callable[..., Any]
:param argnums: [description], defaults to 0
:type argnums: Union[int, Sequence[int]], optional
:param vectorized_argnums: the args to be vectorized, these arguments should share the same batch shape
in the first dimension
:type vectorized_argnums: Union[int, Sequence[int]], defaults to 0
:return: [description]
:rtype: Callable[..., Tuple[Any, Any]]
Help on method vjp in module tensorcircuit.backends.tensorflow_backend:
vjp(f: Callable[..., Any], inputs: Union[Any, Sequence[Any]], v: Union[Any, Sequence[Any]]) -> Tuple[Union[Any, Sequence[Any]], Union[Any, Sequence[Any]]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Function that computes the dot product between a vector v and the Jacobian
of the given function at the point given by the inputs. (reverse mode AD relevant)
Strictly speaking, this function is value_and_vjp.
:param f: the function to carry out vjp calculation
:type f: Callable[..., Any]
:param inputs: input for ``f``
:type inputs: Union[Tensor, Sequence[Tensor]]
:param v: value vector or gradient from downstream in reverse mode AD
the same shape as return of function ``f``
:type v: Union[Tensor, Sequence[Tensor]]
:return: (``f(*inputs)``, vjp_tensor), where vjp_tensor is the same shape as inputs
:rtype: Tuple[Union[Tensor, Sequence[Tensor]], Union[Tensor, Sequence[Tensor]]]
Help on method jvp in module tensorcircuit.backends.tensorflow_backend:
jvp(f: Callable[..., Any], inputs: Union[Any, Sequence[Any]], v: Union[Any, Sequence[Any]]) -> Tuple[Union[Any, Sequence[Any]], Union[Any, Sequence[Any]]] method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Function that computes a (forward-mode) Jacobian-vector product of ``f``.
Strictly speaking, this function is value_and_jvp.
:param f: The function to compute jvp
:type f: Callable[..., Any]
:param inputs: input for ``f``
:type inputs: Union[Tensor, Sequence[Tensor]]
:param v: tangents
:type v: Union[Tensor, Sequence[Tensor]]
:return: (``f(*inputs)``, jvp_tensor), where jvp_tensor is the same shape as the output of ``f``
:rtype: Tuple[Union[Tensor, Sequence[Tensor]], Union[Tensor, Sequence[Tensor]]]
Help on method jacfwd in module tensorcircuit.backends.abstract_backend:
jacfwd(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Compute the Jacobian of ``f`` using the forward mode AD.
:param f: the function whose Jacobian is required
:type f: Callable[..., Any]
:param argnums: the position of the arg as Jacobian input, defaults to 0
:type argnums: Union[int, Sequence[int]], optional
:return: outer tuple for input args, inner tuple for outputs
:rtype: Tensor
Help on method jacrev in module tensorcircuit.backends.abstract_backend:
jacrev(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Compute the Jacobian of ``f`` using reverse mode AD.
:param f: The function whose Jacobian is required
:type f: Callable[..., Any]
:param argnums: the position of the arg as Jacobian input, defaults to 0
:type argnums: Union[int, Sequence[int]], optional
:return: outer tuple for output, inner tuple for input args
:rtype: Tensor
Help on method stop_gradient in module tensorcircuit.backends.tensorflow_backend:
stop_gradient(a: Any) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
Stop backpropagation from ``a``.
:param a: [description]
:type a: Tensor
:return: [description]
:rtype: Tensor
Help on method hessian in module tensorcircuit.backends.abstract_backend:
hessian(f: Callable[..., Any], argnums: Union[int, Sequence[int]] = 0) -> Any method of tensorcircuit.backends.tensorflow_backend.TensorFlowBackend instance
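Before moving on, the shape conventions in the ``vjp`` and ``jvp`` docstrings above can be illustrated with a small plain-NumPy sketch (an illustration of the conventions, not TensorCircuit code): for a function with Jacobian J, a JVP computes J @ v and lives in the output space, while a VJP computes v @ J and lives in the input space.

```python
# Sketch of the vjp/jvp shape conventions documented above, in plain NumPy.
import numpy as np

J = np.arange(6.0).reshape(2, 3)  # Jacobian of some f: R^3 -> R^2
v_in = np.ones(3)                 # tangent vector, lives in the input space
v_out = np.ones(2)                # cotangent vector, lives in the output space

assert (J @ v_in).shape == (2,)   # jvp result: same shape as the output of f
assert (v_out @ J).shape == (3,)  # vjp result: same shape as the input of f
```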
Forward-Mode AD#
Using Jacobian-vector products (jvp), we can compute circuit gradients in forward-mode AD, which is better suited to cases where the number of output elements is much larger than the number of inputs.
Suppose we want to compute \(\partial \vert \psi(\theta) \rangle\), where \(\psi(\theta) = U(\theta)\vert 0\rangle\) is the output state of some parameterized quantum circuit.
[4]:
def ansatz(thetas):
    c = tc.Circuit(n)
    for j in range(nlayers):
        for i in range(n):
            c.rx(i, theta=thetas[j])
        for i in range(n - 1):
            c.cnot(i, i + 1)
    return c


def psi(thetas):
    c = ansatz(thetas)
    return c.state()
[5]:
state, partial_psi_partial_theta0 = K.jvp(
    psi,
    K.implicit_randn([nlayers]),
    tc.array_to_tensor(np.array([1.0, 0, 0]), dtype="float32"),
)
Thus we obtain \(\frac{\partial \psi}{\partial \theta_0}\), since the tangent vector is 1 in the first position and 0 elsewhere.
[6]:
state.shape, partial_psi_partial_theta0.shape
[6]:
(TensorShape([64]), TensorShape([64]))
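The one-hot-tangent mechanism above can be checked in plain NumPy (a toy function standing in for psi, not TensorCircuit code): a JVP computes J(x) @ v, so a one-hot tangent v = e_k picks out exactly the k-th Jacobian column.

```python
# Minimal NumPy sketch: a JVP with a one-hot tangent extracts one Jacobian column.
import numpy as np

def f(x):
    # toy vector-valued function standing in for psi(thetas)
    return np.array([np.sin(x[0]) * np.cos(x[1]), x[0] * x[1], x[1] ** 2])

def jacobian_fd(f, x, eps=1e-6):
    # finite-difference Jacobian, used here only to verify the claim
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(len(x))]
    return np.stack(cols, axis=-1)

x = np.array([0.3, 0.7])
J = jacobian_fd(f, x)
v = np.array([1.0, 0.0])          # one-hot tangent, as in the cell above
jvp = J @ v                        # what a forward-mode JVP computes
assert np.allclose(jvp, J[:, 0])   # equals the first Jacobian column
```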
Jacobian#
We can compute the Jacobian row by row or column by column, using vmap together with reverse-mode or forward-mode AD.
We still use the example above, now computing the Jacobian \(J_{ij}=\frac{\partial \psi_i}{\partial \theta_j}\).
[7]:
thetas = K.implicit_randn([nlayers])
jac_fun = K.jit(K.jacfwd(psi))
jac_value = jac_fun(thetas)
[8]:
%timeit jac_fun(thetas)
601 µs ± 36.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[9]:
jac_value.shape
[9]:
TensorShape([64, 3])
We can also obtain the Jacobian using reverse-mode AD.
[10]:
jac_fun2 = K.jit(K.jacrev(psi))
jac_value2 = jac_fun2(thetas)
[11]:
%timeit jac_fun2(thetas)
843 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[12]:
jac_value2.shape
[12]:
TensorShape([64, 3])
[13]:
np.testing.assert_allclose(np.real(jac_value), jac_value2, atol=1e-5)
It is worth noting that the forward-mode AD Jacobian is faster here, since the resulting Jacobian is a tall matrix (many more output elements than input parameters).
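The cost asymmetry can be sketched in plain NumPy (a stand-in Jacobian, not TensorCircuit internals): forward mode assembles the Jacobian one column per JVP pass (one pass per input), while reverse mode assembles it one row per VJP pass (one pass per output). For the 64 x 3 Jacobian above, that is 3 passes versus 64.

```python
# NumPy sketch of jacfwd vs jacrev pass counts for a tall Jacobian.
import numpy as np

n_out, n_in = 64, 3
J = np.random.default_rng(1).normal(size=(n_out, n_in))  # stand-in Jacobian

# jacfwd: assemble columns via JVPs with one-hot input tangents
cols = [J @ e for e in np.eye(n_in)]    # 3 forward passes
# jacrev: assemble rows via VJPs with one-hot output cotangents
rows = [e @ J for e in np.eye(n_out)]   # 64 reverse passes

assert np.allclose(np.stack(cols, axis=1), J)
assert np.allclose(np.stack(rows, axis=0), J)
assert len(cols) < len(rows)  # forward mode needs fewer passes for a tall J
```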
Quantum Fisher Information#
The quantum Fisher information (QFI) is a very important quantity in quantum computing; it can be used in so-called quantum natural gradient descent optimization as well as in variational quantum dynamics. See the reference paper for details.
There are several variants of QFI-like objects, and the core quantity to evaluate is related to \(\langle \partial \psi \vert \partial \psi\rangle - \langle \partial \psi \vert \psi\rangle\langle \psi \vert \partial \psi\rangle\). Such quantities are easy to obtain with the advanced AD framework: first compute the Jacobian of the state, then vmap inner products over the Jacobian rows. The detailed implementation can be found in the codebase tensorcircuit/experimental.py. We directly call the corresponding API in this note.
[14]:
from tensorcircuit.experimental import qng
[15]:
qfi_fun = K.jit(qng(psi))
qfi_value = qfi_fun(thetas)
qfi_value.shape
WARNING:tensorflow:The dtype of the watched primal must be floating (e.g. tf.float32), got tf.complex64
[15]:
TensorShape([3, 3])
[16]:
%timeit qfi_fun(thetas)  # the speed is comparable to a plain Jacobian calculation
609 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
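Given the state Jacobian, the QFI-like matrix from the formula above can be assembled with a few lines of plain NumPy (a sketch of the formula with a random stand-in Jacobian, not the actual qng implementation):

```python
# NumPy sketch: F_{jk} = Re( <d_j psi|d_k psi> - <d_j psi|psi><psi|d_k psi> )
# computed from a state psi and its Jacobian J_{ij} = d psi_i / d theta_j.
import numpy as np

rng = np.random.default_rng(0)
psi_vec = rng.normal(size=8) + 1j * rng.normal(size=8)
psi_vec /= np.linalg.norm(psi_vec)
J = rng.normal(size=(8, 3)) + 1j * rng.normal(size=(8, 3))  # stand-in Jacobian

overlap = np.conj(J).T @ psi_vec  # <d_j psi|psi> for each parameter j
F = np.real(np.conj(J).T @ J - np.outer(overlap, np.conj(overlap)))

assert F.shape == (3, 3)
assert np.allclose(F, F.T)  # the resulting matrix is real symmetric
```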
Hessian#
The Hessian is defined as \(\partial_{ij} \langle \psi(\theta)\vert H\vert \psi(\theta)\rangle\), where \(ij\) is shorthand for \(\theta_i\theta_j\).
In the following example, we use \(H=Z_0\) for simplicity.
[17]:
def h(thetas):
    c = ansatz(thetas)
    return K.real(c.expectation_ps(z=[0]))


hess_f = K.jit(K.hessian(h))
[18]:
hess_value = hess_f(thetas)
hess_value.shape
[18]:
TensorShape([3, 3])
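Two properties of the Hessian of a scalar energy can be checked in plain NumPy (a toy scalar function standing in for the circuit energy, differentiated by finite differences): the matrix of second derivatives is symmetric, and mixed partials match the analytic values.

```python
# NumPy finite-difference Hessian of a toy energy function.
import numpy as np

def e(t):  # toy stand-in for <psi(theta)|H|psi(theta)>
    return np.cos(t[0]) * np.cos(t[1]) + t[0] * t[2] ** 2

def hessian_fd(f, t, eps=1e-4):
    d = len(t)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (f(t + ei + ej) - f(t + ei - ej)
                       - f(t - ei + ej) + f(t - ei - ej)) / (4 * eps ** 2)
    return H

t = np.array([0.2, 0.5, 1.1])
H = hessian_fd(e, t)
assert np.allclose(H, H.T, atol=1e-5)            # Hessian is symmetric
assert np.isclose(H[0, 2], 2 * t[2], atol=1e-4)  # d^2 e / dt0 dt2 = 2 t2
```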
\(\langle \psi \vert H \vert \partial \psi \rangle\)#
This quantity commonly appears as the RHS of variational quantum dynamics equations. Apart from constructing the corresponding Hadamard-test circuit, there is no straightforward way to compute it. However, it can be easily obtained within an AD framework as long as a stop_gradient API exists, which is the case for TensorCircuit. Namely, we evaluate this quantity as \(\partial (\langle \psi \vert H\vert \bot(\psi)\rangle)\), where the outer \(\partial\) is carried out automatically by AD and \(\bot\) is the stop_gradient operation that stops backpropagation.
[19]:
z0 = tc.quantum.PauliStringSum2Dense([[3, 0, 0, 0, 0, 0]])
def h(thetas):
    w = psi(thetas)
    wr = K.stop_gradient(w)
    wl = K.conj(w)
    wl = K.reshape(wl, [1, -1])
    wr = K.reshape(wr, [-1, 1])
    e = wl @ z0 @ wr
    return K.real(e)[0, 0]
[20]:
psi_h_partial_psi = K.grad(h)(thetas)
psi_h_partial_psi.shape
[20]:
TensorShape([3])
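The stop_gradient trick can be checked against finite differences in plain NumPy (a toy two-dimensional state and Hermitian matrix, not the circuit above): differentiating \(\mathrm{Re}\,\langle \psi(\theta)\vert H\vert \psi_{\text{frozen}}\rangle\) with \(\psi_{\text{frozen}}\) held fixed at \(\psi(\theta)\) indeed yields \(\mathrm{Re}\,\langle \partial\psi\vert H\vert \psi\rangle\).

```python
# NumPy finite-difference check of the stop_gradient trick on a toy state.
import numpy as np

H = np.array([[1.0, 0.5], [0.5, -1.0]])  # small Hermitian "observable"

def psi(t):  # toy parameterized state
    return np.array([np.cos(t), 1j * np.sin(t)])

t0, eps = 0.4, 1e-6
frozen = psi(t0)  # plays the role of stop_gradient: not differentiated
g_fd = (np.real(np.conj(psi(t0 + eps)) @ H @ frozen)
        - np.real(np.conj(psi(t0 - eps)) @ H @ frozen)) / (2 * eps)

dpsi = np.array([-np.sin(t0), 1j * np.cos(t0)])  # analytic d psi / d t
g_exact = np.real(np.conj(dpsi) @ H @ psi(t0))   # Re <d psi|H|psi>
assert np.isclose(g_fd, g_exact, atol=1e-6)
```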