Gradient and Variational Optimization#

Overview#

TensorCircuit is designed to make optimization of parameterized quantum circuits easy, fast, and convenient. In this note, we review how to obtain circuit gradients and run variational optimization.

Setup#

[1]:
import numpy as np
import scipy.optimize as optimize
import tensorflow as tf
import tensorcircuit as tc

K = tc.set_backend("tensorflow")

PQC#

Consider a variational circuit acting on \(n\) qubits and consisting of \(k\) layers, where each layer comprises parameterized \(e^{i\theta X\otimes X}\) gates between neighboring qubits, followed by a sequence of parameterized single-qubit \(Z\) and \(X\) rotations. We now show how to implement such circuits in TensorCircuit and how to use one of the machine learning backends to compute cost functions and gradients easily and efficiently.

The circuit for general \(n,k\) and set of parameters can be defined as follows:

[2]:
def qcircuit(n, k, params):
    # Each layer consumes 3n - 1 parameters:
    # (n - 1) XX coupling angles, then n RZ angles, then n RX angles.
    c = tc.Circuit(n)
    for j in range(k):
        for i in range(n - 1):
            c.exp1(
                i, i + 1, theta=params[j * (3 * n - 1) + i], unitary=tc.gates._xx_matrix
            )
        for i in range(n):
            c.rz(i, theta=params[j * (3 * n - 1) + n - 1 + i])
            c.rx(i, theta=params[j * (3 * n - 1) + 2 * n - 1 + i])
    return c
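As a quick, illustrative sanity check (the *_demo names below are ours, not part of the API), a \(k\)-layer circuit on \(n\) qubits should consume \(k(3n-1)\) parameters and output a full \(2^n\)-dimensional statevector:

[ ]:
n_demo, k_demo = 3, 2
demo_params = K.implicit_randn(k_demo * (3 * n_demo - 1))  # random angles, for testing only
c_demo = qcircuit(n_demo, k_demo, demo_params)
print(c_demo.state().shape)  # expected: (8,), i.e. 2**3 amplitudes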

As an example, we take \(n=3, k=2\) (with the TensorFlow backend already set above), and define an energy cost function to minimize

\[E = \langle X_0 X_1\rangle_\theta + \langle X_1 X_2\rangle_\theta.\]
[3]:
n = 3
k = 2


def energy(params):
    c = qcircuit(n, k, params)
    # E = <X0 X1> + <X1 X2> via Pauli-string expectations
    e = c.expectation_ps(x=[0, 1]) + c.expectation_ps(x=[1, 2])
    return K.real(e)
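The energy can already be evaluated directly on a random set of parameters (a quick illustrative check):

[ ]:
params = K.implicit_randn(k * (3 * n - 1))
print(energy(params))  # a single, unoptimized energy evaluation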

Grad and JIT#

Using the ML backend's support for automatic differentiation, we can now quickly compute both the energy and the gradient of the energy with respect to the parameters.

[4]:
energy_val_grad = K.value_and_grad(energy)

This creates a function that, given a set of parameters as input, returns both the energy and the gradient of the energy with respect to those parameters. If only the gradient is desired, it can instead be computed by K.grad(energy). For example, evaluating on the random parameters defined above (a quick illustrative check):
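[ ]:
val, grad = energy_val_grad(params)
print(val)         # the scalar energy
print(grad.shape)  # one gradient component per parameter, (16,) here

While we could keep calling this function directly, if multiple evaluations of the energy will be performed, significant time savings can be had by using a just-in-time compiled version of the function.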

[5]:
energy_val_grad_jit = K.jit(energy_val_grad)

With K.jit, the initial evaluation of the energy and gradient may take longer, but subsequent evaluations will be noticeably faster than non-jitted code. We recommend always using jit as long as the function is “tensor-in, tensor-out”, and we have worked hard to make all aspects of the circuit simulator compatible with JIT.
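The speedup can be checked with a rough wall-clock comparison (a minimal sketch; absolute timings depend on your hardware):

[ ]:
import time

t0 = time.time()
energy_val_grad_jit(params)  # first call: includes compilation overhead
print("first call:", time.time() - t0)

t0 = time.time()
energy_val_grad_jit(params)  # subsequent calls: compiled, much faster
print("second call:", time.time() - t0)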

Optimization via ML Backend#

With the energy function and gradients available, optimization of the parameters is straightforward. Below is an example of how to do this via stochastic gradient descent.

[6]:
learning_rate = 2e-2
# Wrap the Keras optimizer in TensorCircuit's backend-agnostic interface
opt = K.optimizer(tf.keras.optimizers.SGD(learning_rate))


def grad_descent(params, i):
    val, grad = energy_val_grad_jit(params)
    params = opt.update(grad, params)
    if i % 10 == 0:
        print(f"i={i}, energy={val}")
    return params


params = K.implicit_randn(k * (3 * n - 1))  # fresh random initial angles
for i in range(100):
    params = grad_descent(params, i)
i=0, energy=0.11897378414869308
i=10, energy=-0.3692811131477356
i=20, energy=-0.7194114923477173
i=30, energy=-0.904697597026825
i=40, energy=-1.013866662979126
i=50, energy=-1.1042678356170654
i=60, energy=-1.1998062133789062
i=70, energy=-1.308410406112671
i=80, energy=-1.4276418685913086
i=90, energy=-1.5474387407302856
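Any optimizer from tf.keras.optimizers can be plugged into the same pattern. As an illustrative sketch (the Adam learning rate of 1e-2 is an arbitrary choice, not from the original example):

[ ]:
# Same loop as above, with Adam in place of SGD
opt_adam = K.optimizer(tf.keras.optimizers.Adam(1e-2))

params = K.implicit_randn(k * (3 * n - 1))
for i in range(100):
    val, grad = energy_val_grad_jit(params)
    params = opt_adam.update(grad, params)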

Optimization via Scipy Interface#

An alternative to using the machine learning backends for the optimization is to use SciPy. This can be done via the scipy_interface API call, which allows the use of gradient-based (e.g. L-BFGS-B) and gradient-free (e.g. COBYLA) optimizers that are not available via the ML backends.

[7]:
f_scipy = tc.interfaces.scipy_interface(energy, shape=[k * (3 * n - 1)], jit=True)
params = K.implicit_randn(k * (3 * n - 1))
r = optimize.minimize(f_scipy, params, method="L-BFGS-B", jac=True)
r
/Users/shixin/Cloud/newwork/quantum-information/codebases/tensorcircuit/tensorcircuit/interfaces.py:237: ComplexWarning: Casting complex values to real discards the imaginary part
  scipy_gs = scipy_gs.astype(np.float64)
[7]:
      fun: -2.000000476837158
 hess_inv: <16x16 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 2.43186951e-04, -1.50322914e-04,  8.94665718e-05,  1.18807920e-05,
        2.95639038e-05,  1.19209290e-07, -5.96046448e-08, -2.98023224e-08,
        0.00000000e+00, -1.19209290e-07,  3.90738450e-07,  9.34305717e-07,
       -8.22039729e-05,  1.19209290e-07,  0.00000000e+00,  0.00000000e+00])
  message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 60
      nit: 19
     njev: 60
   status: 0
  success: True
        x: array([ 2.35625520e+00,  7.85409154e-01,  1.57088576e+00,  2.10625989e-05,
       -1.57088425e+00, -1.70256902e+00, -5.33743572e-01,  3.11436816e-01,
        1.26543793e+00,  1.91663337e+00, -1.15901008e-07, -1.76623396e-05,
       -1.59972887e-04, -8.97072367e-01,  1.79929630e+00, -9.67278961e-01])

In the first line above, the shape argument specifies the shape of the parameter array supplied to the function being minimized, here the energy function, and the jit=True argument automatically takes care of jitting it. Gradient-free optimization can similarly be performed efficiently by supplying the gradient=False argument to scipy_interface.

[8]:
f_scipy = tc.interfaces.scipy_interface(
    energy, shape=[k * (3 * n - 1)], jit=True, gradient=False
)
params = K.implicit_randn(k * (3 * n - 1))
r = optimize.minimize(f_scipy, params, method="COBYLA")
r
[8]:
     fun: -1.9999911785125732
   maxcv: 0.0
 message: 'Optimization terminated successfully.'
    nfev: 386
  status: 1
 success: True
       x: array([ 7.87597857e-01, -5.14158452e-01, -1.56560250e+00, -3.15230777e-04,
        9.91532990e-01,  5.95588091e-01,  1.38523058e+00, -3.59642968e-04,
       -3.23365306e-01, -4.16465772e-01, -7.32259085e-03,  6.53997758e-05,
        7.71203778e-01,  2.46256921e+00,  8.78602039e-01, -3.51989842e-01])
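Both routines converge to \(E \approx -2\), the minimum eigenvalue of \(X_0X_1 + X_1X_2\) (attained, e.g., by the product state \(|{+}{-}{+}\rangle\)). As a final illustrative check (the float32 cast matches the backend's default real dtype), the optimized angles in r.x can be fed back into the energy function:

[ ]:
params_opt = K.convert_to_tensor(r.x.astype(np.float32))
print(energy(params_opt))  # should be close to -2.0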