Customized Contraction#

Overview#

If the simulated circuit has a large qubit count, we recommend trying a customized contraction setup instead of the default greedy one.

Setup#

Please refer to the installation documentation for cotengra, which cannot simply be obtained via pip install since it is not uploaded to PyPI. The easiest way to install it is pip install -U git+https://github.com/jcmgray/cotengra.git.

[3]:
import tensorcircuit as tc
import numpy as np
import cotengra as ctg

K = tc.set_backend("tensorflow")
# K is the backend object; the TensorFlow backend is assumed here based on the tf.Tensor outputs below

We use the following example as a testbed for the contraction. The actual contraction is invoked by the Circuit.expectation API, and a contraction has two stages. The first stage is contraction path searching, which looks for a better contraction path in terms of space and time. The second stage is the actual contraction, where matrix multiplications are carried out via the ML backend API. In this note, we focus on the performance of the first stage. The contraction path solver can be customized with any opt-einsum compatible path solver (a minimal sketch of this interface is given right after the testbed below).

[4]:
def testbed():
    n = 40  # number of qubits
    d = 6  # number of circuit layers
    param = K.ones([2 * d, n])
    c = tc.Circuit(n)
    c = tc.templates.blocks.example_block(c, param, nlayers=d, is_split=True)
    # the two-qubit gate is split and truncated with SVD decomposition
    return c.expectation_ps(z=[n // 2], reuse=False)
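
As mentioned above, the path solver only needs to follow the opt-einsum calling convention. The following sketch (a hypothetical toy solver, for illustration only) shows the interface: a callable that receives the input index tuples, the output indices, and the size dictionary, and returns a list of pairwise contraction steps.

# a toy opt-einsum compatible path solver: it always contracts the first two
# remaining tensors, which is a valid but generally very poor path
def naive_path_solver(inputs, output, size_dict, **kws):
    # inputs: list of index tuples, one per input tensor
    # output: index tuple of the final tensor
    # size_dict: map from each index label to its dimension
    return [(0, 1) for _ in range(len(inputs) - 1)]


# such a callable could be plugged in via
# tc.set_contractor("custom", optimizer=naive_path_solver)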

There are several contractor optimizers provided by opt-einsum and shipped with the TensorNetwork package. Since TensorCircuit is built on top of TensorNetwork, we can directly use these simple contractor optimizers. However, for any moderately sized system, only the greedy optimizer works; the other optimizers scale exponentially and fail in circuit simulation scenarios.

We always set contraction_info=True (default is False) for the contractor system in this note, which prints a summary of the contraction information, including contraction size, flops, and write. For the definitions of these metrics, also refer to the cotengra docs and the corresponding paper.

Metrics that measure the quality of a contraction path include

  • FLOPs: the total number of computational operations required for all matrix multiplications involved when contracting the tensor network via the given path. This metric characterizes the total simulation time.

  • WRITE: the total size (the number of elements) of all tensors – including intermediate tensors – computed during the contraction.

  • SIZE: the size of the largest intermediate tensor stored in memory.

Since simulations in TensorCircuit are AD-enabled, all intermediate results need to be cached and traced, so the more relevant spatial cost metric is WRITE rather than SIZE.
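
As a rough illustration of these quantities, one can inspect the path information of a small einsum with opt-einsum directly (a sketch outside of the TensorCircuit contractor system; the shapes are made up for demonstration, and opt_einsum is available as a dependency of TensorNetwork).

import numpy as np
import opt_einsum as oe

# a hypothetical chain of three small matrix multiplications
A = np.ones([8, 32])
B = np.ones([32, 16])
C = np.ones([16, 4])

path, info = oe.contract_path("ab,bc,cd->ad", A, B, C, optimize="greedy")
print(info.opt_cost)  # total operation count along the path (the FLOPs-type metric)
print(info.largest_intermediate)  # elements of the largest intermediate tensor (the SIZE-type metric)
# WRITE corresponds to summing the sizes of all intermediate tensors produced along the path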

Also, we will enable debug_level=2 in set_contractor (never use this option in real computation!). With this option, the second stage of the contraction, i.e. the actual contraction, does not happen, so we can focus on the contraction path information, which demonstrates the differences between the customized contractors.

[5]:
tc.set_contractor("greedy", debug_level=2, contraction_info=True)
# the default contractor
testbed()
------ contraction cost summary ------
log10[FLOPs]:  12.393  log2[SIZE]:  30  log2[WRITE]:  35.125
[5]:
<tf.Tensor: shape=(), dtype=complex64, numpy=0j>

The cotengra optimizer: for hyperparameter tuning, see the cotengra documentation.

[7]:
opt = ctg.ReusableHyperOptimizer(
    methods=["greedy", "kahypar"],
    parallel=True,
    minimize="write",
    max_time=120,
    max_repeats=1024,
    progbar=True,
)
# Caution: for now, the parallel option only works when set to "ray" on newer Python versions
tc.set_contractor(
    "custom", optimizer=opt, preprocessing=True, contraction_info=True, debug_level=2
)
# the opt-einsum compatible function interface is passed as the optimizer argument
# also note how preprocessing=True merges the single-qubit gates into the neighboring two-qubit gates
testbed()
log2[SIZE]: 15.00 log10[FLOPs]: 7.56:  45%|██████████████████▊                       | 458/1024 [02:03<02:32,  3.70it/s]
------ contraction cost summary ------
log10[FLOPs]:  7.565  log2[SIZE]:  15  log2[WRITE]:  19.192
[7]:
<tf.Tensor: shape=(), dtype=complex64, numpy=0j>

We can even include contraction tree reconfiguration after the path search, which further boosts the space efficiency of the contraction path.

[8]:
opt = ctg.ReusableHyperOptimizer(
    minimize="combo",
    max_repeats=1024,
    max_time=120,
    progbar=True,
)


def opt_reconf(inputs, output, size, **kws):
    tree = opt.search(inputs, output, size)
    tree_r = tree.subtree_reconfigure_forest(
        progbar=True, num_trees=10, num_restarts=20, subtree_weight_what=("size",)
    )
    return tree_r.get_path()


# there is also a default parallel=True option for subtree_reconfigure_forest,
# which can only be set to "ray" on newer Python versions, as above
# note that different versions of cotengra have breaking API changes in the last line of opt_reconf
# (get_path vs path); the user may need to adjust it to make the example work

tc.set_contractor(
    "custom",
    optimizer=opt_reconf,
    contraction_info=True,
    preprocessing=True,
    debug_level=2,
)
testbed()
log2[SIZE]: 15.00 log10[FLOPs]: 7.46:  32%|█████████████▍                            | 329/1024 [02:00<04:13,  2.74it/s]
log2[SIZE]: 14.00 log10[FLOPs]: 7.02: 100%|█████████████████████████████████████████████| 20/20 [01:05<00:00,  3.30s/it]
------ contraction cost summary ------
log10[FLOPs]:  7.021  log2[SIZE]:  14  log2[WRITE]:  19.953
[8]:
<tf.Tensor: shape=(), dtype=complex64, numpy=0j>
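
Since debug_level=2 skips the actual contraction, remember to restore a normal contractor setting before doing any real computation, for example by switching back to the default greedy contractor.

tc.set_contractor("greedy")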