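Support Vector Classification with SKLearn

Authored by Mark (Zixuan) Song

This example uses the SVC class from the sklearn library to implement support vector classification.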

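Overview

The aim of this example is to embed a quantum machine learning (QML) transformer into an SVC pipeline, and to introduce one way of connecting tensorcircuit with scikit-learn.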

Setup

Install scikit-learn and requests. The data used to test this model is the German Credit Data from UCI.

pip install scikit-learn requests
[1]:
import tensorcircuit as tc
import tensorflow as tf
from sklearn.svm import SVC
from sklearn import metrics
from time import time
import requests

K = tc.set_backend("tensorflow")

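Data Processing

The dataset contains 20 variables, each stored as an integer value. For the model to use the data, each variable must be normalized to the range 0 to 1, which the code below does by dividing each variable by its maximum.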

[2]:
def load_GCN_data():
    link2gcn = "http://home.cse.ust.hk/~qyang/221/Assignments/German/GermanData.csv"
    data = requests.get(link2gcn)
    data = data.text
    data = data.split("\n")[:-1]
    x = None
    y = None

    def destring(string):
        string = string.split(",")
        return_array = []
        for i, v in enumerate(string):
            if v[0] == "A":
                return_array.append(int(v[1 + len(str(i)) :]))
            else:
                return_array.append(int(v))
        return K.cast([return_array[:-1]], dtype="float32"), K.cast(
            [return_array[-1] - 1], dtype="int32"
        )

    for i in data:
        if x is None:
            temp_x, temp_y = destring(i)
            x = K.cast(temp_x, dtype="float32")
            y = K.cast(temp_y, dtype="int32")
        else:
            temp_x, temp_y = destring(i)
            x = K.concat([x, temp_x], axis=0)
            y = K.concat([y, temp_y], axis=0)
    x = K.transpose(x)
    nx = None
    for i in x:
        max_i = K.cast(K.max(i), dtype="float32")
        temp_nx = [K.divide(i, max_i)]
        nx = K.concat([nx, temp_nx], axis=0) if nx is not None else temp_nx
    x = K.transpose(nx)
    return (x[:800], y[:800]), (x[800:], y[800:])


(x_train, y_train), (x_test, y_test) = load_GCN_data()
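
The normalization step in `load_GCN_data` divides each variable by its maximum. As a minimal sketch of the same idea, assuming non-negative values and using NumPy rather than the tensorcircuit backend, the column-wise scaling can be written in vectorized form. The helper name `max_normalize` and the demo array are hypothetical and not part of the original example.

```python
import numpy as np


def max_normalize(x):
    """Scale each column of x into [0, 1] by dividing by its column maximum."""
    x = np.asarray(x, dtype=np.float32)
    col_max = x.max(axis=0, keepdims=True)
    # Added safety check (not in the original code): leave all-zero columns unchanged.
    col_max = np.where(col_max == 0, 1.0, col_max)
    return x / col_max


# Hypothetical 3-sample, 3-variable array used only for illustration.
demo = np.array([[1, 10, 0], [2, 40, 0], [4, 20, 0]])
normalized = max_normalize(demo)
print(normalized)
```

Each column of the result has a maximum of 1, which matches the 0 to 1 range the quantum model expects for its rotation angles.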

Quantum Model

The quantum model takes a 1x20 matrix as input and outputs the state of 5 qubits. The model is shown below:

[3]:
def quantumTran(inputs):
    c = tc.Circuit(5)
    for i in range(4):
        if i % 2 == 0:
            for j in range(5):
                c.rx(j, theta=(0 if i * 5 + j >= 20 else inputs[i * 5 + j]))
        else:
            for j in range(5):
                c.rz(j, theta=(0 if i * 5 + j >= 20 else inputs[i * 5 + j]))
            for j in range(4):
                c.cnot(j, j + 1)
    return c.state()


func_qt = tc.interfaces.tensorflow_interface(quantumTran, ydtype=tf.complex64, jit=True)
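
To make the encoding in `quantumTran` easier to follow, the sketch below lists which input feature drives which rotation gate. It only mirrors the indexing of the loops above (even layers use `rx`, odd layers use `rz`, and each `rz` layer is followed by a chain of CNOT gates that is not listed here). It does not simulate the circuit, and the helper name `gate_schedule` is hypothetical.

```python
def gate_schedule(n_features=20, n_qubits=5):
    """Return (feature_index, gate_name, qubit) triples in the order they are applied."""
    schedule = []
    n_layers = n_features // n_qubits  # 4 layers of 5 rotations each
    for layer in range(n_layers):
        # Even layers rotate about X, odd layers rotate about Z.
        gate = "rx" if layer % 2 == 0 else "rz"
        for qubit in range(n_qubits):
            schedule.append((layer * n_qubits + qubit, gate, qubit))
    return schedule


schedule = gate_schedule()
for feature, gate, qubit in schedule[:6]:
    print(f"feature {feature:2d} -> {gate} on qubit {qubit}")
```

With 20 features and 5 qubits, every feature is used exactly once, so the `>= 20` guard in `quantumTran` never triggers for this dataset.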

Wrapping the Quantum Model into an SVC

Wrap the quantum model into an SVC model that SKLearn can use.

[4]:
def quantum_kernel(quantumTran, data_x, data_y):
    def kernel(x, y):
        x = K.convert_to_tensor(x)
        y = K.convert_to_tensor(y)
        x_qt = None
        for i, x1 in enumerate(x):
            if i == 0:
                x_qt = K.convert_to_tensor([quantumTran(x1)])
            else:
                x_qt = K.concat([x_qt, [quantumTran(x1)]], 0)
        y_qt = None
        for i, x1 in enumerate(y):
            if i == 0:
                y_qt = K.convert_to_tensor([quantumTran(x1)])
            else:
                y_qt = K.concat([y_qt, [quantumTran(x1)]], 0)
        data_ret = K.cast(K.power(K.abs(x_qt @ K.transpose(y_qt)), 2), "float32")
        return data_ret

    clf = SVC(kernel=kernel)
    clf.fit(data_x, data_y)
    return clf
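
The `kernel` function above maps every sample to a quantum state and fills the kernel matrix with squared overlap magnitudes. As a minimal sketch, assuming the states are already available as rows of a NumPy array, the textbook fidelity kernel |⟨ψ(x)|ψ(y)⟩|² can be computed as follows. Note that this sketch conjugates the first argument, which is the conventional definition of the inner product; the `kernel` function above uses a plain `K.transpose`, so the two may give different values for states with complex amplitudes. The name `fidelity_kernel` and the random test states are hypothetical.

```python
import numpy as np


def fidelity_kernel(states_x, states_y):
    """Gram matrix with entries |<psi_x|psi_y>|^2 for rows of normalized state vectors."""
    # <psi_x|psi_y> = sum_i conj(x_i) * y_i, computed for every pair of rows.
    overlaps = states_x.conj() @ states_y.T
    return np.abs(overlaps) ** 2


# Hypothetical random 5-qubit states (32 amplitudes each), used only for illustration.
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 32)) + 1j * rng.normal(size=(3, 32))
states = raw / np.linalg.norm(raw, axis=1, keepdims=True)

gram = fidelity_kernel(states, states)
print(gram.shape, gram.dtype)
```

The resulting matrix is real-valued, which is what SVC requires. A matrix like this could also be computed once and passed to `SVC(kernel="precomputed")` instead of supplying a callable.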

Creating Traditional SVC Models

[5]:
def standard_kernel(data_x, data_y, method):
    methods = ["linear", "poly", "rbf", "sigmoid"]
    if method not in methods:
        raise ValueError("method must be one of %r." % methods)
    clf = SVC(kernel=method)
    clf.fit(data_x, data_y)
    return clf

Testing and Comparison

Test the quantum SVC model and compare it with the traditional SVC models.

[6]:
methods = ["linear", "poly", "rbf", "sigmoid"]

for method in methods:
    print()
    t = time()

    k = standard_kernel(data_x=x_train, data_y=y_train, method=method)
    y_pred = k.predict(x_test)
    print("Accuracy:(%s as kernel)" % method, metrics.accuracy_score(y_test, y_pred))

    print("time:", time() - t, "seconds")

print()
t = time()

k = quantum_kernel(quantumTran=func_qt, data_x=x_train, data_y=y_train)
y_pred = k.predict(x_test)
print("Accuracy:(qml as kernel)", metrics.accuracy_score(y_test, y_pred))

print("time:", time() - t, "seconds")

Accuracy:(linear as kernel) 0.78
time: 0.00810384750366211 seconds

Accuracy:(poly as kernel) 0.75
time: 0.024804115295410156 seconds

Accuracy:(rbf as kernel) 0.765
time: 0.011444091796875 seconds

Accuracy:(sigmoid as kernel) 0.695
time: 0.010396003723144531 seconds

Accuracy:(qml as kernel) 0.66
time: 6.472219228744507 seconds

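Limitations of SKLearn

Because of the limitations of SKLearn, SKLearn's SVC is not fully compatible with quantum machine learning (QML).

This is because QML outputs complex numbers (the amplitudes of the quantum state), whereas SKLearn accepts only floats. The QML output must therefore be converted to floats before it is used in the SVC, which may cause a loss of precision.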

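Conclusion

Because of the limitations of SKLearn, the quantum SVC is worse than the traditional SVC in both accuracy and speed: in the run above it reaches 0.66 accuracy in about 6.5 seconds, compared with 0.695 to 0.78 accuracy in under 0.03 seconds for the traditional kernels. However, if this limitation were removed, the quantum SVC might outperform the traditional SVC in accuracy.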