Pennylane-qiskit connection error

I am running the Multiclass margin classifier demo from PennyLane. I knew that the PennyLane-Qiskit plugin doesn’t work in a Windows 10 environment, so I tried Windows Subsystem for Linux (WSL). I was able to pip install all the necessary packages, such as pennylane-qiskit, and the code started running. However, it took several hours and thousands of items on the IBMQ backend to finish only two iterations, and it gave me the following errors:

WARNING:urllib3.connectionpool:Retrying (PostForcelistRetry(total=4, connect=3, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /api/Network/ibm-q/Groups/open/Projects/main/Jobs/5f5f1824ab585b001a542449/v/1
Iter:     1 | Cost: 0.3086328 | Acc train: 0.3482143 | Acc test: 0.3947368 
WARNING:urllib3.connectionpool:Retrying (PostForcelistRetry(total=4, connect=3, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /api/Network/ibm-q/Groups/open/Projects/main/Jobs/5f5f26d8ab585b001a54257f/v/1
Iter:     2 | Cost: 0.1968828 | Acc train: 0.3303571 | Acc test: 0.368421

I already asked on the Qiskit forum, and they said it doesn’t seem to be a Qiskit error and that I should ask on the PennyLane forum. I also tried running on a real Linux machine and got the same issue. Maybe it’s because the algorithm is too much for the remote quantum computer? Any help would be appreciated! Thanks!

Hi @Ziqi_Wang!

That is odd—do you have a minimal code example that produces that same error that you could share here?

Thanks for your response!
The code is the same as the tutorial called "Multiclass margin classifier", except that I tried to use the quantum computer from IBM Quantum Experience instead of the default PennyLane simulator. The link to the tutorial is as follows:

And the source code is:

import pennylane as qml
import torch
from pennylane import numpy as np
from torch.autograd import Variable
import torch.optim as optim
from qiskit import QuantumRegister

num_classes = 3
margin = 0.15
feature_size = 4
batch_size = 10
lr_adam = 0.01
train_split = 0.75
num_qubits = 2
num_layers = 6
total_iterations = 100
# The only difference from the tutorial: I used an IBMQ backend here
dev = qml.device('qiskit.ibmq', wires=num_qubits, backend="ibmq_qasm_simulator", ibmqx_token='XXX')

def layer(W):
    qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
    qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
    qml.CNOT(wires=[0, 1])

def circuit(weights, feat=None):
    qml.templates.embeddings.AmplitudeEmbedding(feat, [0, 1], pad=0.0, normalize=True)
    for W in weights:
        layer(W)
    return qml.expval(qml.PauliZ(0))

qnode1 = qml.QNode(circuit, dev).to_torch()
qnode2 = qml.QNode(circuit, dev).to_torch()
qnode3 = qml.QNode(circuit, dev).to_torch()

def variational_classifier(q_circuit, params, feat):
    weights = params[0]
    bias = params[1]
    return q_circuit(weights, feat=feat) + bias

def multiclass_svm_loss(q_circuits, all_params, feature_vecs, true_labels):
    loss = 0
    num_samples = len(true_labels)
    for i, feature_vec in enumerate(feature_vecs):

        s_true = variational_classifier(
            q_circuits[int(true_labels[i])],
            (all_params[0][int(true_labels[i])], all_params[1][int(true_labels[i])]),
            feature_vec,
        )
        s_true = s_true.float()
        li = 0

        for j in range(num_classes):
            if j != int(true_labels[i]):
                s_j = variational_classifier(
                    q_circuits[j], (all_params[0][j], all_params[1][j]), feature_vec
                )
                s_j = s_j.float()
                li += torch.max(torch.zeros(1).float(), s_j - s_true + margin)
        loss += li

    return loss / num_samples

def classify(q_circuits, all_params, feature_vecs, labels):
    predicted_labels = []
    for i, feature_vec in enumerate(feature_vecs):
        scores = [0, 0, 0]
        for c in range(num_classes):
            score = variational_classifier(
                q_circuits[c], (all_params[0][c], all_params[1][c]), feature_vec
            )
            scores[c] = float(score)
        pred_class = np.argmax(scores)
        predicted_labels.append(pred_class)
    return predicted_labels

def accuracy(labels, hard_predictions):
    loss = 0
    for l, p in zip(labels, hard_predictions):
        if torch.abs(l - p) < 1e-5:
            loss = loss + 1
    loss = loss / labels.shape[0]
    return loss

def load_and_process_data():
    data = np.loadtxt("iris.csv", delimiter=",")
    X = torch.tensor(data[:, 0:feature_size])
    print("First X sample (original)  :", X[0])

    normalization = torch.sqrt(torch.sum(X ** 2, dim=1))
    X_norm = X / normalization.reshape(len(X), 1)
    print("First X sample (normalized):", X_norm[0])

    Y = torch.tensor(data[:, -1])
    return X, Y

def split_data(feature_vecs, Y):
    num_data = len(Y)
    num_train = int(train_split * num_data)
    index = np.random.permutation(range(num_data))
    feat_vecs_train = feature_vecs[index[:num_train]]
    Y_train = Y[index[:num_train]]
    feat_vecs_test = feature_vecs[index[num_train:]]
    Y_test = Y[index[num_train:]]
    return feat_vecs_train, feat_vecs_test, Y_train, Y_test

def training(features, Y):
    num_data = Y.shape[0]
    feat_vecs_train, feat_vecs_test, Y_train, Y_test = split_data(features, Y)
    num_train = Y_train.shape[0]
    q_circuits = [qnode1, qnode2, qnode3]
    all_weights = [
        Variable(0.1 * torch.randn(num_layers, num_qubits, 3), requires_grad=True)
        for i in range(num_classes)
    ]
    all_bias = [Variable(0.1 * torch.ones(1), requires_grad=True) for i in range(num_classes)]
    optimizer = optim.Adam(all_weights + all_bias, lr=lr_adam)
    params = (all_weights, all_bias)
    print("Num params: ", 3 * num_layers * num_qubits * 3 + 3)

    costs, train_acc, test_acc = [], [], []

    for it in range(total_iterations):
        batch_index = np.random.randint(0, num_train, (batch_size,))
        feat_vecs_train_batch = feat_vecs_train[batch_index]
        Y_train_batch = Y_train[batch_index]

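        # One optimization step: reset gradients, evaluate the loss, backpropagate, update
        optimizer.zero_grad()
        curr_cost = multiclass_svm_loss(q_circuits, params, feat_vecs_train_batch, Y_train_batch)
        curr_cost.backward()
        optimizer.step()
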
        predictions_train = classify(q_circuits, params, feat_vecs_train, Y_train)
        predictions_test = classify(q_circuits, params, feat_vecs_test, Y_test)
        acc_train = accuracy(Y_train, predictions_train)
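        acc_test = accuracy(Y_test, predictions_test)

        print(
            "Iter: {:5d} | Cost: {:0.7f} | Acc train: {:0.7f} | Acc test: {:0.7f} "
            "".format(it + 1, curr_cost.item(), acc_train, acc_test)
        )

        # Record the history that is returned and plotted below
        costs.append(curr_cost.item())
        train_acc.append(acc_train)
        test_acc.append(acc_test)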

    return costs, train_acc, test_acc

features, Y = load_and_process_data()
costs, train_acc, test_acc = training(features, Y)
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
iters = np.arange(0, total_iterations, 1)
colors = ["tab:red", "tab:blue"]
ax1.set_xlabel("Iteration", fontsize=17)
ax1.set_ylabel("Cost", fontsize=17, color=colors[0])
ax1.plot(iters, costs, color=colors[0], linewidth=4)
ax1.tick_params(axis="y", labelsize=14, labelcolor=colors[0])
ax2 = ax1.twinx()
ax2.set_ylabel("Test Acc.", fontsize=17, color=colors[1])
ax2.plot(iters, test_acc, color=colors[1], linewidth=4)
ax2.tick_params(axis="x", labelsize=14)
ax2.tick_params(axis="y", labelsize=14, labelcolor=colors[1])

Hi @Ziqi_Wang!

Before answering your question - I just wanted to point out that you shared your IBM token in the above code. You should probably edit your post to remove the token and also go onto the IBM Quantum Experience website and generate a new token. It’s an easy mistake to make, but best not to share the token publicly because others could use it.
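
One way to keep the token out of shared code is to read it from the environment. Below is a minimal sketch, assuming the token has been exported as an environment variable. The variable name IBMQX_TOKEN and the helper load_ibmq_token are hypothetical, chosen only for illustration. The qml.device call is left as a comment and simply mirrors the one in the code above.

```python
import os


def load_ibmq_token(var_name="IBMQX_TOKEN"):
    """Return the IBMQ token stored in the environment variable `var_name`.

    `IBMQX_TOKEN` is a hypothetical variable name used for this sketch.
    """
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable instead of hard-coding the token."
        )
    return token


# Usage, mirroring the device call from the code above (not executed here):
# dev = qml.device('qiskit.ibmq', wires=num_qubits,
#                  backend="ibmq_qasm_simulator", ibmqx_token=load_ibmq_token())
```

With this in place, the script can be pasted into a forum post as-is, since the token itself never appears in the source.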

It’s great that you managed to get the code working using WSL! Unfortunately, I’d say the wait times you are seeing with remote backends are expected. With a local simulator, each circuit evaluation is very quick, but with a remote backend you need to send the job, wait for it to go through the queue, wait for it to be simulated, and wait for the result to come back to you. This can take a single evaluation from fractions of a second to ~1 second. Moreover, the delay becomes increasingly noticeable for deep circuits: deeper circuits have more parameters, and calculating the gradient requires 2 * N_param circuit evaluations.

Some ways you could mitigate this are:

  • Book time on a reservable device, saving you from waiting in the queue
  • Use a shallower circuit (potentially you could use more qubits, i.e., greater width, to compensate)
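
To put rough numbers on the second suggestion, here is a back-of-the-envelope sketch. It uses the parameter count printed by the code above (3 * num_layers * num_qubits * 3 + 3) and the 2 * N_param evaluations per gradient mentioned earlier. The helper names and the figure of 1 second per evaluation are illustrative assumptions, not measurements, and queue time would come on top.

```python
def num_params(num_layers, num_qubits=2, num_classes=3):
    """Parameter count as printed by the training code: weights plus one bias per class."""
    return num_classes * num_layers * num_qubits * 3 + num_classes


def gradient_evaluations(n_params):
    """Circuit evaluations for one gradient, using the 2 * N_param rule of thumb."""
    return 2 * n_params


def estimated_seconds(n_evals, seconds_per_eval=1.0):
    """Rough wall-clock estimate; seconds_per_eval is an assumed remote round-trip time."""
    return n_evals * seconds_per_eval


# Compare the depth used in the thread (6 layers) with shallower alternatives.
for layers in (6, 3, 1):
    p = num_params(layers)
    evals = gradient_evaluations(p)
    print(
        f"layers={layers}: {p} params, {evals} evaluations, "
        f"~{estimated_seconds(evals):.0f} s per gradient at 1 s each"
    )
```

The count scales linearly with the number of layers, so halving the depth roughly halves the number of remote evaluations per gradient.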

You could also check out this post for more of a discussion:


Thank you so much for your response!
I didn’t even think about the token when I added the sample code, but I will definitely remember not to include my token next time.
It does make sense, as the remote backend is usually occupied by others and there are 111 parameters, which makes this demo’s calculation rather complicated.
I will look into the advice. Thanks again!

@Ziqi_Wang, something to note is that the parameter-shift rule for analytic quantum gradients performs 2p quantum evaluations per optimization step, where p is the number of parameters.

So for 111 parameters, that means 222 jobs are being submitted per optimization step!
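
For anyone curious where the factor of two comes from, here is a minimal sketch of the parameter-shift rule on a toy function. It assumes each parameter enters through a single rotation, so the expectation value varies sinusoidally in that parameter. The function toy_expval is a hypothetical stand-in for a circuit evaluation (a product of cosines), not a PennyLane call. Each parameter needs two shifted evaluations, which is why p parameters cost 2p evaluations.

```python
import math

evaluation_count = 0  # counts how many times the "circuit" is evaluated


def toy_expval(params):
    """Hypothetical stand-in for a circuit evaluation: a product of cosines."""
    global evaluation_count
    evaluation_count += 1
    return math.prod(math.cos(t) for t in params)


def parameter_shift_gradient(f, params, shift=math.pi / 2):
    """Gradient via the parameter-shift rule: two evaluations of f per parameter."""
    grad = []
    for k in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[k] += shift
        minus[k] -= shift
        grad.append(0.5 * (f(plus) - f(minus)))
    return grad


params = [0.1, 0.7, -0.4]
grad = parameter_shift_gradient(toy_expval, params)
print("gradient:", grad)
print("evaluations used:", evaluation_count)  # 2 per parameter
```

On a remote backend each of those evaluations is a separate round trip, which is what makes the per-step cost add up so quickly.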