Adding Hamiltonian Terms Takes Super Long

Edit: I did try qubit tapering, but that barely reduced the size of the Hamiltonian.

The next thing I tried was building the sum with qml.Hamiltonian(), using the SumWithQMLHamiltonian(obs_to_sum, coeffs_for_sum) function from this question asked on the forum, but it still takes a long time.

Now what I’m doing is using another function from that same post:

def SimplePythonSum(obs_to_sum, coeffs_for_sum):
    # Check that there is exactly one coefficient for each Hamiltonian.
    assert len(obs_to_sum) == len(coeffs_for_sum)

    # Sum all Hamiltonians in obs_to_sum, each scaled by its coefficient from coeffs_for_sum.
    cost_h = sum(coeff * obs for coeff, obs in zip(coeffs_for_sum, obs_to_sum))
    # Return the summed Hamiltonian.
    return cost_h

My idea is to replace sum with cupy.sum, so that the two large Hamiltonians are added on the GPU using the cupy library. However, when I try to convert them to a cupy.array, I get an error saying the type is unsupported.
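To make clear what I am after: the things being summed are symbolic operator objects rather than plain numbers, which is presumably why an array library rejects them. Below is a minimal sketch of the operation I want. It assumes, purely for illustration, that each Hamiltonian is modelled as a plain dictionary mapping a Pauli-word string to its coefficient. The function name sum_pauli_terms and this dictionary layout are hypothetical and are not part of PennyLane.

```python
from collections import defaultdict


def sum_pauli_terms(hamiltonians, coeffs):
    """Return sum_i coeffs[i] * hamiltonians[i] as one {pauli_word: coefficient} dict.

    Each Hamiltonian is a hypothetical stand-in: a dict from a Pauli-word
    string (e.g. "X0 X1") to a numeric coefficient. This is NOT PennyLane's
    internal representation.
    """
    if len(hamiltonians) != len(coeffs):
        raise ValueError("Need exactly one coefficient per Hamiltonian.")

    total = defaultdict(float)
    for scale, terms in zip(coeffs, hamiltonians):
        for word, coeff in terms.items():
            # Terms with the same Pauli word are merged in a single pass.
            total[word] += scale * coeff

    # Drop terms whose coefficients cancelled out.
    return {word: c for word, c in total.items() if abs(c) > 1e-12}


# Two toy Hamiltonians that share the "Z0" term.
h1 = {"Z0": 0.5, "X0 X1": 0.25}
h2 = {"Z0": -0.5, "Y1": 1.0}
combined = sum_pauli_terms([h1, h2], [1.0, 2.0])
print(combined)
```

In this toy model every term is visited once and matching Pauli words are merged as they are encountered. Whether the same idea can be applied to PennyLane Hamiltonians, on the CPU or the GPU, is exactly what I am unsure about.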

Would any PennyLane expert know how to use CuPy to accelerate adding two Hamiltonians? Any other suggestions would be amazing. Thanks!