Hi @QuantumMan, welcome to our community!
This is an excellent question!
How data gets into a quantum computer is key to understanding how to use one well.
As you have probably noticed already, you cannot just “load” your data into a quantum computer the way you do on a classical one. Instead, you embed your data into the circuit by using gates. In the Templates section of the PennyLane documentation you will see that you can embed your data in different ways. Probably the easiest one to understand and use is AngleEmbedding. This embedding applies one rotation gate per feature, each on its own qubit, and the rotation angle is set by your data. So if your input data is [1, 2, 3], rotations of 1, 2, and 3 radians will be applied to the first, second, and third qubit respectively. You can specify whether you want to use X, Y or Z rotations, as shown in the docs. Rotations are periodic, so if your data doesn’t fall within the [0, 2π] range you can rescale it classically before doing the embedding.
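To make the rescaling and embedding steps concrete, here is a minimal pure-Python sketch. The helper names `rescale_to_angles` and `rx_on_zero` are made up for illustration and are not part of PennyLane. One assumption of mine: the sketch rescales into [0, π] rather than [0, 2π], because an X rotation by 2π brings the qubit back to where it started (up to a global phase), so the smallest and largest values would otherwise be encoded identically.

```python
import math

def rescale_to_angles(data, low=0.0, high=math.pi):
    """Min-max rescale a list of numbers into [low, high] (hypothetical helper)."""
    d_min, d_max = min(data), max(data)
    if d_max == d_min:
        # All values are equal, so there is nothing to spread out.
        return [low for _ in data]
    scale = (high - low) / (d_max - d_min)
    return [low + (x - d_min) * scale for x in data]

def rx_on_zero(theta):
    """Amplitudes (for |0> and |1>) of the state RX(theta)|0>."""
    return (complex(math.cos(theta / 2), 0.0),
            complex(0.0, -math.sin(theta / 2)))

raw = [10.0, 25.0, 40.0]                   # classical features, arbitrary units
angles = rescale_to_angles(raw)            # one rotation angle per feature
states = [rx_on_zero(a) for a in angles]   # one single-qubit state per feature

for x, a, s in zip(raw, angles, states):
    print(f"feature {x} -> angle {a:.3f} rad -> amplitudes {s}")
```

In PennyLane itself you would not write the rotation by hand; inside a QNode, something like `qml.AngleEmbedding(angles, wires=range(3), rotation="X")` should apply the same three rotations for you.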
Aside from the data that you may want to embed, you also have trainable parameters. These may be rotation angles, control values and more. Unlike your data, these parameters get updated during training, and they determine the shape and behaviour of your circuit.
At a lower level, the actual implementation on hardware is determined by the technology being used. Your circuit is built according to the parameters that you have provided, so if you’re implementing it on a superconducting device, for instance, those parameters will set the length of the microwave pulse that physically implements each gate. For other technologies the gate may be implemented by a voltage applied on a chip, a laser beam, or some other mechanism, depending on the kind of hardware.
From a purely algorithmic perspective you probably don’t need to worry about the physical implementation of the gates. The important thing is that your data and parameters define a specific circuit, and once you have defined that circuit you can run it on actual quantum hardware (or simulators).
Finally, after the circuit runs you need to perform a measurement, which will return a classical value. Once you measure, everything quantum in your circuit collapses and you have classical data that you can work with the way you’re used to on your classical computer and with your classical optimizer. For instance, if you take a single sample from one qubit you get one of 2 values, and then you can work with that. Or if you measure an expectation value, this corresponds to running your circuit many times and averaging the measurement results, so you get a continuous and differentiable quantity that you can plug into your optimizer.
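Here is a small sketch of the difference between samples and expectation values, again in plain Python rather than PennyLane. It assumes the simplest possible circuit, a single RX(theta) rotation on a qubit starting in |0>, followed by a PauliZ measurement. The function names are made up for illustration.

```python
import math
import random

def sample_z(theta, shots, rng):
    """Simulate measuring PauliZ on RX(theta)|0>: every shot gives +1 or -1."""
    p_plus = math.cos(theta / 2) ** 2  # probability of the +1 outcome
    return [1 if rng.random() < p_plus else -1 for _ in range(shots)]

def estimate_expval(theta, shots, rng):
    """Average many shots to estimate the expectation value <Z>."""
    return sum(sample_z(theta, shots, rng)) / shots

rng = random.Random(0)   # fixed seed so the run is repeatable
theta = 1.0

samples = sample_z(theta, 5, rng)               # a handful of discrete outcomes
estimate = estimate_expval(theta, 10_000, rng)  # average over many shots
exact = math.cos(theta)                         # analytic <Z> for this circuit

print("individual samples:", samples)
print("estimated <Z>:", estimate)
print("exact <Z>:", exact)
```

Each sample is just +1 or -1, but the average changes smoothly with theta, which is what makes it usable by a gradient-based optimizer.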
After all of this, you can use your optimizer to determine the new set of parameters that you’re going to use in your circuit, and then you start all over again. This is why we talk about variational (or parametrized) circuits.
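The run, measure, update loop can be sketched in a few lines. This is a toy under stated assumptions: `circuit_expval` stands in for actually running a circuit (it returns the exact <Z> of RX(theta)|0>, which is cos(theta)), the gradient uses the parameter-shift rule, which is one common choice and not the only one, and the optimizer is plain gradient descent with a learning rate I picked arbitrarily.

```python
import math

def circuit_expval(theta):
    """Stand-in for running a circuit: exact <Z> for RX(theta)|0>."""
    return math.cos(theta)

def parameter_shift_grad(f, theta):
    """Gradient from two extra circuit evaluations at shifted parameters."""
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2

def train(theta, steps=100, lr=0.4):
    """Plain gradient descent that minimises the expectation value."""
    for _ in range(steps):
        grad = parameter_shift_grad(circuit_expval, theta)
        theta -= lr * grad  # classical update of the trainable parameter
    return theta

theta_opt = train(0.5)
final_cost = circuit_expval(theta_opt)
print("optimised angle:", theta_opt)
print("final <Z>:", final_cost)
```

Starting from 0.5, the angle moves towards π, where <Z> reaches its minimum of -1. In a real workflow the stand-in function would be a QNode and the update would come from one of PennyLane's optimizers.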
How large can the data be? Well, having large datasets means that you will need a lot of gates and/or qubits in order to embed all of your data and/or parameters. The deeper and larger your circuit is, the more prone it is to giving you wrong results on hardware, and the harder it becomes to simulate. In principle, if we had big and low-error quantum computers we could run circuits that we cannot simulate. However, the hardware currently available is still small and noisy, so you’re probably still better off using simulators.
As a rule of thumb you can probably simulate up to 20 qubits on a laptop, up to 28 qubits on a GPU, and up to 100 qubits in very specific cases and with advanced tools (which is probably not your case). The depth of the circuit will also be important, but I can’t really give you a number for how deep it can be.
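One way to get a feel for why these limits exist is to estimate the memory a full statevector simulation needs. The sketch below assumes one double-precision complex number (16 bytes) per amplitude and nothing else. Real simulators need extra working memory, and runtime matters as much as memory, so treat the numbers as rough lower bounds rather than as the origin of the rule of thumb above.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for 2**n_qubits complex amplitudes (16 bytes each by default)."""
    return (2 ** n_qubits) * bytes_per_amplitude

def human_readable(n_bytes):
    """Format a byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024

sizes = {n: human_readable(statevector_bytes(n)) for n in (20, 28, 30, 40)}
for n, size in sizes.items():
    print(f"{n} qubits -> {size}")
```

Every extra qubit doubles the memory. That is why a full statevector for 100 qubits is out of reach, and why simulations at that scale only work in special cases with other techniques.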
So if you want to learn more about all of this I recommend that you go through the Xanadu Quantum Codebook which will teach you all that you need to know in order to understand quantum programming.
And if you want to see some examples of quantum machine learning with different datasets take a look at our QML demos, the community demos made by different members of our community, and our other demos on various topics.
I know this was a long answer to your question, but I hope it gives you some insight into how much is involved. The short answer is that, for now, you’re mostly limited to small datasets because otherwise your circuits become impossible to simulate and/or run on actual hardware. How small depends very much on your data and exactly what you’re doing with it.
Please let me know if this helps and if you have any other questions!