I want to implement pattern searching to look for DNA patterns in text. I’m having difficulty implementing it. Has anyone done pattern searching before?

Hi @QuantumDev, welcome to the Forum!

I haven’t seen anyone doing this but it definitely looks interesting.

Have you taken a look at our Quantum Machine Learning demos? They’re not specific to pattern finding but they use many different techniques that you could try out.

Do you have some experience with PennyLane and/or quantum programming, or would it help if I recommended some content that can help you get started?

Thank you for your response. I am a new researcher on quantum computing, and it looks like the library is PennyLane. I can run other libraries, and many demos can I learn. But I can’t find a solution to my problem. I hope I can solve my problem of implementing quantum computing for pattern search. Thanks

Hi @QuantumDev,

I don’t know how you would do for implementing pattern matching for DNA sequences but here’s an idea:

- You can encode your nucleotides as specific quantum gates such as S and Z.
- You can then use qml.pattern_matching to identify matches between your circuit and your pattern.
- You can look into the source code for pattern_matching if you want to see what’s happening under the hood.

I hope this helps you!

let me explain the problem, pattern matching for DNA sequences is search pattern in long string. example

pattern = “ACGT”

text = “AACCGGTT**ACGT**AACC**ACGT**”

I need a solution for the search pattern in the text; in the example, the pattern “ACGT” on the text is count 2. I need quantum computing to search for examples, and the results can count text and know the location (number index of text) of the patterns in the text. one algorithm on quantum computing can search is grover search, but on grover search I can’t found the result show counting the pattern and location pattern in text. because quantum computing can solved searching with low time complexity like grover search, but i need grover search or other algorithm can search and return counting and location of pattern in text in quantum computing.

thanks for the recommendation for the solution, but the details of my problem are shown at the top.

Thank you

Hi @QuantumDev,

I don’t think you’ll find a large advantage in using quantum computers for this. Classical computers are very good at finding patterns in text. Quantum computers are actually slower than classical ones in most cases.

You could potentially tweak Grover’s algorithm to test this application by encoding your pattern as a basis state in your circuit. For example if “ACGT” is encoded as “0123” this is equivalent to “000001111011” and you can then use it as your omega in Grovers.

I’m not sure whether it will actually help you for your application but it’s an idea.

I hope this helps you.

literature regarding Grover search has lower time complexity than algorithms on classical computers. Based on these advantages, I am looking for a solution to find pattern locations in very long texts such as DNA sequencing. However, I have difficulty understanding the process of implementing and tweaking the Grover algorithm as you suggest. Maybe someone can give a simple example of how to handle a problem based on the example I gave previously, after that maybe I can apply it to a real case study

Hi @QuantumDev,

In principle you’re right but in practice classical algorithms tend to perform better for these kinds of problems due to issues like the complexity of embedding the data into the quantum circuit. If you’re interested in learning more about speedups in quantum algorithms or more about grovers I highly recommend that you check out the Xanadu Quantum Codebook, particularly module A and module G.

That being said, if you really want to implement Grovers for this you would need to follow the following steps:

- Find all subsets of 4 letters in your data which you want to evaluate.
- Encode each subset into your circuit (you can use amplitude embedding, angle embedding, or more options as found here).
- Create an Oracle which marks the right state. Eg: if you encoded your information such that your desired pattern is encoded in state 1010, then this is the state that you want to mark in your oracle.
- Continue with all of the other steps from Grover’s algorithm.

As you may notice steps 1 and 2 may require a lot of time if your dataset is long so I’m not sure that grovers will help you here.

I hope this helps you.

As Catalina has stated already classical pattern searches are superior to quantum approaches - at least for the time being.

You need large numbers of qubits and on real quantum hardware noise is your enemy. The information from e.g. amplitude encoding for longer sequences yields very small amplitudes which would be drowned out by noise.

You can also use the Swap Test. It compares 2 quantum states - identical state give you a probability of 1 and non-identical yield smaller numbers or approach zero.

I am currently experimenting with this. You can play with the toy sequence and see how this affects the probability. Even just one base reduces the probability a lot.

This is a toy model implementation:

```
def DNA_encoder_complex(seq):
'''
Complex encoding of a DNA sequence
The function returns an array of complex numbers
Copyright Berend Rah 2024 https://www.linkedin.com/in/berendrah/
https://github.com/Quvance
'''
DNA_list=list(seq)
encode_DNA={'A':'-1','T':'1','C':'-1j','G':'1j', 'R':'-1+j'} # R is an ambiguous result which could either be A or G
encoded_seq=np.array([complex(encode_DNA[base]) for base in DNA_list])
return encoded_seq
dev = qml.device("lightning.qubit", wires=range(9))
@qml.qnode(dev)
def SWAP_test(seq1,seq2,qubits):
'''
There are 2 quantum state of equal size - each one of size (qubits-1)/2
'''
enc_seq1=DNA_encoder_complex(seq1)
enc_seq2=DNA_encoder_complex(seq2)
qml.AmplitudeEmbedding(features=enc_seq1, wires=range(1,int((qubits-1)/2+1)), normalize=True, pad_with=0.)
qml.AmplitudeEmbedding(features=enc_seq2, wires=range(int((qubits-1)/2+1),qubits), normalize=True, pad_with=0.)
#qml.AmplitudeEmbedding(features=enc_seq1, wires=range(1,5), normalize=True, pad_with=0.)
#qml.AmplitudeEmbedding(features=enc_seq2, wires=range(5,9), normalize=True, pad_with=0.)
qml.Hadamard(0)
for i in range(1,int((qubits-1)/2+1)):
qml.CSWAP(wires=[0, i, i+int((qubits-1)/2)]) # pairwise comparison of qubits
qml.Hadamard(0)
probs=qml.probs(wires=[0])
return probs
seq_1='ACCCGTTTATTTGCCT'
print(seq_1)
seq_2='ACCCGTTTCTTTGCCT'
print(seq_2)
prob=SWAP_test(seq_1, seq_2 ,9)
print(prob[0]*2-1)
```

Thank you for alternative solution @CatalinaAlbornoz , I will study your suggestions and continue to explore various uses of quantum computing in real case problems. one of them is for pattern matching DNA in genomic data.

Thank you also for sharing your experience @Quvance , but when I run the code above I get the following error:

“**DeviceError: Operation StatePrep cannot be used after other Operations have already been applied on a default.qubit.autograd device**.”

I saw your github regarding the development of quantum computing, especially the DNAsequenceQuantumSearch project. but I’m still looking for how to achieve simple targets such as looking for patterns in genomic data in simple ways before I learn how to improve.

Based on your experience, is it possible to utilize quantum computing to beat search algorithms on classical computers in the case of searching DNA in genomic data?

I have left out the following bits from my code. It should run if you include this:

```
import pennylane as qml
from pennylane import numpy as np
import random as rn
```

However, I very much doubt that any quantum-based search algorithm would currently beat its classical counterpart.

Thank you for the correction. I forgot to add “import random as rn”. The code is running without any problems. The experimental results show probability values; the more similar they are, the higher the probability value.

However, when I make a difference in sequence length above 16 characters, the code gets the error **“Features must be of length 16 or smaller to be padded. got length 17”**. if brought to 16 the error **“Operation StatePrep cannot be used after other Operations have already been applied on a lightning.qubit device”**. How can I make the length of the sequence dynamic?

Thank you for your comment regarding our quantum-based search algorithm. Is the inability of the quantum-based search algorithm due to the results obtained by quantum computing having probability values, making it difficult to provide accurate results regarding the similarity and location of DNA patterns in genomic data? Is this because quantum computer technology is still in the development stage?

if you want to use more than 16 qubits then you have to increase the number of wires accordingly. Your sequence length must be <= 2^qubits. Because you are comparing 2 sequences with the SWAP test you will need twice the number of wires and an additonal one for the ancillary qubit.

The SWAP test does a pairwise comparison. If you shift your sequence by just one base it will give you a very low probability approaching zero.

Hi @QuantumDev, the reason why quantum computers aren’t great at solving these problems (compared to classical solutions) is because putting data into the quantum computer becomes a bottleneck. The problem here is not the probabilistic nature of the systems but just that classical systems have gotten very good at analyzing text so it’s very hard to compete. The question of “what are quantum computers good for” is actually an active research topic. At Xanadu for example our research team is focused on this question. One answer is simulating chemical systems to a very high degree of accuracy.

Are you familiar with quantum computing? I strongly encourage you to check out our introductory learning resources in the new version of our Codebook here or some more advanced ones in the old version here!