In cryptography, an interpolation attack is a type of cryptanalytic attack against block ciphers.
After the two attacks, differential cryptanalysis and linear cryptanalysis, were presented on block ciphers, some new block ciphers were introduced that were proven secure against differential and linear attacks. Among these were iterated block ciphers such as the KN-Cipher and the SHARK cipher. However, Thomas Jakobsen and Lars Knudsen showed in the late 1990s that these ciphers were easy to break by introducing a new attack called the interpolation attack.
In the attack, an algebraic function is used to represent an S-box. This may be a simple quadratic, or a polynomial or rational function over a Galois field. Its coefficients can be determined by standard Lagrange interpolation techniques, using known plaintexts as data points. Alternatively, chosen plaintexts can be used to simplify the equations and optimize the attack.
In its simplest version, an interpolation attack expresses the ciphertext as a polynomial of the plaintext. If the polynomial has a relatively low number of unknown coefficients, then it can be reconstructed from a collection of plaintext/ciphertext (p/c) pairs. With the polynomial reconstructed, the attacker has a representation of the encryption without exact knowledge of the secret key.
The interpolation attack can also be used to recover the secret key.
It is easiest to describe the method with an example.
Example
Let an iterated cipher be given by {\displaystyle c_{i}=(c_{i-1}\oplus k_{i})^{3},}
where {\displaystyle c_{0}} is the plaintext, {\displaystyle c_{i}} the output of the {\displaystyle i^{th}} round, {\displaystyle k_{i}} the secret {\displaystyle i^{th}} round key (derived from the secret key {\displaystyle K} by some key schedule), and, for an {\displaystyle r}-round iterated cipher, {\displaystyle c_{r}} is the ciphertext.
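Consider the 2-round cipher. Let {\displaystyle x} denote the message and {\displaystyle c} the ciphertext. All arithmetic is in {\displaystyle GF(2^{m})}, where addition coincides with {\displaystyle \oplus }, so it is written as {\displaystyle +} in the expansions that follow.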
Then the output of round 1 becomes {\displaystyle c_{1}=(x+k_{1})^{3}=(x^{2}+k_{1}^{2})(x+k_{1})=x^{3}+k_{1}x^{2}+k_{1}^{2}x+k_{1}^{3},}
and the output of round 2 becomes {\displaystyle c_{2}=c=(c_{1}+k_{2})^{3}=(x^{3}+k_{1}x^{2}+k_{1}^{2}x+k_{1}^{3}+k_{2})^{3}} {\displaystyle =x^{9}+x^{8}k_{1}+x^{6}k_{2}+x^{4}k_{1}^{2}k_{2}+x^{3}k_{2}^{2}+x^{2}(k_{1}k_{2}^{2}+k_{1}^{4}k_{2})+x(k_{1}^{2}k_{2}^{2}+k_{1}^{8})+k_{1}^{3}k_{2}^{2}+k_{1}^{9}+k_{1}^{6}k_{2}+k_{2}^{3}.}
Expressing the ciphertext as a polynomial of the plaintext yields {\displaystyle p(x)=a_{1}x^{9}+a_{2}x^{8}+a_{3}x^{6}+a_{4}x^{4}+a_{5}x^{3}+a_{6}x^{2}+a_{7}x+a_{8},}
where the {\displaystyle a_{i}}'s are key-dependent constants.
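The sparsity of this polynomial can be checked mechanically. The following is a minimal sketch, assuming {\displaystyle m=8} and the reduction polynomial {\displaystyle x^{8}+x^{4}+x^{3}+x+1} (the source fixes neither), with arbitrary example round keys. It expands the two rounds with polynomial arithmetic over the field and lists which powers of {\displaystyle x} carry a non-zero coefficient. All function names are illustrative.

```python
# Toy expansion of the 2-round cipher c = ((x + k1)^3 + k2)^3 over GF(2^8).
# Assumptions (not from the article): m = 8, reduction polynomial 0x11B,
# and the example round keys used at the bottom.

def gf_mul(a, b):
    """Multiply two field elements: carry-less multiply with reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def poly_mul(f, g):
    """Multiply polynomials in x; f[i] is the coefficient of x^i."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] ^= gf_mul(fi, gj)
    return out

def poly_cube(f):
    return poly_mul(poly_mul(f, f), f)

def two_round_polynomial(k1, k2):
    c1 = poly_cube([k1, 1])   # round 1: (x + k1)^3
    c1[0] ^= k2               # add round key k2 to the constant term
    return poly_cube(c1)      # round 2: (c1 + k2)^3

coeffs = two_round_polynomial(0x53, 0xCA)
support = [i for i, a in enumerate(coeffs) if a]
print("non-zero powers of x:", support)
```

Whatever the keys, the coefficients of {\displaystyle x^{7}} and {\displaystyle x^{5}} vanish, which is why only eight coefficients appear in {\displaystyle p(x)}.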
Using as many plaintext/ciphertext pairs as there are unknown coefficients in the polynomial {\displaystyle p(x)}, we can construct the polynomial. This can be done, for example, by Lagrange interpolation (see Lagrange polynomial). Once the unknown coefficients have been determined, we have a representation {\displaystyle p(x)} of the encryption without knowledge of the secret key {\displaystyle K}.
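As a rough illustration of this step, the sketch below attacks the 2-round example cipher over {\displaystyle GF(2^{8})}. The field size, the reduction polynomial and the round keys are assumptions made for the demonstration. For simplicity it uses plain Lagrange interpolation for a polynomial of degree at most 9, which takes 10 pairs rather than the 8 that suffice when only the non-zero coefficients are treated as unknowns.

```python
# Toy interpolation attack on the 2-round cipher over GF(2^8).
# Assumptions (not from the article): m = 8, reduction polynomial 0x11B,
# example keys, and dense Lagrange interpolation with degree + 1 = 10 pairs.

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def gf_inv(a):
    return gf_pow(a, 254)     # a^(2^8 - 2) is the inverse of a non-zero a

def encrypt(x, keys):
    """Iterated cipher c_i = (c_{i-1} XOR k_i)^3."""
    for k in keys:
        x = gf_pow(x ^ k, 3)
    return x

def lagrange_eval(points, x):
    """Evaluate at x the unique low-degree polynomial through the points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = gf_mul(num, x ^ xj)     # subtraction is XOR
                den = gf_mul(den, xi ^ xj)
        total ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return total

secret_keys = (0x53, 0xCA)                    # unknown to the attacker
known_pairs = [(x, encrypt(x, secret_keys)) for x in range(10)]

# The attacker now encrypts unseen plaintexts without the key.
forged = lagrange_eval(known_pairs, 0xA7)
print(forged == encrypt(0xA7, secret_keys))
```

The attacker only ever touches `known_pairs`; `secret_keys` is used solely to generate the pairs and to check the prediction.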
Existence
Consider an {\displaystyle m}-bit block cipher. There are {\displaystyle 2^{m}} possible plaintexts, and therefore {\displaystyle 2^{m}} distinct {\displaystyle p/c} pairs. Let there be {\displaystyle n} unknown coefficients in {\displaystyle p(x)}. Since we require as many {\displaystyle p/c} pairs as there are unknown coefficients in the polynomial, an interpolation attack exists only if {\displaystyle n\leq 2^{m}}.
Time complexity
Assume that the time to construct the polynomial {\displaystyle p(x)} using {\displaystyle p/c} pairs is small in comparison to the time to encrypt the required plaintexts. Let there be {\displaystyle n} unknown coefficients in {\displaystyle p(x)}. Then the time complexity of this attack is {\displaystyle n}, requiring {\displaystyle n} known distinct {\displaystyle p/c} pairs.
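To get a feel for these bounds, the sketch below applies them to the cubing cipher from the example. It assumes the crude estimate {\displaystyle n\leq 3^{r}+1}, which counts every monomial up to the formal degree {\displaystyle 3^{r}} and therefore overestimates {\displaystyle n} (the 2-round polynomial has 8 coefficients, not 10). The helper names are illustrative.

```python
# Rough bound on how many rounds of the cubing cipher keep n <= 2^m.
# Assumption: n is bounded by the dense count 3^r + 1 (degree 3^r plus the
# constant term); the true number of non-zero coefficients can be smaller.

def dense_coefficient_bound(rounds):
    return 3 ** rounds + 1

def rounds_within_bound(m):
    """Largest r for which the dense bound still satisfies n <= 2^m."""
    r = 0
    while dense_coefficient_bound(r + 1) <= 2 ** m:
        r += 1
    return r

for m in (8, 64):
    r = rounds_within_bound(m)
    print(f"m={m}: up to r={r} rounds, at most {dense_coefficient_bound(r)} pairs")
```

Under this estimate the number of required pairs, and hence the time complexity, triples with every additional round.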
Interpolation attack by Meet-In-The-Middle
Often this method is more efficient. Here is how it is done.
Given an {\displaystyle r}-round iterated cipher with block length {\displaystyle m}, let {\displaystyle z} be the output of the cipher after {\displaystyle s} rounds, with {\displaystyle s<r}. We express the value of {\displaystyle z} both as a polynomial of the plaintext {\displaystyle x} and as a polynomial of the ciphertext {\displaystyle c}. Let {\displaystyle g(x)\in GF(2^{m})[x]} be the expression of {\displaystyle z} via {\displaystyle x}, and let {\displaystyle h(c)\in GF(2^{m})[c]} be the expression of {\displaystyle z} via {\displaystyle c}. The polynomial {\displaystyle g(x)} is obtained by computing forward using the iterated formula of the cipher until round {\displaystyle s}, and the polynomial {\displaystyle h(c)} is obtained by computing backwards from the iterated formula of the cipher, starting from round {\displaystyle r} until round {\displaystyle s+1}.
So it should hold that {\displaystyle g(x)=h(c),}
and if both {\displaystyle g} and {\displaystyle h} are polynomials with a low number of coefficients, then we can solve the equation for the unknown coefficients.
Time complexity
Assume that {\displaystyle g(x)} can be expressed by {\displaystyle p} coefficients and {\displaystyle h(c)} by {\displaystyle q} coefficients. Then we would need {\displaystyle p+q} known distinct {\displaystyle p/c} pairs to solve the equation by setting it up as a matrix equation. However, the solution of this matrix equation is only determined up to a multiplication and an addition by constants. So to make sure that we get a unique and non-zero solution, we set the coefficient corresponding to the highest degree to one and the constant term to zero. Therefore, only {\displaystyle p+q-2} known distinct {\displaystyle p/c} pairs are required, and the time complexity of this attack is {\displaystyle p+q-2}.
With the Meet-In-The-Middle approach, the total number of coefficients is usually smaller than with the normal method. This makes the method more efficient, since fewer {\displaystyle p/c} pairs are required.
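The following sketch works through the meet-in-the-middle variant on the 2-round cubing cipher with {\displaystyle s=1}. It assumes {\displaystyle m=7} with reduction polynomial {\displaystyle x^{7}+x+1}, chosen because cubing is invertible when {\displaystyle m} is odd: its inverse is the power map with exponent {\displaystyle d=3^{-1}{\bmod {(}}2^{m}-1)}. Here {\displaystyle g(x)=x^{3}+a_{2}x^{2}+a_{1}x+a_{0}} has 4 coefficients and {\displaystyle h(c)=b_{1}c^{d}+b_{0}} has 2. Fixing the leading coefficient of {\displaystyle g} to one and {\displaystyle b_{0}} to zero leaves {\displaystyle 4+2-2=4} unknowns, which are found from 4 pairs by Gaussian elimination. Keys and names are illustrative.

```python
# Toy meet-in-the-middle interpolation over GF(2^7), r = 2 rounds, s = 1.
# Assumptions (not from the article): m = 7, reduction polynomial 0x83,
# example keys. Requires Python 3.8+ for pow(3, -1, ORDER).
M, MOD = 7, 0x83
ORDER = (1 << M) - 1                 # multiplicative group order, 127
CUBE_ROOT = pow(3, -1, ORDER)        # exponent d with (c^3)^d = c

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M:
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def encrypt(x, k1, k2):
    return gf_pow(gf_pow(x ^ k1, 3) ^ k2, 3)

def solve(rows):
    """Gauss-Jordan elimination on an augmented matrix [A | b] over the field."""
    n = len(rows)
    rows = [row[:] for row in rows]
    for col in range(n):
        piv = next(i for i in range(col, n) if rows[i][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = gf_pow(rows[col][col], ORDER - 1)    # a^(2^m - 2) = 1/a
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for i in range(n):
            f = rows[i][col]
            if i != col and f:
                rows[i] = [v ^ gf_mul(f, p) for v, p in zip(rows[i], rows[col])]
    return [row[n] for row in rows]

def mitm_coefficients(pairs):
    """Solve a2*x^2 + a1*x + a0 + b1*c^d = x^3 for (a2, a1, a0, b1)."""
    rows = [[gf_mul(x, x), x, 1, gf_pow(c, CUBE_ROOT), gf_pow(x, 3)]
            for x, c in pairs]
    return solve(rows)

k1, k2 = 0x35, 0x6A                  # unknown to the attacker
pairs = [(x, encrypt(x, k1, k2)) for x in (1, 2, 3, 4)]
a2, a1, a0, b1 = mitm_coefficients(pairs)
print(a2 == k1, b1 == 1)
```

In this toy case the dense polynomial of the full cipher has degree 9, while the meet-in-the-middle system has only 4 unknowns. As a side effect, the coefficient of {\displaystyle x^{2}} in {\displaystyle g} directly reveals {\displaystyle k_{1}}.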
Key-recovery
We can also use the interpolation attack to recover the secret key {\displaystyle K}.
If we remove the last round of an {\displaystyle r}-round iterated cipher with block length {\displaystyle m}, the output of the cipher becomes {\displaystyle {\tilde {y}}=c_{r-1}}. Call this the reduced cipher. The idea is to guess the last round key {\displaystyle k_{r}}, so that we can decrypt one round to obtain the output {\displaystyle {\tilde {y}}} of the reduced cipher. To verify the guess, we then use the interpolation attack on the reduced cipher, either by the normal method or by the Meet-In-The-Middle method. Here is how it is done.
By the normal method we express the output {\displaystyle {\tilde {y}}} of the reduced cipher as a polynomial of the plaintext {\displaystyle x}. Call the polynomial {\displaystyle p(x)\in GF(2^{m})[x]}. If we can express {\displaystyle p(x)} with {\displaystyle n} coefficients, then we can construct the polynomial using {\displaystyle n} known distinct {\displaystyle p/c} pairs. To verify the guess of the last round key, check with one extra {\displaystyle p/c} pair whether it holds that {\displaystyle p(x)={\tilde {y}}.}
If yes, then with high probability the guess of the last round key was correct. If no, then make another guess of the key.
By the Meet-In-The-Middle method we express the output {\displaystyle z} from round {\displaystyle s<r} as a polynomial of the plaintext {\displaystyle x} and as a polynomial of the output of the reduced cipher {\displaystyle {\tilde {y}}}. Call the polynomials {\displaystyle g(x)} and {\displaystyle h({\tilde {y}})}, and let them be expressed by {\displaystyle p} and {\displaystyle q} coefficients, respectively. Then with {\displaystyle p+q-2} known distinct {\displaystyle p/c} pairs we can find the coefficients. To verify the guess of the last round key, check with one extra {\displaystyle p/c} pair whether it holds that {\displaystyle g(x)=h({\tilde {y}}).}
If yes, then with high probability the guess of the last round key was correct. If no, then make another guess of the key.
Once we have found the correct last round key, we can continue in a similar fashion on the remaining round keys.
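The guess-and-verify loop of the normal method can be sketched as follows. Note the assumptions: the code attacks a hypothetical variant cipher {\displaystyle c_{i}=c_{i-1}^{3}\oplus k_{i}}, in which the round key is added after cubing, because in the example cipher above a wrong guess of the last round key merely shifts the constant term of the reduced polynomial and so cannot be detected. It also assumes {\displaystyle m=7}, reduction polynomial {\displaystyle x^{7}+x+1} and {\displaystyle r=2}, and it checks several extra pairs instead of one, since in such a small field a single check is passed by chance too often.

```python
# Toy last-round-key recovery over GF(2^7).
# Assumptions (not from the article): variant cipher c_i = c_{i-1}^3 XOR k_i,
# m = 7, reduction polynomial 0x83, r = 2, example keys, 4 verification pairs.
# Requires Python 3.8+ for pow(3, -1, ORDER).
M, MOD = 7, 0x83
ORDER = (1 << M) - 1
CUBE_ROOT = pow(3, -1, ORDER)        # exponent that inverts cubing

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M:
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def encrypt(x, keys):
    for k in keys:
        x = gf_pow(x, 3) ^ k
    return x

def lagrange_eval(points, x):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = gf_mul(num, x ^ xj)
                den = gf_mul(den, xi ^ xj)
        total ^= gf_mul(yi, gf_mul(num, gf_pow(den, ORDER - 1)))
    return total

def recover_last_round_key(pairs, n, checks):
    """Return every guess whose reduced-cipher outputs fit an n-coefficient polynomial."""
    fit, extra = pairs[:n], pairs[n:n + checks]
    candidates = []
    for guess in range(1 << M):
        def peel(c):                 # decrypt one round under this guess
            return gf_pow(c ^ guess, CUBE_ROOT)
        reduced = [(x, peel(c)) for x, c in fit]
        if all(lagrange_eval(reduced, x) == peel(c) for x, c in extra):
            candidates.append(guess)
    return candidates

secret_keys = (0x2B, 0x51)           # unknown to the attacker
pairs = [(x, encrypt(x, secret_keys)) for x in range(8)]
candidates = recover_last_round_key(pairs, n=4, checks=4)
print(0x51 in candidates)
```

The correct key always survives, because the reduced cipher {\displaystyle {\tilde {y}}=x^{3}+k_{1}} has degree 3 and is fixed by 4 pairs. A wrong guess survives only if all verification pairs agree by coincidence.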
Time complexity
With a secret round key of length {\displaystyle m}, there are {\displaystyle 2^{m}} different keys, each with probability {\displaystyle 1/2^{m}} of being correct if chosen at random. Therefore, we will on average have to make {\displaystyle 1/2\cdot 2^{m}} guesses before finding the correct key.
Hence, the normal method has average time complexity {\displaystyle 2^{m-1}(n+1)}, requiring {\displaystyle n+1} known distinct {\displaystyle p/c} pairs, and the Meet-In-The-Middle method has average time complexity {\displaystyle 2^{m-1}(p+q-1)}, requiring {\displaystyle p+q-1} known distinct {\displaystyle p/c} pairs.
Real world application
A variant of the Meet-In-The-Middle attack can be used to attack S-boxes that use the inverse function, because for an {\displaystyle m}-bit S-box, {\displaystyle S:f(x)=x^{-1}=x^{2^{m}-2}} in {\displaystyle GF(2^{m})}.
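The identity {\displaystyle x^{-1}=x^{2^{m}-2}} is what makes an inversion S-box a polynomial. A quick check, assuming {\displaystyle m=8} and the reduction polynomial {\displaystyle x^{8}+x^{4}+x^{3}+x+1} for illustration:

```python
# Check that x^(2^m - 2) inverts every non-zero element of GF(2^8).
# Assumptions (not from the article): m = 8, reduction polynomial 0x11B.

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def sbox_inverse(x):
    """Inversion S-box written as the monomial x^254; maps 0 to 0."""
    return gf_pow(x, 2 ** 8 - 2)

print(all(gf_mul(x, sbox_inverse(x)) == 1 for x in range(1, 256)))
```

The monomial has very high degree, so interpolating it directly is impractical. Written as the rational function {\displaystyle 1/x}, however, it is simple, which is what the rational-function and meet-in-the-middle variants rely on.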
The block cipher SHARK uses an SP-network with S-box {\displaystyle S:f(x)=x^{-1}}. The cipher is resistant against differential and linear cryptanalysis after a small number of rounds. However, it was broken in 1996 by Thomas Jakobsen and Lars Knudsen, using the interpolation attack. Denote by SHARK{\displaystyle (n,m,r)} a version of SHARK with block size {\displaystyle nm} bits using {\displaystyle n} parallel {\displaystyle m}-bit S-boxes in {\displaystyle r} rounds. Jakobsen and Knudsen found that there exists an interpolation attack on SHARK{\displaystyle (8,8,4)} (64-bit block cipher) using about {\displaystyle 2^{21}} chosen plaintexts, and an interpolation attack on SHARK{\displaystyle (8,16,7)} (128-bit block cipher) using about {\displaystyle 2^{61}} chosen plaintexts.
Thomas Jakobsen also introduced a probabilistic version of the interpolation attack, using Madhu Sudan's algorithm for improved decoding of Reed-Solomon codes. This attack can work even when an algebraic relationship between plaintexts and ciphertexts holds for only a fraction of values.