FFT-Based Integer Multiplication, Part 1

The fast Fourier transform allows us to convolve in n log n time, which is useful for fast multiplication. But the FFT uses floating-point calculations; what problems does this cause when we want exact integer results?

October 19, 2018

The convolution of two discrete, finite, length-N signals f and g (basically two length-N arrays) is denoted f*g . The result f*g is also a length-N signal, with

(f*g)[n]=\sum_{m=0}^{N-1}f[m]g[n-m]

Convolutions appear in many applications:

  • Multiplication: Let f and g be the coefficient arrays of two polynomials, so f[n] is the coefficient for the x^n term. Then (f*g)[n] equals the coefficient on the x^n term when we multiply our polynomials (plus a wrapping effect, discussed later). Similarly, since integers can be expressed as polynomials ( 154=x^2+5x+4 at x=10 ) we can multiply integers by convolving their digits.
  • Probability: the PDF of the sum of two independent random variables is equal to the convolution of the two variables’ individual PDFs.
  • Computer graphics: we can also convolve 2D signals, and many effects can be achieved simply by convolving an image with a certain filter. Examples: box blur to blur an image, a Lanczos filter to downsize an image, and the Canny edge detector.

One technicality is that in our definition of (f*g)[n] , the index n-m can potentially be negative, which we get around by assuming that g “wraps around,” so for example g[-5]=g[N-5] . This wrapping behavior is often undesired (for example, with polynomial multiplication it leads to the coefficients of x^n and x^{n+N} being added together), but we can get rid of it by padding f and g with sufficient zeroes.

It’s easy to see that computing f*g naively is an O(N^2) operation: the result has N elements, and each element takes N operations to compute. This is where the discrete Fourier transform (DFT) comes in. The DFT of f returns an array F , also of length N . The DFT is defined such that the continuous function f'(t)=\frac{1}{N}\sum_{k=0}^{N-1}F[k]e^{2\pi jkt/N} has the property that f'(n)=f[n] for n\in\{0, 1, \ldots, N-1\} ; it expresses f as a weighted sum of N sinusoidal functions (for reasons I’m not going to explain here, any f can be uniquely expressed like this). Since the values in F tell us how much weight to put on sinusoids of different frequencies, F represents our signal in what’s called the frequency domain.

The DFT is useful because by the convolution theorem, convolving in the time domain is equivalent to pointwise multiplication in the frequency domain. Furthermore, the fast Fourier transform algorithm allows us to compute the DFT and inverse DFT in O(N\log N) time. So, here’s an asymptotically superior way to compute f*g :

  1. Use FFT to compute F:=DFT(f) and G:=DFT(g) . Time: O(N\log N)
  2. By the convolution theorem, DFT(f * g)[k]=F[k]\cdot G[k] so we pointwise multiply our results from step 1. Time: O(N)
  3. Use FFT to apply the inverse DFT to our result from the previous step. Time: O(N\log N)

Which gives us O(N\log N) overall. This article (and the next part of it) will look at applying this technique to large integer multiplication. Computers have built-in instructions to multiply integers, but their inputs are limited by machine word size, so generally inputs must be below 2^{64} . To multiply larger numbers (for example RSA commonly uses 2048-bit integers), we need to implement multiplication ourselves in software. FFT-based multiplication has high overhead but the best known asymptotic complexity, so it’s used to multiply very large integers (at least tens of thousands of bits).

Floating point problems

For a length-N signal f ,

DFT(f)[k] = \sum_{n=0}^{N-1} f[n]e^{-jkn\omega_0}

where \omega_0=2\pi/N . This means the DFT inherently involves floating-point arithmetic, since e^{-jkn\omega_0}=\cos(-kn\omega_0)+j\sin(-kn\omega_0) ; trig implies floating point. This is very bad because we’re trying to implement exact integer arithmetic, but floating-point numbers become inexact in hard-to-analyze ways. Integers also have finite precision, but they’re much easier to analyze; an unsigned b -bit integer can represent anything in [0\ldots2^b) . Floating-point numbers are stored by computers in binary scientific notation; they’re expressed as s\cdot(1+x)\cdot2^y where s=\pm1 is the sign, x is a binary fraction between 0 and 1, and y is the exponent. The use of scientific notation makes it very hard to reason about which numbers are representable exactly, and which aren’t. For example, 2^{1009} and 2^{60}+2^8 can be stored exactly, but 2^{60}+2^7 cannot (a double’s significand holds 53 bits, and 2^{60}+2^7 would need 54).

Order of operations also matters much more with floating point numbers. Even if the inputs and the output can be stored exactly, certain operation orderings will make intermediate results, and therefore the final result, inexact. Here’s a simple example:

#include <iostream>

int main() {
	unsigned long long la = 1ULL << 60, lb = 1ULL << 6;
	double a = la, b = lb;
	double r1 = (((a + b) + b) + b) + b;
	double r2 = a + ((b + b) + (b + b));

	unsigned long long ll1 = r1, ll2 = r2;
	unsigned long long true_res = la + 4 * lb;
	std::cout << ll1 << '\n' << ll2 << '\n' << true_res << '\n';
}

We are computing a+4b=2^{60}+4\cdot2^6 with doubles in two ways. If you run the code, you’ll see that even though r1 and r2 are doing the same math, only r2 gets the final answer right. This is because even though 2^{60}+4\cdot2^6 is exactly representable, r1's intermediates 2^{60}+2^6 , 2^{60}+2\cdot2^6 , and 2^{60}+3\cdot2^6 are not. Admittedly, integers have a flavor of this problem too: 2^{16}(1000\cdot2^{14}-999\cdot2^{14}) won’t overflow a 32-bit integer, but 1000\cdot2^{16}\cdot2^{14}-999\cdot2^{16}\cdot2^{14} will. Still, integers are far easier to reason about. What’s easier to determine:

  • (64-bit integers) are all intermediate values in [0\ldots2^{64}) ?
  • (Doubles) If we take all intermediate values and view them as binary decimals, are we sure that there are at most 51 bits between most-significant bit and least significant bit, exclusive?

This is why analyzing the floating-point error from FFT results in awful-looking bounds like this (taken from Modern Computer Algebra, equation 3.2):

||z'-z||_\infty\le ||x||\cdot||y||\cdot((1+\epsilon)^{3n}(1+\epsilon\sqrt{5})^{3n+1}(1+\mu)^{3n}-1)

where:
  • z=x*y (the true result)
  • z' is the result computed with floating point
  • x,y,z are of length 2^n
  • for whatever reason, \epsilon=2^{-53} (a double’s machine epsilon) and \mu=\epsilon/\sqrt{2} when using doubles

So if we keep the right-hand side of the inequality below 0.5 and round to the nearest integer, things should theoretically work out fine. I have no idea how this bound was derived, but there are probably a lot of pitfalls to watch out for:

  • The bound probably assumes certain floating-point behaviors like those specified in IEEE-754. Certain processors, especially lower-power ones, often aren’t fully IEEE-compliant.
  • The naive, O(N^2) DFT formula is often used as a base case for FFT because the naive method tends to be faster for small N . However, the naive method has worse numerical precision characteristics, so the inequality must be modified if this optimization is chosen.
  • This bound probably assumes a certain order of operations which the implementor must be careful to follow, and the compiler must preserve.

Pitfalls like this probably make it a nightmare to maintain an integer multiplication algorithm based on floating-point FFT across different platforms and compilers. This is why GMP, the big integer library used by Mathematica and Maple, doesn’t use floating-point FFT. It instead uses the number theoretic transform (NTT), which is essentially an FFT over modular arithmetic instead of complex roots of unity. The second part of this post will go over NTT in detail.

It is worth noting, though, that in many applications floating-point error is fairly acceptable, because the original input signal wasn’t exact to begin with (e.g. FFT on microphone input). FFT is also generally simpler to implement than NTT, and FFT’s precision problems don’t manifest until inputs get fairly large, so FFT is sometimes still used in competitive programming even when exact integer output is desired.