The convolution of two discrete, finite, length-$N$ signals $f$ and $g$ (basically two length-$N$ arrays) is denoted $f * g$. The result is also a length-$N$ signal, with
$$(f * g)[n] = \sum_{m=0}^{N-1} f[m]\, g[n - m].$$
Convolutions appear in many applications:
- Multiplication: Let $f$ and $g$ be the coefficient arrays of two polynomials, so $f[i]$ is the coefficient for the $x^i$ term. Then $(f * g)[n]$ equals the coefficient on the $x^n$ term when we multiply our polynomials (plus a wrapping effect, discussed later). Similarly, since integers can be expressed as polynomials (for example, $123 = x^2 + 2x + 3$ at $x = 10$) we can multiply integers by convolving their digits.
- Probability: the PDF of the sum of two independent random variables is equal to the convolution of the two variables’ individual PDFs.
- Computer graphics: we can also convolve 2D signals, and many effects can be achieved simply by convolving an image with a certain filter. Examples: box blur to blur an image, a Lanczos filter to downsize an image, and the Canny edge detector.
One technicality is that in our definition of $(f * g)[n]$, the index $n - m$ can potentially be negative, which we get around by assuming that $g$ “wraps around,” so for example $g[-1] = g[N-1]$. This wrapping behavior is often undesired (for example with polynomial multiplication, it leads to the coefficients of $x^k$ and $x^{k+N}$ being added together) but we can get rid of it by padding $f$ and $g$ with sufficient zeroes.
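To make the wrapping and the zero-padding fix concrete, here is a minimal sketch of circular convolution computed straight from the definition. The function name `circular_convolve` and the two sample polynomials are illustrative choices, not anything from a library.

```python
def circular_convolve(f, g):
    """Circular convolution of two equal-length lists, directly from the definition."""
    n = len(f)
    if len(g) != n:
        raise ValueError("signals must have the same length")
    # Python's % always returns a non-negative result, so (i - m) % n
    # implements the wrap-around for negative indices.
    return [sum(f[m] * g[(i - m) % n] for m in range(n)) for i in range(n)]


# (1 + 2x + 3x^2) * (4 + 5x + 6x^2) = 4 + 13x + 28x^2 + 27x^3 + 18x^4
f = [1, 2, 3]
g = [4, 5, 6]

# Without padding, N = 3: the x^3 term folds onto x^0 and x^4 onto x^1.
wrapped = circular_convolve(f, g)
print(wrapped)  # [31, 31, 28]

# Padding both inputs to length 5 leaves room for every coefficient.
padded = circular_convolve(f + [0, 0], g + [0, 0])
print(padded)  # [4, 13, 28, 27, 18]
```

In general, padding both inputs to a length of at least `len(f) + len(g) - 1` is enough to avoid any wrapping for polynomial multiplication.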
It's easy to see that computing $f * g$ naively is an $O(N^2)$ operation: the result has $N$ elements, and each element takes $O(N)$ operations to compute. This is where the discrete Fourier transform (DFT) comes in. The DFT of $f$ returns an array $F$, also of length $N$. The DFT is defined such that the continuous function $\hat{f}(x) = \frac{1}{N} \sum_{k=0}^{N-1} F[k]\, e^{2\pi i k x / N}$ has the property that $\hat{f}(n) = f[n]$ for $n = 0, 1, \ldots, N-1$; it expresses $f$ as a weighted sum of sinusoidal functions (for reasons I'm not going to explain here, any $f$ can be uniquely expressed like this). Since the values in $F$ tell us how much to weight the sinusoids of different frequencies, $F$ represents our signal in what's called the frequency domain.
The DFT is useful because by the convolution theorem, convolving in the time domain is equivalent to pointwise multiplication in the frequency domain. Furthermore, the fast Fourier transform algorithm allows us to compute the DFT and inverse DFT in $O(N \log N)$ time. So, here's an asymptotically superior way to compute $f * g$:
- Use FFT to compute $F = \mathrm{DFT}(f)$ and $G = \mathrm{DFT}(g)$. Time: $O(N \log N)$
- By the convolution theorem, $\mathrm{DFT}(f * g)[k] = F[k] \cdot G[k]$, so we pointwise multiply our results from the first step. Time: $O(N)$
- Use FFT to apply the inverse DFT to our result from the previous step. Time: $O(N \log N)$
Which gives us $O(N \log N)$ overall. This article (and the next part of it) will look at applying this technique to large integer multiplication. Computers have built-in instructions to multiply integers, but their inputs are limited by machine word size, so generally inputs must be below $2^{64}$. To multiply larger numbers (for example, RSA commonly uses 2048-bit integers), we need to implement multiplication ourselves in software. FFT-based multiplication has high overhead but the best known asymptotic complexity, so it's used to multiply very large integers (at least tens of thousands of bits).
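As a rough illustration of the whole pipeline, here is a sketch that multiplies two non-negative integers by convolving their base-10 digits with a floating-point FFT. Everything in it is an assumption made for illustration: the names `fft` and `fft_multiply`, the textbook recursive radix-2 FFT, and the choice of base 10. A real implementation would use a much larger base and an iterative, in-place transform.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    With invert=True the conjugate twiddle factors are used; the caller
    still has to divide the result by N.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_multiply(x, y):
    """Multiply two non-negative ints by convolving their decimal digits."""
    xd = [int(ch) for ch in reversed(str(x))]  # least significant digit first
    yd = [int(ch) for ch in reversed(str(y))]
    size = 1
    while size < len(xd) + len(yd):  # zero-pad so nothing wraps around
        size *= 2
    fx = fft(xd + [0] * (size - len(xd)))
    fy = fft(yd + [0] * (size - len(yd)))
    prod = fft([p * q for p, q in zip(fx, fy)], invert=True)
    # Round away the floating-point noise to recover integer coefficients.
    coeffs = [round(c.real / size) for c in prod]
    digits, carry = [], 0
    for c in coeffs:  # propagate carries to get proper decimal digits
        carry, d = divmod(c + carry, 10)
        digits.append(d)
    while carry:
        carry, d = divmod(carry, 10)
        digits.append(d)
    return int("".join(str(d) for d in reversed(digits)))


print(fft_multiply(123456789, 987654321) == 123456789 * 987654321)  # True
```

The `round` call is the step that depends on the floating-point error staying below 0.5 per coefficient.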
Floating point problems
For a length-$N$ signal $f$,
$$F[k] = \sum_{n=0}^{N-1} f[n]\, \omega^{kn}, \qquad k = 0, 1, \ldots, N-1,$$
where $\omega = e^{-2\pi i / N}$. This means the DFT inherently involves floating-point arithmetic, since $e^{i\theta} = \cos\theta + i\sin\theta$; trig implies floating point. This is very bad because we're trying to implement exact integer arithmetic, but floating-point numbers become inexact in hard-to-analyze ways. Integers also have finite precision, but they're much easier to analyze; an unsigned $b$-bit integer can represent anything in $[0, 2^b - 1]$. Floating-point numbers are stored by computers in binary scientific notation; they're expressed as $(-1)^s \times (1 + m) \times 2^e$ where $s$ is the sign bit, $m$ is a binary fraction between 0 and 1, and $e$ is the exponent. The use of scientific notation makes it very hard to reason about which numbers are representable exactly, and which aren't. For example, $0.5$ and $0.25$ can be stored exactly, but $0.1$ cannot.
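To see where the floating point creeps in, here is a small sketch of the DFT computed directly from its definition, assuming the common sign convention $\omega = e^{-2\pi i / N}$. The name `naive_dft` and the sample signal are illustrative. The exact transform of `[1, 2, 3, 4]` is `[10, -2+2i, -2, -2-2i]`, which contains only integers, yet the computed values are only guaranteed to land close to it.

```python
import cmath


def naive_dft(f):
    """O(N^2) DFT computed directly from the definition."""
    n = len(f)
    omega = cmath.exp(-2j * cmath.pi / n)  # primitive N-th root of unity (rounded)
    return [sum(f[j] * omega ** (k * j) for j in range(n)) for k in range(n)]


computed = naive_dft([1, 2, 3, 4])
exact = [10, -2 + 2j, -2, -2 - 2j]

# Print each computed value next to its distance from the exact answer.
# The distances are tiny, but they need not be exactly zero.
for c, e in zip(computed, exact):
    print(c, abs(c - e))
```

Even `omega` itself is already rounded here, because $\cos$ and $\sin$ of most angles are not exactly representable as doubles.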
Order of operations also matters much more with floating point numbers. Even if the inputs and the output can be stored exactly, certain operation orderings will make intermediate results, and therefore the final result, inexact. Here's a simple example:
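A minimal stand-in for this kind of comparison, assuming IEEE-754 doubles with round-to-nearest-even (which is what Python's `float` uses on typical platforms). The specific values are illustrative choices: $2^{53}$ is the point beyond which doubles can no longer represent every integer.

```python
BIG = 2.0 ** 53  # 9007199254740992.0, exactly representable

# Mathematically both expressions equal 1.
r1 = (BIG + 1.0) - BIG  # 2^53 + 1 is not representable, so the sum rounds to 2^53
r2 = (BIG - BIG) + 1.0  # every intermediate value is exactly representable

print(r1)  # 0.0
print(r2)  # 1.0
```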
We are computing the same quantity with doubles in two ways; you can see the output of the code here. Note that even though `r1` and `r2` are doing the same math, only `r2` gets the final answer right. This is because even though the final answer is exactly representable, `r1`'s intermediate values are not. Admittedly, integers have a flavor of this problem too; one ordering of an expression can stay within a 32-bit integer while another ordering of the same expression overflows. Still, integers are far easier to reason about. What's easier to determine:
- (Unsigned 64-bit integers) Are all intermediate values in $[0, 2^{64} - 1]$?
- (Doubles) If we take all intermediate values and view them as binary decimals, are we sure that there are at most 51 bits between most-significant bit and least significant bit, exclusive?
This is why analyzing the floating-point error from FFT results in awful-looking bounds like this (taken from Modern Computer Algebra, equation 3.2):
- (the true result)
- is the result computed with floating point
- are of length
- for whatever reason and when using doubles
So if we keep the right-hand side of the inequality below 0.5 and round to the nearest integer, things should theoretically work out fine. I have no idea how this bound was derived, but there are probably a lot of pitfalls to watch out for:
- The bound probably assumes certain floating-point behaviors like those specified in IEEE-754. Certain processors, especially lower-power ones, often aren't fully IEEE-compliant.
- The naive $O(N^2)$ DFT formula is often used as a base case for FFT because the naive method tends to be faster for small $N$. However, the naive method has worse numerical precision characteristics, so the inequality must be modified if this optimization is chosen.
- This bound probably assumes a certain order of operations which the implementor must be careful to follow, and the compiler must preserve.
Pitfalls like these probably make it a nightmare to maintain an integer multiplication algorithm based on floating-point FFT across different platforms and compilers. This is why GMP, the big integer library used by Mathematica and Maple, doesn't use floating-point FFT. It instead uses the number-theoretic transform (NTT), which is essentially an FFT over modular arithmetic, with roots of unity modulo an integer taking the place of complex roots of unity. The second part of this post will go over NTT in detail.
It is worth noting, though, that in many applications floating-point error is perfectly acceptable, because the original input signal wasn't exact to begin with (e.g. FFT on microphone input). FFT is also generally simpler to implement than NTT, and FFT's precision problems don't manifest themselves until inputs get fairly large, so FFT is sometimes still used in competitive programming even when exact integer output is desired.
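One rough way to get a feel for how much headroom there is: run the floating-point pipeline on random digit arrays, compare against an exact integer convolution, and look at the largest deviation. This is only an empirical sketch under assumed parameters (length 256, decimal digits, a textbook recursive FFT, hypothetical helper names); it is not a substitute for a proven error bound.

```python
import cmath
import random


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unnormalized)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(f, g):
    """Circular convolution via the convolution theorem, in floating point."""
    n = len(f)
    spectrum = [p * q for p, q in zip(fft(f), fft(g))]
    return [h.real / n for h in fft(spectrum, invert=True)]


def exact_convolve(f, g):
    """Circular convolution in exact integer arithmetic, O(N^2)."""
    n = len(f)
    return [sum(f[m] * g[(i - m) % n] for m in range(n)) for i in range(n)]


random.seed(0)  # fixed seed so the experiment is repeatable
n = 256
f = [random.randrange(10) for _ in range(n)]
g = [random.randrange(10) for _ in range(n)]

approx = fft_convolve(f, g)
exact = exact_convolve(f, g)
max_err = max(abs(a - e) for a, e in zip(approx, exact))

print(max_err)  # rounding recovers the exact result as long as this stays below 0.5
```

Rerunning this with larger `n` or a larger digit base shows the error growing, which is the effect the bound above tries to capture.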