Appendix: Analysis of Multi-Prime RSA Security
Classically, an RSA modulus has been composed from two primes. However, there are very practical reasons why using more than two primes might be preferred.
- The primes are smaller and key generation takes less time despite there being more of them.
- Private key operations take less time if one uses the Chinese Remainder Theorem. Using three primes vs. two primes gives a theoretical speedup of 9/4. A speedup of 1.8 to 2.0 has been achieved in practice.
There are two possible methods for attacking an RSA key if it is built from more than two primes. The first is NFS, which has already been discussed. The second method is the Elliptic Curve Method (ECM).
This discussion assumes minimal familiarity with the Elliptic Curve factoring algorithm. A brief, simplified description of the method follows:
One chooses two integers A and B at random and considers the equation y2 = x3 + Ax + B. This is an elliptic curve. Suppose N is the RSA modulus and that p is a prime dividing N. We need the following definition:
m-Smooth Number - An m- smooth number is one which has all of its prime factors less than m.
The number of points on the randomly chosen elliptic curve taken modulo p is a random integer near p. ECM succeeds when this number of points is m-smooth for a suitably chosen small value of m.
Whereas the time for the Number Field Sieve to factor a number depends only on the size of the number, the Elliptic Curve Method finds small prime factors of a larger number and its run time depends on the size of the factors.
Throughout this paper it is assumed that an RSA modulus built from more than two primes will (try to) be broken with the Elliptic Curve Method. Therefore the MIPS-Year estimates in sections V. and higher are based upon using ECM. The difficulty of breaking a modulus with ECM is then computed relative to a base point. That base point is the factorization of RSA-512 with NFS.
Definition of MIPS-Year
We define a MIPS-Year to be 3.1 x 1013 arithmetic operations. This is 1 x 106 operations/sec x 3600sec/hr x 24hrs/day x 365 days/yr x 1 yr.
II. Difficulty in MIPS-Years of Breaking RSA with the Number Field Sieve
We have a benchmark data point using NFS for breaking RSA-512 of
8000 MIPS-Years for the sieving using ~50 Mbytes/machine
10 Days on a Cray using 2.3Gbytes to do the matrix.
To measure the difficulty for NFS in MIPS-Years of factoring moduli greater than 512 bits we use:
8000 * L(N)/L(2512)
where L(N) was defined in the main body of this paper.
III. Method of Analysis for ECM
ECM succeeds when a randomly chosen curve (taken over Z/pZ where p | N is the factor we hope to find) has its group order smooth up to B1 for suitably chosen B1. This means that all the prime factors of the order of the curve must be less than B1. As it is usually implemented ECM is run in two steps. In Step 1, one hopes that the order of the curve is smooth up to B1. In Step 2, one hopes that the order is smooth up to B1 except for a single prime factor lying between B1 and B2 for a suitably chosen value of B2. The second step has the effect of about an order-of-magnitude speedup in practice, although it does not affect the asymptotic complexity of the method. Step 1 requires approximately B1 elliptic curve point additions. The number of operations for Step 2 depends upon the way it is implemented ? there are several ways of doing so. If one uses Fast Fourier Transforms, Step 2 takes about the same time as Step 1, when B2 = B12.
The probability that an integer y has all of its prime factors less than x is about u-u where u = (log y)/(log x). This is how the probability of success, per curve, is computed herein.
Then, the total work in MIPS-Years to find a factor of an n bit number is .033 * (n/1024)^2 * B1/107 * uu. See below for an explanation of the numbers .033 and 107. The expression (n/1024)^2 is the relative work factor to execute the multi-precision arithmetic for an n bit number as compared with the 1024 bit benchmark given below. The expression B1/107 represents the relative work factor to take the computations to B1 relative to the benchmark for B1 = 107 given below. And uu is the number of curves needed.
It is assumed that when running (say) k curves, one can give one curve to each of k machines, or two curves to each of k/2 machines etc.
We assume that the modulus has been broken when ECM finds a single factor. Clearly if more than two primes are used this means we still have a composite cofactor. However, pulling out the next factor with ECM is then easier because the modulus is smaller. Thus, one might want to apply ECM again once the first factor is removed, or use NFS if it would then be faster.
IV. Elliptic Curve Benchmark Data
We have an established benchmark of:
Executing Step 1 to 1.33 x 107 takes 454 seconds on a 500MHz Dec Alpha 21264 for a 127 decimal digit modulus. This translates to 2000 seconds for Step 1 to 107 on a 1024 bit modulus. 2000 seconds on this machine translates roughly to about .033 MIPS-Years for this problem, assuming 1 instruction/cycle. This assumption does not apply to integer multiplies, which dominate elliptic curve additions, but since this is a super-scalar, pipelined architecture, the machine averages "about" 1 instruction/cycle. We designate the Step 1 limit by the variable B1 throughout this paper.
Note that this is a 64 bit machine. The same computation on a 32 bit machine, assuming all other architectural features to be the same will take four times as long because the complexity of multi-precision arithmetic decreases with the square of the word size.
This can be sped by dedicated hardware for multi-precision arithmetic. Indeed, the Wiener ECDL cracking machine, discussed in the main body of this paper, is easily adapted to run ECM. We therefore borrow his design for this purpose. This machine has 2.5 x 107 processors and it will compute the addition of two points on an elliptic curve modulo a 155 bit number at the rate of 20,000 additions/sec. Therefore, the time on this machine needed to execute Step 1 of ECM to the limit of B1 is
B1/20000 * (n/155)2 where n is the size in bits of the RSA modulus. The term (n/155)2 represents the relative difficulty of doing the arithmetic modulo N, as opposed to the 155 bit arithmetic of the Wiener machine.
While ECM has a second stage that can be implemented, this second stage does not affect the asymptotic run time of the algorithm. It does however, have some practical importance for the size of the numbers under consideration here. Using Fast Fourier Transform techniques can allow computing Step 2 to B12 in about the same time as executing Step 1 to B1. This can give about an order of magnitude performance increase at twice the time cost. The space requirements, while large, are manageable (128 to 256 Mbytes for 1024 bit moduli per processor). This paper computes ECM costs and probability of success based upon a Step 1 implementation only.
Let p be the probability of success with a single curve. Then the expected number of curves needed to succeed is 1/p. The probability of succeeding with 1/p curves is then
1 ? (1- p)^(1/p) and this is about 1/e ~ .63 when p is moderately large. Halving the number of curves yields a probability of success of about .39, while doubling the number of curves yields a probability of success of about .88. Except where noted otherwise, this paper will assume that 1/p curves are used. Thus, this method does not succeed with certainty.
Note that as B1 increases, the probability of success per curve increases and the required number of curves decreases. But the time required to run each curve increases. There is an optimal choice of B1 depending on the size of the factor being sought. The tables below show where this approximately occurs.
V. Analysis of 768 bit RSA Modulus with three 256 bit Primes.
|B1||Time/Curve (hrs)||Probability Per Curve||No. of Curves||Total MIPS-Years|
The minimum occurs with B1 ~ 1010. With the benchmark DEC Alpha each curve requires 310 hours to compute and has a probability of success of 1.5 x 10-7. Thus, 6.8 million curves are needed, giving a total work effort of 1.2 x 108 MIPS-Years. One can run each curve for only 31 hours, but then the total number of curves goes up by a factor of 14 and the total work by a factor of about 1.4.
This is 15000 times harder than breaking RSA-512 with NFS. With Step 2 implemented, this might be 3000 to 5000 times harder.
Note that breaking this modulus with NFS is 6100 times harder than RSA-512, so a 768 bit modulus with three primes is as hard to break as one with two primes with respect to time equivalence only.
We now do a cost-based analysis based on the Wiener machine. It can execute the algorithm to 1010 on a 768 bit modulus in 1010/20000 * (768/155)2 seconds which is about 3400 hours (5 months) . This machine can try 25 million curves at once during this time but only 6.8 million are needed to succeed with probability .63. If we use 25 million curves, the probability of success is about .974.
Note that running only 6.8 million curves (giving one per processor) leaves many processors idle. One way to rectify this is to run more curves than is optimal (25 million), while slightly lowering the Step 1 limit. This results in more total arithmetic being performed, but will slightly lower the elapsed time. An alternate way to utilize addition processors would be to use them to speed the multi-precision arithmetic on a per-curve basis. However this would require a major redesign of the machine. Similar comments apply to other key sizes as long as the number of needed curves is less than the number of processors on the machine.