Pf: Write $n = m_1 m_2$ with $m_1 \le m_2$ (in the worst case, $m_1 = 1$ will work). Since $n$ is odd, both $m_1$ and $m_2$ are odd. Let $a = \frac{1}{2}(m_1 + m_2)$ and $b = \frac{1}{2}(m_2 - m_1)$. Because $m_1$ and $m_2$ are both odd, $a$ and $b$ are integers. Then $m_1 = a - b$ and $m_2 = a + b$, so $n = m_1 m_2 = (a-b)(a+b) = a^2 - b^2$.
Now suppose we wish to find a factor of the odd integer $n > 1$. Examine, in turn, the numbers $n$, $n + 1^2$, $n + 2^2$, $n + 3^2$, ... until you find a square (one is guaranteed to exist by the theorem), say $n + b^2 = a^2$. Then $n = a^2 - b^2 = (a+b)(a-b)$, and so factors of $n$ have been located.
Example: Find a factor of $n = 152398989$. Looking for a square in the sequence 152398989, 152398990, 152398993, 152398998, 152399005, 152399014, 152399025, 152399038, ... we find that these numbers are $(12344.998541...)^2$, $(12344.99858...)^2$, $(12344.99870...)^2$, $(12344.99890...)^2$, $(12344.99918...)^2$, $(12344.99955...)^2$, $(12345)^2$. Thus, $n = 12345^2 - 6^2 = (12351)(12339)$.
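The search just described can be sketched in a few lines of Python. This is only an illustration of the naive procedure, not an efficient implementation; the function name `fermat_naive` is our own, and `math.isqrt` is used so that the square test is exact rather than relying on floating-point square roots.

```python
from math import isqrt

def fermat_naive(n):
    """Search n, n + 1^2, n + 2^2, ... for a perfect square a^2 = n + b^2.

    Returns the pair (a - b, a + b), whose product is n.
    """
    if n <= 1 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 1")
    b = 0
    while True:
        candidate = n + b * b
        a = isqrt(candidate)          # integer part of the square root
        if a * a == candidate:        # candidate is a perfect square
            return a - b, a + b
        b += 1

print(fermat_naive(152398989))  # (12339, 12351)
```

The loop always terminates for odd n, because b = (n - 1)/2 gives the square ((n + 1)/2)^2, which corresponds to the trivial factorization n = 1 * n.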
The method can be sped up a bit by observing that the last digit of a square must be 0, 1, 4, 5, 6 or 9. However, taking square roots to determine if a number is a square is a slow operation, and this naive approach is therefore not very fast. A better algorithm to search for squares is to examine the sequence of integers $([k] + i)^2 - n$ for a square, where $[k] = \lceil \sqrt{n} \rceil$ is the square root of $n$ rounded up to the nearest integer (ceiling function), and $i = 0, 1, 2, \ldots$. This algorithm does not take as many square roots, and those it does take are of smaller numbers. In the above example, using this algorithm, the factorization would have been found in the first step (but, to be honest, trial division would have found a different factorization in one step as well).
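A minimal Python sketch of this improved search follows. The name `fermat_factor` and the returned step counter are our own additions for illustration; the last-digit test is the filter mentioned above, applied before any square root is taken.

```python
from math import isqrt

# Possible last decimal digits of a perfect square.
SQUARE_LAST_DIGITS = {0, 1, 4, 5, 6, 9}

def fermat_factor(n):
    """Look for a square among (k + i)^2 - n, where k = ceil(sqrt(n)).

    Returns (a - b, a + b, i), where a = k + i, a^2 - n = b^2,
    and i is the number of the step at which the square was found.
    """
    if n <= 1 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 1")
    k = isqrt(n)
    if k * k < n:
        k += 1                        # round the square root up
    i = 0
    while True:
        a = k + i
        diff = a * a - n
        if diff % 10 in SQUARE_LAST_DIGITS:   # cheap filter first
            b = isqrt(diff)
            if b * b == diff:
                return a - b, a + b, i
        i += 1

print(fermat_factor(152398989))  # (12339, 12351, 0)
```

For the example above the square appears at i = 0, which matches the remark that the factorization is found in the first step.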
Factors which are nearly equal will be found fairly quickly by this procedure, so in the RSA application one must make sure that the two primes are not too close together.
Choose a bound $B$ and compute $a \equiv 2^{B!} \pmod{n}$. If $d = \gcd(a-1, n)$ satisfies $1 < d < n$, then $d$ is a factor of $n$. To see why this works, consider a prime $p$ which divides $n$. If $p-1$ divides $B!$ (which will be the case if, for instance, $p-1$ only has small prime divisors) then we have $B! = (p-1)k$ for some integer $k$. Now, since $a \equiv 2^{B!} \pmod{n}$, we also have $a \equiv 2^{B!} \pmod{p}$ since $p \mid n$. By Fermat's theorem $2^{p-1} \equiv 1 \pmod{p}$, so $a \equiv 2^{B!} = (2^{p-1})^k \equiv 1 \pmod{p}$. Therefore $p \mid (a-1)$ and $p \mid n$, so $p \mid \gcd(a-1, n) = d$. Hence $1 < d$, and if $d < n$, then $d$ is a proper divisor of $n$.
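Example: Let $n = 15770708441$. Choose $B = 180$. Then $a = 11620221425$ and we compute $d = 135979$. We get the factorization $15770708441 = (135979)(115979)$. The reason this factorization worked is that $d - 1 = 135978 = 2 \cdot 3 \cdot 131 \cdot 173$ has only small prime factors. Any $B \ge 173$ would have worked for this $n$, provided $B$ is not so large that $B!$ is also a multiple of $115979 - 1 = 115978 = 2 \cdot 103 \cdot 563$; once $B \ge 563$ we get $a \equiv 1$ modulo both primes and hence $d = n$.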
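The computation in this example can be sketched as follows. This is an illustrative sketch, not an optimized implementation; the function name `pollard_p_minus_1` is our own. Rather than forming $B!$ explicitly, it raises the running value to the powers 2, 3, ..., B in turn, which yields the same residue $2^{B!} \bmod n$.

```python
from math import gcd

def pollard_p_minus_1(n, bound):
    """Return gcd(2^(bound!) - 1, n) if it is a proper factor of n, else None."""
    a = 2
    for j in range(2, bound + 1):
        a = pow(a, j, n)      # after this step, a = 2^(j!) mod n
    d = gcd(a - 1, n)
    return d if 1 < d < n else None

print(pollard_p_minus_1(15770708441, 180))  # 135979
```

Returning `None` covers both failure cases: d = 1 (the bound was too small) and d = n (the bound was large enough to capture every prime factor at once).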
The choice of $B$ is crucial in this algorithm. If $B$ is small, the algorithm will run quickly, but the chance of success is small. On the other hand, if $B$ is large, the algorithm is much more likely to find a factor, but the runtime will be prohibitively slow (comparable to trial division).
In the RSA application, one must ensure that the primes p and q have the property that p-1 and q-1 have at least one large prime factor to avoid an attack by this method. We shall see a generalization of this method later when we consider elliptic curves.
If $t^2 \equiv s^2 \pmod{n}$ with $t \not\equiv \pm s \pmod{n}$, then since $n \mid t^2 - s^2 = (t+s)(t-s)$ while $n$ divides neither $t+s$ nor $t-s$, $n$ must have a non-trivial common factor with both $t+s$ and $t-s$. One of these common factors is $a = \gcd(t+s, n)$ and the other is $b = n/a$.
Example: Suppose we want to factor $n = 4633$. If we notice that $118^2 \equiv 25 = 5^2 \pmod{4633}$, then $a = \gcd(118+5, 4633) = \gcd(123, 4633) = 41$ and we have $4633 = (41)(113)$.
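As a small illustration, the step from a congruence of squares to a factorization can be written as a short Python function. The name `factor_from_squares` is hypothetical; the function simply checks the two conditions stated above and then takes the gcd.

```python
from math import gcd

def factor_from_squares(t, s, n):
    """Given t^2 = s^2 (mod n), return (a, n // a) with a = gcd(t + s, n).

    Returns None when t = +-s (mod n), in which case the congruence
    gives only the trivial factorization.
    """
    if (t * t - s * s) % n != 0:
        raise ValueError("t^2 and s^2 are not congruent mod n")
    if (t - s) % n == 0 or (t + s) % n == 0:
        return None
    a = gcd(t + s, n)
    return a, n // a

print(factor_from_squares(118, 5, 4633))  # (41, 113)
```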
The factoring problem is then reduced to finding a congruence of this type. To manufacture such a congruence we use the concept of a factor base. A factor base is simply a set of small primes which is not too big. If $B$ is a factor base, then a number all of whose prime factors lie in $B$ is said to be $B$-smooth. To find a congruence of the form $t^2 \equiv s^2 \pmod{n}$, we first find several numbers $b_i$ so that $b_i^2$ reduced mod $n$ is $B$-smooth for a fixed factor base $B$. Since $|B|$ is small, there will be many repeated primes in the factorizations of these numbers. The next task is to find some subset of the $b_i^2$ so that every prime that appears in the product of these $b_i^2$ appears to an even power (so the product will be a square mod $n$).
Example: Factor n = 2043221 using the factor base B = {2,3,5,7,11}. We find, by means to be discussed below, the following B-smooth squares:
Consider the 3rd and 4th numbers; we see that $[(3197)(3199)]^2 \equiv 2^8 \cdot 3^8 \cdot 7^2 \pmod{2043221}$. Thus, $t = (3197)(3199) \bmod 2043221 = 11098$ and $s = 2^4 \cdot 3^4 \cdot 7 \bmod 2043221 = 9072$. Now $\gcd(t+s, n) = \gcd(11098 + 9072, 2043221) = 2017$ and we have $2043221 = (2017)(1013)$.
This example also illustrates what can go wrong with the procedure. Had we taken the first two numbers, we would have obtained $[(1439)(2878)]^2 \equiv 2^6 \cdot 5^8 \cdot 11^2 \pmod{n}$, so $t = (1439)(2878) \bmod n = 55000$ and $s = 2^3 \cdot 5^4 \cdot 11 \bmod n = 55000$, i.e. $t = s$, and this does not lead to a factorization.
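The smoothness of the four numbers named in this example (1439, 2878, 3197 and 3199) can be checked directly. The sketch below uses a hypothetical helper, `smooth_exponents`, that divides out the primes of the factor base by trial division and reports the exponents; it covers only the numbers that appear in the text.

```python
FACTOR_BASE = [2, 3, 5, 7, 11]

def smooth_exponents(m, base):
    """Return the exponents of m over the primes in base,
    or None if m is not base-smooth."""
    if m <= 0:
        return None
    exponents = []
    for p in base:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        exponents.append(e)
    return exponents if m == 1 else None

n = 2043221
for b in (1439, 2878, 3197, 3199):
    residue = b * b % n
    print(b, residue, smooth_exponents(residue, FACTOR_BASE))
# 1439 27500 [2, 0, 4, 0, 1]
# 2878 110000 [4, 0, 4, 0, 1]
# 3197 4704 [5, 1, 0, 2, 0]
# 3199 17496 [3, 7, 0, 0, 0]
```

Adding the exponent lists for 3197 and 3199 gives [8, 8, 0, 2, 0], and for 1439 and 2878 gives [6, 0, 8, 0, 2], in agreement with the products used above.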
In the example we found the appropriate subset of the $b_i$ to multiply by inspection, but we can do this systematically and at the same time answer the question of how many $b_i$ we need to find. We form a 0-1 matrix with one row for each of the $B$-smooth squares and $|B|$ columns, each column corresponding to one prime in the factor base $B$. In each row, the entry in the $j$th column is 1 if the $j$th prime of $B$ appears to an odd power and 0 otherwise. For the last example this matrix would look like:
0 0 0 0 1
0 0 0 0 1
1 1 0 0 0
1 1 0 0 0
0 1 0 1 0.
Any set of rows that sums to the zero vector mod 2 (that is, a set of rows that is linearly dependent over $\mathbb{Z}_2$) gives a congruence $t^2 \equiv s^2 \pmod{n}$, and such a set must exist as soon as there are more than $|B|$ rows. However, there is no guarantee that $t \not\equiv \pm s \pmod{n}$. When this fails, we can either use another set of linearly dependent rows (often requiring the finding of new $B$-smooth squares) or change the factor base. It should now be clear why we are a little vague in the definition of a factor base. If the factor base is small, we will need only a few $B$-smooth squares to get a linear dependency; however, having a small factor base means that the $B$-smooth squares are rare and so finding them will be hard. On the other hand, a large factor base means that there are many more $B$-smooth squares, so they will be easier to find, but we will then need to find many more of them. A good algorithm based on these considerations would therefore be one for which the factor base is not too big and which has an efficient way of finding $B$-smooth squares.
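One way to find dependent rows systematically is Gaussian elimination mod 2. The sketch below is a simple illustration under our own naming (`find_dependencies`); it packs each row into an integer bitmask and records which original rows were combined, so that each reported dependency is a list of row indices whose sum is zero mod 2.

```python
def find_dependencies(rows):
    """Return lists of row indices whose rows sum to zero mod 2."""
    pivots = {}        # leading bit -> (reduced row, mask of rows combined)
    dependencies = []
    for i, row in enumerate(rows):
        vec = 0
        for bit in row:
            vec = (vec << 1) | (bit & 1)
        combo = 1 << i                 # this row alone, so far
        while vec:
            top = vec.bit_length() - 1
            if top not in pivots:
                pivots[top] = (vec, combo)   # new independent row
                break
            pivot_vec, pivot_combo = pivots[top]
            vec ^= pivot_vec           # addition mod 2 is XOR
            combo ^= pivot_combo
        else:
            # vec was reduced to zero: the rows in combo are dependent
            dependencies.append(
                [j for j in range(len(rows)) if (combo >> j) & 1]
            )
    return dependencies

matrix = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
]
print(find_dependencies(matrix))  # [[0, 1], [2, 3]]
```

With zero-based indices, the two dependencies reported are rows 1 and 2 and rows 3 and 4 of the matrix above, which are exactly the two pairs considered in the example.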
One could try randomly selecting the $b_i$, and if $n$ is not too large this will be effective, but for large $n$ it is not. A more effective procedure is to select the $b_i$ to be integers near $\sqrt{kn}$ for different choices of $k$. The squares of these $b_i$ will be near $kn$, so when reduced mod $n$ they should be small and thus more likely to be made up of only small primes. Another procedure, due to Pomerance, is to start with a large interval of integers around the square root of $n$, and then systematically remove integers based on a quadratic relationship with each prime in the factor base. The remaining integers have a high probability of being $B$-smooth. This method is known as the Quadratic Sieve. A more recent algorithm, known as the Number Field Sieve, finds the $B$-smooth squares by means of computations in rings of algebraic integers.
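The simplest of these ideas, taking the $b_i$ just above $\sqrt{kn}$, can be sketched as follows. This is not the Quadratic Sieve, only a brute-force scan with trial division over the factor base; the function names and the `multipliers` and `width` parameters are our own choices for illustration.

```python
from math import isqrt

def is_smooth(m, base):
    """True if every prime factor of the positive integer m lies in base."""
    for p in base:
        while m % p == 0:
            m //= p
    return m == 1

def smooth_candidates(n, base, multipliers, width):
    """Scan `width` integers just above sqrt(k * n) for each k and
    collect (k, b, b^2 mod n) whenever b^2 mod n is base-smooth."""
    found = []
    for k in multipliers:
        start = isqrt(k * n) + 1
        for b in range(start, start + width):
            residue = b * b % n
            if residue and is_smooth(residue, base):
                found.append((k, b, residue))
    return found

for hit in smooth_candidates(2043221, [2, 3, 5, 7, 11], (1, 5), 10):
    print(hit)
```

For $n = 2043221$ this scan picks up 1439 (with $k = 1$) and 3197 and 3199 (with $k = 5$), three of the numbers used in the example with factor base {2, 3, 5, 7, 11}.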
For factoring RSA moduli, the quadratic sieve has been the most successful algorithm. In April 1994, a 129-digit number known as RSA-129 was factored by Atkins, Graff, Lenstra and Leyland using the quadratic sieve. The numbers RSA-100, RSA-110, ..., RSA-500 were a list of RSA moduli publicized on the Internet (RSA Labs) as "challenge" numbers for factoring algorithms. Each number RSA-d was a d-digit number that is the product of two primes of approximately the same length. The numbers RSA-100, RSA-110, RSA-120, RSA-129, RSA-130, RSA-140, RSA-155 and RSA-160 have all been factored (the last of these on April 1, 2003). In 2001, RSA Labs renamed and reissued the "challenge" numbers and assigned specific monetary rewards for their factoring. The new list (available at RSA Labs) uses the number of digits in the binary representation in the name, starting at RSA-576 (worth $10K) and going up to RSA-2048 ($200K).
The number field sieve seems to have great potential since its asymptotic running time is faster than that of other known algorithms. It is still in the developmental stages, but many researchers feel that it might prove to be faster for numbers having more than about 125-130 digits. In 1990, the number field sieve was used by Lenstra, Lenstra, Manasse and Pollard to factor $2^{512} + 1$. On December 3, 2003 the factoring of RSA-576 (174 digits) was announced by a group at the German Federal Agency for Information Technology Security (BSI). They used a number field sieve to obtain the two 87-digit prime factors. The smallest challenge number is now RSA-640, worth $20K.