pollardrho.tex 9.6 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243
  1. \chapter{Pollard's $\rho$ factorization method \label{chap:pollardrho}}
  2. The \emph{Monte Carlo} factorization method, published by J. M. Pollard in
  3. ~\cite{pollardMC}, consists into identifying a periodically recurrent sequence
  4. of integers within a random walk $\pmod{N}$ that could leak one of the two
  5. factors.
  6. Consider a function $f$ from $\mathcal{S}$ to $\mathcal{S}$, where
  7. $\mathcal{S} = \{0, 1, \ldots, q-1\}$. Let $s$ be a random element in
  8. $\mathcal{S}$, and consider the sequence
  9. \begin{align*}
  10. s,\ f(s),\ f(f(s)),\ \ldots
  11. \end{align*}
  12. Since $f$ acts over a finite set, it is clear that this sequence must
  13. eventually repeat, and become cyclic.
  14. We might diagram it with the letter $\rho$, where the tail represent the
  15. cyclic part, or \emph{epacts}, and the oval the cyclic part, or
  16. \emph{period}.
  17. \begin{center}
  18. \begin{tikzpicture}[scale=0.7, thick]
  19. \tikzstyle{every node}=[draw,circle,fill=white,minimum size=4pt,
  20. inner sep=0pt]
  21. \node (1) at (1.4, 0.2) [label=left:$x_1$] {};
  22. \node (2) at (2.5, 3) [label=left:$x_{i-2}$] {};
  23. \node (3) at (3.25, 5) [label=left:$x_{i-1}$] {};
  24. \node (4) at (4, 7) [label=left:$ x_i \equiv x_j $] {};
  25. \node (5) at (6, 9) [label=above:$x_{i+1}$] {};
  26. \node (6) at (8, 7) [label=right:$x_{i+2}$] {};
  27. \node (7) at (6, 5) [label=below:$x_{j-1}$] {};
  28. \path (1) edge [dashed] (2);
  29. \path (2) edge (3);
  30. \path (3) edge (4);
  31. \path (4) edge [bend left] (5);
  32. \path (5) edge [bend left] (6);
  33. \path (6) edge [bend left, dashed] (7);
  34. \path (7) edge [bend left] (4);
  35. %%\draw [decorate,decoration={brace, raise=1.5cm}] (1) -- (3)
  36. %%node[draw=no] at (-1.5, 4) {tail};
  37. \draw [decorate,decoration={brace, raise=3cm}] (5) -- (7)
  38. node[draw=none] at (13, 7) {\footnotesize {period $\pmod{q}$}};
  39. \end{tikzpicture}
  40. \end{center}
  41. Now, consider $N=pq$.
  42. Let $F(x)$ be any function generating pseudorandom numbers $\angular{x_1, x_2, \ldots}$,
  43. and let $f(x) = F(x) \pmod{q}$.
  44. As we said above, without any luck, there will be a pair $\angular{x_i, x_j}$
  45. generated by $F$ such that $x_i \equiv x_j \pmod{q}$, but $x_i \neq x_j$.
  46. Therefore, in order to factorize $N$, we proceed as follows: starting from a
  47. random $s$, we iteratively apply $F$ reduced modulo $N$. Whenever we find a
  48. period, if $\gcd(x_i - x_j, N) > 1$ then we found a non-trivial
  49. factor of $N$.
  50. \paragraph{Choosing the function} Ideally, $F$ should be easily computable, but
  51. at the same time random enough to reduce as much as possible the epacts
  52. ~\cite{Crandall} \S 5.2.1. Any quadratic function $F(x) = x^2 + b$ should be
  53. enough, provided that $b \in \naturalN \setminus \{0, 2\}$ \footnote{
  54. Note that this has been only empirically verified, and so far not been proved
  55. (~\cite{riesel}, p. 177)}.
  56. For example, ~\cite{pollardMC} uses $x^2 -1$, meanwhile we are going to choose
  57. $F(x) = x^2 + 1$.
  58. \paragraph{Finding the period} In \cite{AOCPv2} \S 3.1, Knuth gives a simple and
  59. elegant algorithm, attributed to Floyd, for finding a multiple of the
  60. period. This algorithm is the same one finally adopted by Pollard in
  61. ~\cite{pollardMC}.
  62. \begin{theorem*}
  63. Given an \emph{ultimately periodic} sequence, in the sense that there exists
  64. numbers $\lambda$ and $\mu$ for which the values:
  65. \begin{align*}
  66. X_0, X_1, \ldots, X_{\mu}, \ldots, X_{\mu + \lambda - 1}
  67. \end{align*}
  68. are distinct, but $X_{n+\lambda} = X_n$ when $n \geq \mu$,
  69. then there exists an
  70. $\mu < n < \mu + \lambda$ such that $X_n = X_{2n}$.
  71. \end{theorem*}
  72. \begin{proof}
  73. First, if $X_n = X_{2n}$, then the sequence is obviously periodic from
  74. $X_{2n}$ onward, possibly even earlier.
  75. Conversely, $X_n = X_m \quad (n \geq \mu)$ for
  76. $m = n + k\lambda, \quad k \in \naturalN$. Hence, there will eventually
  77. be an $n$ such that $X_n = X_{2n}$ if and only if $n - \mu$ is a multiple of
  78. $\lambda$.
  79. The first such value happens for $n = (\lambda + 1)\floor{\rfrac{\mu}{\lambda}}$.
  80. \end{proof}
  81. The immediate consequence of this is that we can find the period $q$ simply by
  82. checking $\gcd(x_{2i} - x_i, N)$ for incremental $i$-s.
  83. \paragraph{Brent's Improvement} In 1979, Brent discovered an entire family of
  84. cycle-finding algorithms whose optimal version resulted to be 36\% faster than
  85. Floyd's one \cite{pollard-brent}.
  86. Instead of looking for the period of the sequence using $x_{2i} - x_i$, Brent
  87. considers
  88. $\abs{x_j - x_{2^k}}$ for $ 3 \cdot 2^{k-1} < j \leq 2^{k+1}$, resulting in
  89. fewer operations required by the algorithm. Pragmatically, this boils down to
  90. confronting:
  91. \medskip
  92. \begin{tabular}{l@{\hskip 40pt} l@{\hskip 50pt} l}
  93. $k = 0$ & $j \in \{1+1 \ldots 2\}$ & $\abs{x_1 - x_2}$ \\
  94. $k = 1$ & $j \in \{3+1 \ldots 4\}$ & $\abs{x_2 - x_4}$ \\
  95. $k = 2$ & $j \in \{6+1 \ldots 8\}$ & $\abs{x_4 - x_7}$, $\abs{x_4 - x_8}$ \\
  96. $k = 3$ & $j \in \{12+1, \ldots 16\}$ &
  97. $\abs{x_8 - x_{13}}$, $\abs{x_8 -x_{14}}$, $\ldots$, $|x_8 - x_{16}|$\\
  98. $k = 4$ & $j \in \{24+1, \ldots 32 \}$ &
  99. $\abs{x_{16} - x_{25}}$, $\ldots$, $\abs{x_{16} - x_{32}}$ \\[2pt]
  100. $\quad \vdots$ & $\quad \vdots$ & $\quad \quad \vdots$ \\
  101. %\multicolumn{1}{c}{$\vdots$} &
  102. %\multicolumn{1}{c}{$\vdots$} &
  103. %\multicolumn{1}{c}{$\vdots$} \\
  104. \end{tabular}
  105. A Pollard's $\rho$ variant that implements Brent's cycle-finding algorithm
  106. instead of Floyd's one runs around 25\% faster on average
  107. ~\cite{pollard-brent}.
  108. \section{Complexity}
  109. \cite{riesel} presents a nice demonstration of the \emph{average} complexity of
  110. this algorithm, based on the birthday paradox.
  111. \newtheorem*{birthday}{The Birthday Paradox}
  112. \begin{birthday}
  113. How many persons needs to be selected at random in order that the probability
  114. of at least two of them having the same birthday exceeds $\rfrac{1}{2}$?
  115. \end{birthday}
  116. \begin{proof}[Solution]
  117. The probability that $\epsilon$ different persons have different birthdays is:
  118. \begin{align*}
  119. \Big(1 - \frac{1}{365}\Big)
  120. \Big(1 - \frac{2}{365}\Big)
  121. \Big(1 - \frac{3}{365}\Big)
  122. \cdots
  123. \Big(1 - \frac{\epsilon -1}{365}\Big)
  124. =
  125. \frac{365!}{365^\epsilon (365-\epsilon)!}
  126. \end{align*}
  127. This expression becomes $< \rfrac{1}{2}$ for $\epsilon \geq 23$.
  128. \end{proof}
  129. We can obviously substitute the $365$ with any set cardinality $\zeta$
  130. to express the probability that a random function from $\integerZ_{|\epsilon}$
  131. to $\integerZ_{|\zeta}$ is injective. Back to our particular case,
  132. we want to answer the question:
  133. \emph{
  134. How many random numbers do we have to run through before finding at least
  135. two integers equivalent $\mod{q}$?
  136. }
  137. Using the same reasoning presented above over the previously defined function
  138. $f(x): \mathcal{S} \to \mathcal{S}$, we will discover that after
  139. $\approx 1.18 \sqrt{q}$ steps the probability to have fallen inside the period
  140. is $\rfrac{1}{2}$. %% is it clear that q is either one of the two primes, and
  141. %% that here we want to examinate only a portion of the
  142. %% domain?
  143. Since any of the two primes factoring $N$ is bounded above by $\sqrt{N}$, we
  144. will find a periodic sequence, and thus a factor, in time \bigO{\sqrt[4]{N}}.
  145. \section{A Computer program for Pollard's $\rho$ method}
  146. The initial algorithm described by Pollard \cite{pollardMC} and consultable
  147. immediately below, looks for the pair $\angular{x_i, x_{2i}}$ such that
  148. $\gcd(x_{2i} - x_i, N) > 1$. This is achieved by keeping two variables $x, y$
  149. and respectively updating them via $x \gets f(x)$ and $y \gets f(f(y))$.
  150. \begin{algorithm}
  151. \caption{Pollard's $\rho$ factorization}
  152. \begin{algorithmic}[1]
  153. \State $x \getsRandom \naturalN$
  154. \State $y \gets x$
  155. \State $g \gets 1$
  156. \While{$g = 1$}
  157. \State $x \gets x^2 + 1 \pmod{N}$
  158. \State $y \gets y^4 + y^2 \ll 1 + 2 \pmod{N}$
  159. \State $g \gets gcd(x, y)$
  160. \EndWhile
  161. \Return $g$
  162. \end{algorithmic}
  163. \end{algorithm}
  164. \begin{remark}
  165. It is intresting to see how in its basic version, Pollard's $\rho$
  166. method just needs 3 variables are to preserve the
  167. state. This places it among the most parsimonious factorization algorithms in
  168. terms of memory footprint.
  169. \end{remark}
  170. An immediate improvement of this algorithm would be to occasionally compute Euclid's
  171. algorithm over the accumulated product to save some computation cycles, just as
  172. we saw in section~\ref{sec:pollard-1:implementing}. The next code fragment
  173. adopts this trick together with Brent's cycle-finding variant:
  174. \begin{algorithm}
  175. \caption{Pollard-Brent's factorization \label{alg:pollardrho}}
  176. \begin{algorithmic}[1]
  177. \State $s \gets 100$
  178. \Comment steps to check for $\gcd$
  179. \State $i \gets 1; \quad j' \gets j \gets 1$
  180. \Comment indices for our $x$s
  181. \State $x' \gets x \getsRandom \naturalN$
  182. \Comment The $x_i$ discussed above
  183. \State $y' \gets y \getsRandom x^2 + 1$
  184. \Comment The $x_j$ discussed above
  185. \State $k \gets 0; \quad q \gets \abs{x-y}$
  186. \While{$g = 1$}
  187. \State $x \gets y$ \Comment $x_i = 2^k$
  188. \State $j \gets 3 \cdot 2^{(k++)} + 1$
  189. \While{$j++ \leq 2^k$}
  190. \State $y \gets y^2 + 1 \pmod{N}$
  191. \State $q \gets q \cdot \abs{x - y}$
  192. \If{$i++ \mid s$} \Comment Time to compute $\gcd$?
  193. \State $g \gets \gcd(q, N)$
  194. \If{$g > 1$} \Return g \EndIf
  195. \If{$g = 0$}
  196. \Comment Too far: fall back to latest epoch
  197. \State $s \gets 1; \quad g \gets 1$
  198. \State $j \gets j'; \quad x \gets x'; \quad y \gets y'$
  199. \Else
  200. \Comment Save current state
  201. \State $x' \gets x; \quad y' \gets y$
  202. \State $j' \gets j$
  203. \EndIf
  204. \EndIf
  205. \EndWhile
  206. \EndWhile
  207. \end{algorithmic}
  208. \end{algorithm}
  209. %%% Local Variables:
  210. %%% mode: latex
  211. %%% TeX-master: "question_authority"
  212. %%% End: