123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269 |
- \chapter{Pollard's $\rho$ factorization method \label{chap:pollardrho}}
- A \emph{Monte Carlo} factorization method, published by J. M. Pollard in
- ~\cite{pollardMC}, consists into identifying a periodically recurrent sequence
- of integers within a random walk $\pmod{N}$ that could leak one of the two
- factors.
- Consider a function $f$ from $\mathcal{S}$ to $\mathcal{S}$, where
- $\mathcal{S} = \{0, 1, \ldots, q-1\}$ and $q \in \naturalPrime$.
- Let $s$ be a random element in $\mathcal{S}$, and consider the sequence
- \begin{align*}
- s,\ f(s),\ f(f(s)),\ \ldots
- \end{align*}
- Since $f$ acts over a finite set, it is clear that this sequence must
- eventually repeat, and become cyclic.
- We might diagram it with the letter $\rho$, where the tail represent the
- aperiodic part, or \emph{epacts}, and the oval the cyclic part, or
- \emph{period}.
- \begin{center}
- \begin{tikzpicture}[scale=0.7, thick]
- \tikzstyle{every node}=[draw,circle,fill=white,minimum size=4pt,
- inner sep=0pt]
- \node (1) at (1.4, 0.2) [label=left:$x_1$] {};
- \node (2) at (2.5, 3) [label=left:$x_{i-2}$] {};
- \node (3) at (3.25, 5) [label=left:$x_{i-1}$] {};
- \node (4) at (4, 7) [label=left:$ x_i \equiv x_j $] {};
- \node (5) at (6, 9) [label=above:$x_{i+1}$] {};
- \node (6) at (8, 7) [label=right:$x_{i+2}$] {};
- \node (7) at (6, 5) [label=below:$x_{j-1}$] {};
- \path (1) edge [dashed] (2);
- \path (2) edge (3);
- \path (3) edge (4);
- \path (4) edge [bend left] (5);
- \path (5) edge [bend left] (6);
- \path (6) edge [bend left, dashed] (7);
- \path (7) edge [bend left] (4);
- %%\draw [decorate,decoration={brace, raise=1.5cm}] (1) -- (3)
- %%node[draw=no] at (-1.5, 4) {tail};
- \draw [decorate,decoration={brace, raise=3cm}] (5) -- (7)
- node[draw=none] at (13, 7) {\footnotesize {period $\pmod{q}$}};
- \end{tikzpicture}
- \end{center}
- Now, consider $N=pq$.
- Let $F(x)$ be any function generating pseudorandom integers
- $\angular{x_1, x_2, \ldots}$, and let $f(x) = F(x) \pmod{q}$.
- As we said above, without any luck, there will be a pair $\angular{x_i, x_j}$
- generated by $F$ such that $x_i \equiv x_j \pmod{q}$, but $x_i \neq x_j$.
- Therefore, in order to factorize $N$, we proceed as follows: starting from a
- random $s$, we iteratively apply $F$ reduced modulo $N$. Whenever we find a
- period, if $\gcd(x_i - x_j, N) > 1$ then we found a non-trivial
- factor of $N$.
- \paragraph{Choosing the function} Ideally, $F$ should be easily computable, but
- at the same time random enough to reduce as much as possible the epacts
- ~\cite{Crandall} \S 5.2.1. Any quadratic function $F(x) = x^2 + b$ should be
- enough\footnote{
- Note that this has been only empirically verified, and so far not been proved
- (~\cite{riesel}, p. 177)},
- provided that $b \in \naturalN \setminus \{0, 2\}$.
- For example, ~\cite{pollardMC} uses $x^2 -1$, meanwhile we are going to choose
- $F(x) = x^2 + 1$.
- \paragraph{Finding the period} The trivial way to discover a period would be to
- test $x_i$ with all $x_j, \quad j < i$. Though, in \cite{AOCPv2} \S 3.1,
- Knuth gives a simple and elegant algorithm, attributed to Floyd, for finding a
- multiple of the period.
- This algorithm is the same one finally adopted by Pollard in
- ~\cite{pollardMC}.
- \begin{theorem*}[Floyd]
- Given an \emph{ultimately periodic} sequence, in the sense that there exists
- numbers $\lambda$ and $\mu$ for which the values:
- \begin{align*}
- X_0, X_1, \ldots, X_{\mu}, \ldots, X_{\mu + \lambda - 1}
- \end{align*}
- are distinct, but $X_{n+\lambda} = X_n$ when $n \geq \mu$,
- then there exists an
- $\mu < n < \mu + \lambda$ such that $X_n = X_{2n}$.
- \end{theorem*}
- \begin{proof}
- First, if $X_n = X_{2n}$, then the sequence is obviously periodic from
- $X_{2n}$ onward, possibly even earlier.
- Conversely, it is true that $X_n = X_m \quad (n \geq \mu)$ for
- $m = n + k\lambda, \quad k \in \naturalN$. Hence, there will eventually
- be an $n$ such that $X_n = X_{2n}$ if and only if $n - \mu$ is a multiple of
- $\lambda$.
- The first such value happens for $n = (\lambda + 1)\floor{\rfrac{\mu}{\lambda}}$.
- \end{proof}
- The immediate consequence of this is that we can find a collision simply by
- checking $\gcd(x_{2i} - x_i, N) > 1$ for incremental $i$.
- \paragraph{Brent's Improvement} In 1979, Brent discovered an entire family of
- cycle-finding algorithms whose optimal version resulted to be 36\% faster than
- Floyd's one \cite{pollard-brent}.
- Instead of looking for the period of the sequence using $x_{2i} - x_i$, Brent
- considers
- $\abs{x_j - x_{2^k}}$ for $ 3 \cdot 2^{k-1} < j \leq 2^{k+1}$, resulting in
- fewer operations required by the algorithm. Pragmatically, this boils down to
- compare:
- \medskip
- \begin{tabular}{l@{\hskip 40pt} l@{\hskip 50pt} l}
- $k = 0$ & $j \in \{1+1 \ldots 2\}$ & $\abs{x_1 - x_2}$ \\
- $k = 1$ & $j \in \{3+1 \ldots 4\}$ & $\abs{x_2 - x_4}$ \\
- $k = 2$ & $j \in \{6+1 \ldots 8\}$ & $\abs{x_4 - x_7}$, $\abs{x_4 - x_8}$ \\
- $k = 3$ & $j \in \{12+1, \ldots 16\}$ &
- $\abs{x_8 - x_{13}}$, $\abs{x_8 -x_{14}}$, $\ldots$, $|x_8 - x_{16}|$\\
- $k = 4$ & $j \in \{24+1, \ldots 32 \}$ &
- $\abs{x_{16} - x_{25}}$, $\ldots$, $\abs{x_{16} - x_{32}}$ \\[2pt]
- $\quad \vdots$ & $\quad \vdots$ & $\quad \quad \vdots$ \\
- %\multicolumn{1}{c}{$\vdots$} &
- %\multicolumn{1}{c}{$\vdots$} &
- %\multicolumn{1}{c}{$\vdots$} \\
- \end{tabular}
- A Pollard's $\rho$ variant that implements Brent's cycle-finding algorithm
- instead of Floyd's one runs around 25\% faster on average
- ~\cite{pollard-brent}.
- \section{Complexity}
- \cite{riesel} presents a nice proof of the \emph{average} complexity of
- this algorithm, based on the birthday paradox.
- \newtheorem*{birthday}{The Birthday Paradox}
- \begin{birthday}
- How many persons needs to be selected at random in order that the probability
- of at least two of them having the same birthday exceeds $\rfrac{1}{2}$?
- \end{birthday}
- \begin{proof}[Solution]
- The probability that $\epsilon$ different persons have different birthdays is:
- \begin{align*}
- \Big(1 - \frac{1}{365}\Big)
- \Big(1 - \frac{2}{365}\Big)
- \Big(1 - \frac{3}{365}\Big)
- \cdots
- \Big(1 - \frac{\epsilon -1}{365}\Big)
- =
- \frac{365!}{365^\epsilon (365-\epsilon)!}
- \end{align*}
- This expression becomes $< \rfrac{1}{2}$ for $\epsilon \geq 23$.
- \end{proof}
- We can obviously substitute the $365$ with any set of cardinality $\zeta$
- to express the probability that a random function from $\integerZ_{\epsilon}$
- to $\integerZ_{\zeta}$ is injective. However, back to our particular case,
- we want to answer the question:
- \emph{
- How many random numbers do we have to run through before finding at least
- two integers equivalent $\mod{q}$?
- }
- Using the same reasoning presented above over the previously defined function
- $f(x): \mathcal{S} \to \mathcal{S}$, we will discover that after
- $\approx 1.18 \sqrt{q}$ steps the probability to have fallen inside the period
- is $\rfrac{1}{2}$. %% is it clear that q is either one of the two primes, and
- %% that here we want to examinate only a portion of the
- %% domain?
- Since any of the two primes factoring $N$ is bounded above by $\sqrt{N}$, we
- will find a periodic sequence, and thus a factor, in time \bigO{\sqrt[4]{N}}.
- \section{An Implementation Perspective}
- The initial algorithm described by Pollard \cite{pollardMC} and consultable
- immediately below, looks for the pair $\angular{x_i, x_{2i}}$ such that
- $\gcd(x_{2i} - x_i, N) > 1$. This is achieved by keeping two variables $x, y$
- and respectively updating them via $x \gets f(x)$ and $y \gets f(f(y))$.
- \begin{algorithm}
- \caption{Pollard's $\rho$ factorization}
- \begin{algorithmic}[1]
- \Function{rho}{\PKArg}
- \State $x \getsRandom \naturalN$
- \State $y \gets x$
- \State $g \gets 1$
- \While{$g = 1$}
- \State $x \gets x^2 + 1 \pmod{N}$
- \State $y \gets y^4 + 2y^2 + 2 \pmod{N}$
- \State $g \gets gcd(|x - y|, N)$
- \EndWhile
- \If{$g = N$} \Return \strong{nil}
- \Else \ \ \Return $g, N//g$
- \EndIf
- \EndFunction
- \end{algorithmic}
- \end{algorithm}
- \begin{remark}
- It is intresting to see how in its basic version, Pollard's $\rho$
- method just needs 3 variables to preserve the
- state. This places it among the most parsimonious factorization algorithms in
- terms of memory footprint.
- \end{remark}
- An immediate improvement of this algorithm would be to occasionally compute Euclid's
- algorithm over the accumulated product to save some computation cycles, just as
- we saw in section~\ref{sec:pollard-1:implementing}. The next code fragment - algorithm
- \ref{alg:pollardrho} - adopts this trick together with Brent's cycle-finding variant
- (\cite{pollard-brent}\S 7).
- \paragraph{Parallelism}
- Unfortunately, a parallel implementation of the $\rho$ algorithm would not
- provide a linear speedup.
- The computation of the $x_i$ sequence is intrinsecally serial; the only
- plausible approach to parallelism would be to try several different pseudorandom
- sequences, in which case $m$ different machines processing $m$ different
- sequences in parallel would be no more than \bigO{\sqrt{m}}
- efficient (\cite{brent:parallel} \S 3).
- \begin{algorithm}
- \caption{Pollard-Brent's factorization
- \label{alg:pollardrho}}
- \begin{algorithmic}[1]
- \Function{rho}{\PKArg}
- \State $r \gets 1$
- \State $q \gets 1$
- \Comment the accumulated $\gcd$
- \State $g \gets 1$
- \State $m \gets 100$
- \Comment steps before checking for $\gcd$
- \State $y \getsRandom \naturalN_{< N}$
- \While{$g = 1$}
- \State $x \gets y$
- \For{$r \strong{ times }$}
- \State $y \gets y^2 + 1 \pmod{N}$
- \EndFor
- \State $k \gets 0$
- \While{$k \leq r \strong{ and } g = 1$}
- \State $ys \gets y$
- \Comment backup state
- \For{$\min\{m, r-k\} \strong{ times }$}
- \Comment accumulate values to test later
- \State $y \gets y^2 + 1 \pmod{N}$
- \State $q \gets q \cdot \abs{x -y} \pmod{N}$
- \EndFor
- \State $k \gets k + m$
- \State $g \gets \gcd(q, N)$
- \EndWhile
- \State $r \gets r \ll 1$
- \EndWhile
- \If{$g = N$}
- \Comment too far; fall back to latest epoch
- \Repeat
- \State $ys \gets ys^2 + 1 \pmod{N}$
- \State $g \gets \gcd(N, \abs{x -ys})$
- \Until{$g > 1$} \EndIf
- \If{$g = 1$} \Return \strong{nil}
- \Else \ \ \Return $g, N//g$
- \EndIf
- \EndFunction
- \end{algorithmic}
- \end{algorithm}
- %%% Local Variables:
- %%% mode: latex
- %%% TeX-master: "question_authority"
- %%% End:
|