\chapter{Pollard's $\rho$ factorization method \label{chap:pollardrho}} A \emph{Monte Carlo} factorization method, published by J. M. Pollard in ~\cite{pollardMC}, consists into identifying a periodically recurrent sequence of integers within a random walk $\pmod{N}$ that could leak one of the two factors. Consider a function $f$ from $\mathcal{S}$ to $\mathcal{S}$, where $\mathcal{S} = \{0, 1, \ldots, q-1\}$ and $q \in \naturalPrime$. Let $s$ be a random element in $\mathcal{S}$, and consider the sequence \begin{align*} s,\ f(s),\ f(f(s)),\ \ldots \end{align*} Since $f$ acts over a finite set, it is clear that this sequence must eventually repeat, and become cyclic. We might diagram it with the letter $\rho$, where the tail represent the aperiodic part, or \emph{epacts}, and the oval the cyclic part, or \emph{period}. \begin{center} \begin{tikzpicture}[scale=0.7, thick] \tikzstyle{every node}=[draw,circle,fill=white,minimum size=4pt, inner sep=0pt] \node (1) at (1.4, 0.2) [label=left:$x_1$] {}; \node (2) at (2.5, 3) [label=left:$x_{i-2}$] {}; \node (3) at (3.25, 5) [label=left:$x_{i-1}$] {}; \node (4) at (4, 7) [label=left:$ x_i \equiv x_j $] {}; \node (5) at (6, 9) [label=above:$x_{i+1}$] {}; \node (6) at (8, 7) [label=right:$x_{i+2}$] {}; \node (7) at (6, 5) [label=below:$x_{j-1}$] {}; \path (1) edge [dashed] (2); \path (2) edge (3); \path (3) edge (4); \path (4) edge [bend left] (5); \path (5) edge [bend left] (6); \path (6) edge [bend left, dashed] (7); \path (7) edge [bend left] (4); %%\draw [decorate,decoration={brace, raise=1.5cm}] (1) -- (3) %%node[draw=no] at (-1.5, 4) {tail}; \draw [decorate,decoration={brace, raise=3cm}] (5) -- (7) node[draw=none] at (13, 7) {\footnotesize {period $\pmod{q}$}}; \end{tikzpicture} \end{center} Now, consider $N=pq$. Let $F(x)$ be any function generating pseudorandom integers $\angular{x_1, x_2, \ldots}$, and let $f(x) = F(x) \pmod{q}$. As we said above, without any luck, there will be a pair $\angular{x_i, x_j}$ generated by $F$ such that $x_i \equiv x_j \pmod{q}$, but $x_i \neq x_j$. Therefore, in order to factorize $N$, we proceed as follows: starting from a random $s$, we iteratively apply $F$ reduced modulo $N$. Whenever we find a period, if $\gcd(x_i - x_j, N) > 1$ then we found a non-trivial factor of $N$. \paragraph{Choosing the function} Ideally, $F$ should be easily computable, but at the same time random enough to reduce as much as possible the epacts ~\cite{Crandall} \S 5.2.1. Any quadratic function $F(x) = x^2 + b$ should be enough\footnote{ Note that this has been only empirically verified, and so far not been proved (~\cite{riesel}, p. 177)}, provided that $b \in \naturalN \setminus \{0, 2\}$. For example, ~\cite{pollardMC} uses $x^2 -1$, meanwhile we are going to choose $F(x) = x^2 + 1$. \paragraph{Finding the period} The trivial way to discover a period would be to test $x_i$ with all $x_j, \quad j < i$. Though, in \cite{AOCPv2} \S 3.1, Knuth gives a simple and elegant algorithm, attributed to Floyd, for finding a multiple of the period. This algorithm is the same one finally adopted by Pollard in ~\cite{pollardMC}. \begin{theorem*}[Floyd] Given an \emph{ultimately periodic} sequence, in the sense that there exists numbers $\lambda$ and $\mu$ for which the values: \begin{align*} X_0, X_1, \ldots, X_{\mu}, \ldots, X_{\mu + \lambda - 1} \end{align*} are distinct, but $X_{n+\lambda} = X_n$ when $n \geq \mu$, then there exists an $\mu < n < \mu + \lambda$ such that $X_n = X_{2n}$. \end{theorem*} \begin{proof} First, if $X_n = X_{2n}$, then the sequence is obviously periodic from $X_{2n}$ onward, possibly even earlier. Conversely, it is true that $X_n = X_m \quad (n \geq \mu)$ for $m = n + k\lambda, \quad k \in \naturalN$. Hence, there will eventually be an $n$ such that $X_n = X_{2n}$ if and only if $n - \mu$ is a multiple of $\lambda$. The first such value happens for $n = (\lambda + 1)\floor{\rfrac{\mu}{\lambda}}$. \end{proof} The immediate consequence of this is that we can find a collision simply by checking $\gcd(x_{2i} - x_i, N) > 1$ for incremental $i$. \paragraph{Brent's Improvement} In 1979, Brent discovered an entire family of cycle-finding algorithms whose optimal version resulted to be 36\% faster than Floyd's one \cite{pollard-brent}. Instead of looking for the period of the sequence using $x_{2i} - x_i$, Brent considers $\abs{x_j - x_{2^k}}$ for $ 3 \cdot 2^{k-1} < j \leq 2^{k+1}$, resulting in fewer operations required by the algorithm. Pragmatically, this boils down to compare: \medskip \begin{tabular}{l@{\hskip 40pt} l@{\hskip 50pt} l} $k = 0$ & $j \in \{1+1 \ldots 2\}$ & $\abs{x_1 - x_2}$ \\ $k = 1$ & $j \in \{3+1 \ldots 4\}$ & $\abs{x_2 - x_4}$ \\ $k = 2$ & $j \in \{6+1 \ldots 8\}$ & $\abs{x_4 - x_7}$, $\abs{x_4 - x_8}$ \\ $k = 3$ & $j \in \{12+1, \ldots 16\}$ & $\abs{x_8 - x_{13}}$, $\abs{x_8 -x_{14}}$, $\ldots$, $|x_8 - x_{16}|$\\ $k = 4$ & $j \in \{24+1, \ldots 32 \}$ & $\abs{x_{16} - x_{25}}$, $\ldots$, $\abs{x_{16} - x_{32}}$ \\[2pt] $\quad \vdots$ & $\quad \vdots$ & $\quad \quad \vdots$ \\ %\multicolumn{1}{c}{$\vdots$} & %\multicolumn{1}{c}{$\vdots$} & %\multicolumn{1}{c}{$\vdots$} \\ \end{tabular} A Pollard's $\rho$ variant that implements Brent's cycle-finding algorithm instead of Floyd's one runs around 25\% faster on average ~\cite{pollard-brent}. \section{Complexity} \cite{riesel} presents a nice proof of the \emph{average} complexity of this algorithm, based on the birthday paradox. \newtheorem*{birthday}{The Birthday Paradox} \begin{birthday} How many persons needs to be selected at random in order that the probability of at least two of them having the same birthday exceeds $\rfrac{1}{2}$? \end{birthday} \begin{proof}[Solution] The probability that $\epsilon$ different persons have different birthdays is: \begin{align*} \Big(1 - \frac{1}{365}\Big) \Big(1 - \frac{2}{365}\Big) \Big(1 - \frac{3}{365}\Big) \cdots \Big(1 - \frac{\epsilon -1}{365}\Big) = \frac{365!}{365^\epsilon (365-\epsilon)!} \end{align*} This expression becomes $< \rfrac{1}{2}$ for $\epsilon \geq 23$. \end{proof} We can obviously substitute the $365$ with any set of cardinality $\zeta$ to express the probability that a random function from $\integerZ_{\epsilon}$ to $\integerZ_{\zeta}$ is injective. However, back to our particular case, we want to answer the question: \emph{ How many random numbers do we have to run through before finding at least two integers equivalent $\mod{q}$? } Using the same reasoning presented above over the previously defined function $f(x): \mathcal{S} \to \mathcal{S}$, we will discover that after $\approx 1.18 \sqrt{q}$ steps the probability to have fallen inside the period is $\rfrac{1}{2}$. %% is it clear that q is either one of the two primes, and %% that here we want to examinate only a portion of the %% domain? Since any of the two primes factoring $N$ is bounded above by $\sqrt{N}$, we will find a periodic sequence, and thus a factor, in time \bigO{\sqrt[4]{N}}. \section{An Implementation Perspective} The initial algorithm described by Pollard \cite{pollardMC} and consultable immediately below, looks for the pair $\angular{x_i, x_{2i}}$ such that $\gcd(x_{2i} - x_i, N) > 1$. This is achieved by keeping two variables $x, y$ and respectively updating them via $x \gets f(x)$ and $y \gets f(f(y))$. \begin{algorithm} \caption{Pollard's $\rho$ factorization} \begin{algorithmic}[1] \Function{rho}{\PKArg} \State $x \getsRandom \naturalN$ \State $y \gets x$ \State $g \gets 1$ \While{$g = 1$} \State $x \gets x^2 + 1 \pmod{N}$ \State $y \gets y^4 + 2y^2 + 2 \pmod{N}$ \State $g \gets gcd(|x - y|, N)$ \EndWhile \If{$g = N$} \Return \strong{nil} \Else \ \ \Return $g, N//g$ \EndIf \EndFunction \end{algorithmic} \end{algorithm} \begin{remark} It is intresting to see how in its basic version, Pollard's $\rho$ method just needs 3 variables to preserve the state. This places it among the most parsimonious factorization algorithms in terms of memory footprint. \end{remark} An immediate improvement of this algorithm would be to occasionally compute Euclid's algorithm over the accumulated product to save some computation cycles, just as we saw in section~\ref{sec:pollard-1:implementing}. The next code fragment - algorithm \ref{alg:pollardrho} - adopts this trick together with Brent's cycle-finding variant (\cite{pollard-brent}\S 7). \paragraph{Parallelism} Unfortunately, a parallel implementation of the $\rho$ algorithm would not provide a linear speedup. The computation of the $x_i$ sequence is intrinsecally serial; the only plausible approach to parallelism would be to try several different pseudorandom sequences, in which case $m$ different machines processing $m$ different sequences in parallel would be no more than \bigO{\sqrt{m}} efficient (\cite{brent:parallel} \S 3). \begin{algorithm} \caption{Pollard-Brent's factorization \label{alg:pollardrho}} \begin{algorithmic}[1] \Function{rho}{\PKArg} \State $r \gets 1$ \State $q \gets 1$ \Comment the accumulated $\gcd$ \State $g \gets 1$ \State $m \gets 100$ \Comment steps before checking for $\gcd$ \State $y \getsRandom \naturalN_{< N}$ \While{$g = 1$} \State $x \gets y$ \For{$r \strong{ times }$} \State $y \gets y^2 + 1 \pmod{N}$ \EndFor \State $k \gets 0$ \While{$k \leq r \strong{ and } g = 1$} \State $ys \gets y$ \Comment backup state \For{$\min\{m, r-k\} \strong{ times }$} \Comment accumulate values to test later \State $y \gets y^2 + 1 \pmod{N}$ \State $q \gets q \cdot \abs{x -y} \pmod{N}$ \EndFor \State $k \gets k + m$ \State $g \gets \gcd(q, N)$ \EndWhile \State $r \gets r \ll 1$ \EndWhile \If{$g = N$} \Comment too far; fall back to latest epoch \Repeat \State $ys \gets ys^2 + 1 \pmod{N}$ \State $g \gets \gcd(N, \abs{x -ys})$ \Until{$g > 1$} \EndIf \If{$g = 1$} \Return \strong{nil} \Else \ \ \Return $g, N//g$ \EndIf \EndFunction \end{algorithmic} \end{algorithm} %%% Local Variables: %%% mode: latex %%% TeX-master: "question_authority" %%% End: