\chapter{Dixon's factorization method\label{chap:dixon}}

Dixon~\cite{dixon} describes a class of ``probabilistic algorithms'' for finding a factor of any composite number, at a sub-exponential cost. The idea is to take random integers $r$ in $\{1, \ldots, N\}$ and look for those such that $r^2 \bmod N$ is \emph{smooth}. If enough of them are found, those integers can be combined, and a factorization of $N$ attempted.
%% that's not really academic to be stated officially, but I would have never
%% understood this section without Firas (thanks).
%%
%% I kept the voila` phrase, that was so lovely.

\section{Interlude \label{sec:dixon:history}}

During the last century there has been a huge effort to approach the problem formulated by Fermat~\ref{eq:fermat_problem} from different perspectives. This led to an entire family of algorithms, such as the \emph{Quadratic Sieve}, \emph{Dixon}'s, \ldots. The core idea is still to find a pair of perfect squares whose difference can factorize $N$, but perhaps Fermat's hypothesis can be weakened.

\paragraph{Kraitchik} was the first to popularize the idea that, instead of looking for integers $\angular{x, y}$ such that $x^2 - y^2 = N$, it is sufficient to look for \emph{multiples} of $N$:
\begin{align}
  x^2 - y^2 \equiv 0 \pmod{N}
\end{align}
and, once such a pair is found, claim that $\gcd(N, x \pm y)$ are possibly non-trivial divisors of $N$, just as we did in \ref{sec:fermat:implementation}.
Kraitchik did not stop here: instead of requiring each attempt to satisfy $x^2 \equiv y^2 \pmod{N}$ on its own, he kept the values of previous attempts, trying to find \emph{a product} of such values which is also a square. So we have a sequence
\begin{align}
  \label{eq:dixon:x_sequence}
  \angular{x_0, \ldots, x_k} \mid \prod_{i \leq k} (x_i^2 - N) \; \text{ is a perfect square}
\end{align}
that is,
\begin{align*}
  \prod_i (x_i^2 - N) = y^2
\end{align*}
which, reduced $\bmod\, N$, is equivalent to:
\begin{align}
  \label{eq:dixon:fermat_revisited}
  y^2 \equiv \prod_i (x_i^2 - N) \equiv \big( \prod_i x_i \big) ^2 \pmod{N}
\end{align}
and voil\`a our congruence of squares (\cite{discretelogs} \S 4). As for the generation of $x_i$ satisfying property \ref{eq:dixon:x_sequence}, candidates can simply be taken at random, factoring each $x_i^2 - N$ by trial division.

\paragraph{Brillhart and Morrison} later proposed (\cite{morrison-brillhart} p.~187) a better approach than trial division to find such $x$. Their idea aims to ease the enormous effort required by the trial division: to achieve this, they introduce a \emph{factor base} $\factorBase$ and generate random $x$ such that $x^2 - N$ is $\factorBase$-smooth. Recalling what we anticipated in~\ref{chap:preq}, $\factorBase$ is a precomputed set of primes $p_i \in \naturalPrime$. This way the cost of testing a new $x$ is dominated by \bigO{|\factorBase|}.
Now that the right side of \ref{eq:dixon:fermat_revisited} can be satisfied, we have to select a subset of those $x$ so that their product forms a perfect square. Consider an \emph{exponent vector} $v_i = (\alpha_0, \alpha_1, \ldots, \alpha_{r-1})$ with $r = |\factorBase| + 1$ associated with each $x_i$, where
\begin{align}
  \label{eq:dixon:alphas}
  \alpha_j = \begin{cases}
    1 \quad \text{if $p_j$ divides $x_i^2 - N$ to an odd power} \\
    0 \quad \text{otherwise}
  \end{cases}
\end{align}
for each $1 \leq j < r$. There is no need to restrict ourselves to positive values of $x^2 - N$, so we are going to use $\alpha_0$ to indicate the sign: $1$ if negative, $0$ otherwise. This benefit has a negligible cost: we have to add the non-prime $-1$ to our factor base $\factorBase$.
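To make the machinery concrete, here is a small numerical illustration; the numbers below form a classic textbook instance and are unrelated to the rest of this work. Take $N = 2041$ and $\factorBase = \{2, 3, 5, 7\}$; the values $x \in \{46, 47, 49, 51\}$ all give smooth $x^2 - N$:
\begin{align*}
  46^2 - N &= 75  = 3 \cdot 5^2            && v = (0, 0, 1, 0, 0) \\
  47^2 - N &= 168 = 2^3 \cdot 3 \cdot 7    && v = (0, 1, 1, 0, 1) \\
  49^2 - N &= 360 = 2^3 \cdot 3^2 \cdot 5  && v = (0, 1, 0, 1, 0) \\
  51^2 - N &= 560 = 2^4 \cdot 5 \cdot 7    && v = (0, 0, 0, 1, 1)
\end{align*}
The four exponent vectors sum to zero over $\mathbb{F}_2$, so the product $75 \cdot 168 \cdot 360 \cdot 560 = (2^5 \cdot 3^2 \cdot 5^2 \cdot 7)^2 = 50400^2$ is a perfect square. With $x = 46 \cdot 47 \cdot 49 \cdot 51 \equiv 311$ and $y = 50400 \equiv 1416 \pmod{N}$, we get $\gcd(311 + 1416, 2041) = 157$, and indeed $2041 = 157 \cdot 13$.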
Now let $M \in \mathbb{F}_2^{(f \times r)}$, for some $f > r$, be the rectangular matrix whose $i$-th row is the $v_i$ associated with $x_i$: this way each matrix element $m_{ij}$ is the $j$-th component of $v_i$. We are interested in finding the subsequences of the $x_i$ whose product contains only even prime powers (\ref{eq:dixon:fermat_revisited}). It turns out that this is equivalent to looking for the set of vectors $\{ w \mid wM = 0 \} = \ker(M)$, by definition of matrix multiplication over $\mathbb{F}_2$.

\paragraph{Dixon} Morrison and Brillhart's ideas in \cite{morrison-brillhart} were actually applied to a slightly different factorization method, employing continued fractions instead of the square difference polynomial. Dixon simply ported them to the square problem, achieving a probabilistic factorization method whose computational cost is asymptotically better than all those previously described: \bigO{\exp \{\beta(\log N \log \log N )^{\rfrac{1}{2}}\}} for some constant $\beta > 0$ \cite{dixon}.

\section{Breaching the kernel}

The following reduction procedure, extracted from~\cite{morrison-brillhart}, is the forward phase of the Gauss-Jordan elimination algorithm (carried out from right to left), and can be used to discover linear dependencies among the exponent vectors. For each $v_i$ described as above, associate a \emph{companion history vector} $h_i = (\beta_0, \beta_1, \ldots, \beta_{f-1})$, where for $0 \leq m < f$:
\begin{align*}
  \beta_m = \begin{cases}
    1 \quad \text{ if $m = i$} \\
    0 \quad \text{ otherwise}
  \end{cases}
\end{align*}
At this point, we have all the data structures we need:

\begin{center}
  \emph{Reduction Procedure}
\end{center}
\begin{enumerate}[(i)]
\item Set $j = r-1$;
\item find the ``pivot vector'', i.e.~the first vector $v_i, \quad 0 \leq i < f$, whose rightmost $1$ is its $j$-th component. If none is found, go to (iv);
\item
  \begin{enumerate}[(a)]
  \item replace every following vector $v_m, \quad i < m < f$, whose rightmost $1$ is also its $j$-th component, by the sum $v_i \xor v_m$;
  \item whenever $v_m$ is replaced by $v_i \xor v_m$, replace also the associated history vector $h_m$ with $h_i \xor h_m$;
  \end{enumerate}
\item reduce $j$ by $1$. If $j \geq 0$, return to (ii); otherwise stop.
\end{enumerate}

Algorithm \ref{alg:dixon:kernel} formalizes the concepts discussed so far, presenting a function \texttt{ker} that discovers linear dependencies in any rectangular matrix $M \in \mathbb{F}_2^{(f \times r)}$ and stores them in a \emph{history matrix} $H$.
\begin{remark}
  We proceed from right to left in order to conform with \cite{morrison-brillhart}. Their choice, though, was dictated by optimization concerns which no longer apply to a modern computer.
\end{remark}
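As a concrete companion to the pseudocode, here is a minimal Python sketch of the same procedure, on a matrix represented as a list of bit rows; the helper name \texttt{rightmost\_one} and the representation are our own choices, not part of \cite{morrison-brillhart}:
\begin{verbatim}
def rightmost_one(row):
    # Index of the rightmost 1 in a list of bits, or -1 for a zero row.
    for j in reversed(range(len(row))):
        if row[j]:
            return j
    return -1

def ker(M):
    # Yield linear dependencies among the rows of M (f rows of r bits),
    # scanning columns right to left as in the Reduction Procedure.
    f = len(M)
    H = [[int(i == j) for j in range(f)] for i in range(f)]  # identity
    for j in reversed(range(len(M[0]))):
        for i in range(f):                 # (ii) find the pivot vector
            if rightmost_one(M[i]) == j:
                for m in range(i + 1, f):  # (iii) reduce what follows
                    if rightmost_one(M[m]) == j:
                        M[m] = [a ^ b for a, b in zip(M[m], M[i])]
                        H[m] = [a ^ b for a, b in zip(H[m], H[i])]
                break
    for i in range(f):
        if not any(M[i]):                  # zero row: a dependency
            yield {mu for mu, bit in enumerate(H[i]) if bit}
\end{verbatim}
Each yielded set indexes a subset of rows of the original matrix summing to zero, i.e.~a candidate relation for \ref{eq:dixon:fermat_revisited}.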
\begin{algorithm}
  \caption{Reduction Procedure \label{alg:dixon:kernel}}
  \begin{algorithmic}[1]
    \Function{Ker}{$M$}
    \State $H \gets \texttt{Id}(f \times f)$ \Comment the initial $H$ is the identity matrix
    \For{$j = r-1 \strong{ downto } 0$} \Comment reduce
      \For{$i=0 \strong{ to } f-1$}
        \If{the rightmost $1$ of $M_i$ is $M_{i, j}$}
          \For{$i' = i+1 \strong{ to } f-1$}
            \If{the rightmost $1$ of $M_{i'}$ is $M_{i', j}$}
              \State $M_{i'} \gets M_i \xor M_{i'}$
              \State $H_{i'} \gets H_i \xor H_{i'}$
            \EndIf
          \EndFor
          \State \strong{break}
        \EndIf
      \EndFor
    \EndFor
    \For{$i = 0 \strong{ to } f-1$} \Comment yield linear dependencies
      \If{$M_i = (0, \ldots, 0)$} \strong{yield} $\{\mu \mid H_{i,\mu} = 1\}$
      \EndIf
    \EndFor
    \EndFunction
  \end{algorithmic}
\end{algorithm}
\begin{remark}
  The \texttt{yield} statement in line $12$ of algorithm \ref{alg:dixon:kernel} has the same semantics as in the Python programming language. It is intended to underline the fact that each $\{\mu \mid H_{i,\mu} = 1\}$ can lead to a solution for \ref{eq:dixon:x_sequence}, and therefore the generation of new dependencies can be performed asynchronously.
\end{remark}

\section{An Implementation Perspective}

Before gluing it all together, we need one last building brick for Dixon's factorization algorithm: a \texttt{smooth}($y$) function. In our specific case, we need a function that, given as input a number $y$ (here $y = x^2 - N$), returns \strong{nil} if $y$ is not $\factorBase$-smooth; otherwise, it returns a vector $v = (\alpha_0, \ldots, \alpha_{r-1})$ whose components are defined just as in \ref{eq:dixon:alphas}. Once $\factorBase$ has been established, its implementation is straightforward.

\paragraph{How do we choose $\factorBase$?} The question is not easy to answer: if we choose $\factorBase$ small, we will rarely find $x^2 - N$ \emph{smooth}; if we choose it large, attempting to factorize $x^2 - N$ over $\factorBase$ will pay the price of iterating through a large set. \cite{Crandall} \S 6.1 finds a solution for this problem by employing complex analytic number theory. As a result, the ideal value for $|\factorBase|$ is $e^{\sqrt{\ln N \ln \ln N}}$.
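The algorithms below take $\factorBase$ as given. As a complement, here is a minimal Python sketch of one way to build it; the exact bound \texttt{B}, shaped after the estimate above, and the \texttt{scale} knob are illustrative assumptions of ours, not a prescription from \cite{Crandall}:
\begin{verbatim}
from math import exp, log

def factor_base(N, scale=1.0):
    # Sieve of Eratosthenes up to a smoothness bound B shaped after
    # the exp(sqrt(ln N ln ln N)) estimate; `scale` is a tuning knob.
    B = max(10, int(scale * exp((log(N) * log(log(N))) ** 0.5)))
    is_prime = bytearray([1]) * (B + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(B ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, B + 1, p):
                is_prime[q] = 0
    return [p for p in range(2, B + 1) if is_prime[p]]
\end{verbatim}
A further refinement, not pursued here, would discard the primes $p$ for which $N$ is not a quadratic residue, since such $p$ can never divide $x^2 - N$.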
\begin{algorithm}
  \caption{Discovering Smoothness}
  \begin{algorithmic}[1]
    \Require $\factorBase$, the factor base
    \Function{smooth}{$y$}
    \State $v \gets (\alpha_0 = 0, \ldots, \alpha_{|\factorBase|} = 0)$
    \If{$y < 0$}
      \State $\alpha_0 \gets 1$
      \State $y \gets -y$
    \EndIf
    \For{$i = 1 \strong{ to } |\factorBase|$}
      \While{$\factorBase_i \mid y$}
        \State $y \gets y // \factorBase_i$
        \State $\alpha_i \gets \alpha_i \xor 1$
      \EndWhile
    \EndFor
    \If{$y = 1$}
      \State \Return $v$
    \Else
      \State \Return \strong{nil}
    \EndIf
    \EndFunction
  \end{algorithmic}
\end{algorithm}

\begin{algorithm}
  \caption{Dixon}
  \begin{algorithmic}[1]
    \Require $\factorBase$, the factor base
    \Function{dixon}{\PKArg}
    \State $i \gets 0$
    \State $f \getsRandom \naturalN_{> |\factorBase| + 1}$ \Comment finding a linear dependency requires redundancy: $f > r$
    \While{$i < f$} \Comment search for suitable pairs
      \State $x_i \getsRandom \naturalN_{< N}$
      \State $y_i \gets x_i^2 - N$
      \State $v_i \gets \textsc{smooth}(y_i)$
      \If{$v_i \neq \strong{nil}$} $i \gets i+1$ \EndIf
    \EndWhile
    \State $M \gets \texttt{matrix}(v_0, \ldots, v_{f-1})$
    \For{$\lambda = \{\mu_0, \ldots, \mu_k\} \strong{ in } \textsc{ker}(M)$} \Comment get relations
      \State $x \gets \prod_{\mu \in \lambda} x_\mu \pmod{N}$
      \State $y, r \gets \dsqrt{\prod_{\mu \in \lambda} y_\mu}$ \Comment $r = 0$: the product is a perfect square
      \State $g \gets \gcd(x+y, N)$
      \If{$1 < g < N$}
        \State $p \gets g$
        \State $q \gets N // p$
        \State \Return $p, q$
      \EndIf
    \EndFor
    \EndFunction
  \end{algorithmic}
\end{algorithm}

\paragraph{Parallelism}
Dixon's factorization is ideally suited to parallel implementation. As with other methods like ECM and MPQS, treated in \cite{brent:parallel} \S 6.1, we can \emph{linearly} improve the running time by distributing the discovery of $\factorBase$-smooth numbers across many nodes. Depending on the granularity we desire, and on the number of nodes available, we can even act on the \texttt{ker} function, though less easily. This idea boils down to the same structure we discussed for Wiener's attack: one node, the \emph{producer}, discovers linear dependencies, while the others, the \emph{consumers}, attempt to factorize $N$. Finally, due to the probabilistic nature of this algorithm, we can even think of running multiple instances of the same program. This last solution is fairly effective in proportion to its development cost.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "question_authority"
%%% End: