\chapter{Fermat's factorization method \label{chap:fermat}}

Excluding the trial division, Fermat's method is the oldest known systematic
method for factorizing integers. Even if its algorithmic complexity is not
among the most efficient, it holds still a practical interest whenever
the two primes are sufficiently close.
Indeed, \cite{DSS2009} \S B.3.6 explicitly recommends that
$|p-q| \geq \sqrt{N}2^{-100}$
for any key of bitlength $1024,\ 2048,\ 3072$ in order to address this kind of
threat.\\
%% it would be nice here to explain that this magic 2^100 is just about wonting
%% the most significant digits to be different.
The basic idea is to attempt to write $N$ as a difference of squares,
\begin{align}
\label{eq:fermat_problem}
x^2 - N = y^2
\end{align}

So, we start by $x = \ceil{\sqrt{N}}$ and check that $x^2-N$ is a perfect
square. If it isn't, we iteratively increment $x$ and check again, until we
find a pair $\angular{x, y}$ satisfying equation \ref{eq:fermat_problem}.
Once found, we claim that $N = pq = (x+y)(x-y)$; it is indeed true that, if we
decompose $x^2 - y^2$ as difference of squares, then it is immediately clear
that $x+y \mid N \ \land \  x-y \mid N$, and that both are non-trivial
divisors.

\paragraph{Complexity} \cite{riesel} contains a detailed proof for the
complexity of this algorithm, which is
$\bigO{\frac{(1-k)^2}{2k} \sqrt{N}} \;\;,  0 < k < 1$. We summarize it down
below here to better clarify the limits of this algorithm.

\begin{proof}
  Since, once we reach the final step $x_f$ it holds $N = pq = x_f^2 - y_f^2$,
  the number of steps required to reach the result is:
  \begin{align*}
    x_f - \sqrt{N} &= \frac{p + q}{2} - \sqrt{N} \\
                   &= \frac{p + \frac{N}{p}}{2} - \sqrt{N} \\
                   &= \frac{(\sqrt{N} - p)^2}{2p}
  \end{align*}
  If we finally suppose that $p = k\sqrt{N}, \; 0 < k < 1$, then the number of cycles
  becomes
  $\frac{(1-k)^2}{2k} \sqrt{N}$.
\end{proof}

\begin{remark}
  Note that, for the algorithm to be effective, the two primes must be
  ``really close'' to $\sqrt{N}$. As much as the lowest prime gets near to
  $1$, the ratio $\frac{(1-k)^2}{2k}$ becomes larger, until the actual magnitude
  of this factorization method approaches \bigO{N}.
\end{remark}

\section{An Implementation Perspective \label{sec:fermat:implementation}}

At each iteration, the $i-$th state is hold by the pair $\angular{x, x^2}$.\\
The later step, described by $\angular{x+1, (x+1)^2}$ can be computed efficiently
considering the square of a binomial: $\angular{x+1, x^2 + 2x + 1}$.
The upper-bound, instead, is reached when
$ \Delta = p - q  = x + y - x + y = 2y > 2^{-100}\sqrt{N}$.

Algorithm ~\ref{alg:fermat} presents a simple implementation of this
factorization method, taking into account the small optimizations
aforementioned.

\begin{algorithm}[H]
  \caption{Fermat Factorization \label{alg:fermat}}
  \begin{algorithmic}[1]
    \Function{fermat}{\PKArg}
    \State $x \gets \floor{\sqrt{N}}$
    \State $x' \gets x \cdot x$

    \Repeat
    \State $x' \gets x' + 2x + 1$
    \State $x \gets x+1$
    \State $y, rest \gets \dsqrt{x' - N}$
    \Until{ $rest \neq 0 \strong{ and } y < \frac{\sqrt{N}}{2^{101}}$ }
    \Comment i.e., \ref{eq:fermat_problem} holds?

    \If{ $rest = 0$ }
    \State $p \gets x+y$
    \State $q \gets x-y$
    \State \Return $p, q$
    \Else
    \State \Return \textbf{nil}
    \EndIf
    \EndFunction
    \end{algorithmic}
\end{algorithm}

\paragraph{How to chose the upper limit?}  Our choice of keeping straight with
the limits of the standard is a mere choice of commodity: we are interested in
finding public keys  not respecting the standard.
Though, it is worth noting that what this limit \emph{states} is that at least
one of the most significant $100$ bits should be different between the two
primes:

\begin{bytefield}[
  endianness=big,
  bitwidth=1.35em,
  % bitformatting=\fakerange,
  ]{16}
  \\
  % \bitheader{}
  \\[1px]
  \begin{rightwordgroup}{\small{$2^{\frac{\log N}{2}-100}$}}
    \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{3}{\tiny $\cdots$} &
    \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{1}{0} & \bitbox{1}{1} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{3}{\tiny $\cdots$}    & \bitbox{1}{0} & \bitbox{1}{0} &
  \end{rightwordgroup}
  \\[1ex]
  \wordbox[]{1}{} &&
  \\[1ex]
  \begin{rightwordgroup}{$p$}
    \bitbox{1}{0} & \bitbox{1}{1} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{1} & \bitbox{1}{1} &
    \bitbox{3}{\tiny $\cdots$} &
    \bitbox{1}{0} & \bitbox{1}{1} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{1}{0} &
    \colorbitbox{lightgray}{1}{1} & \colorbitbox{lightgray}{1}{0} &
    \colorbitbox{lightgray}{1}{0} &
    \colorbitbox{lightgray}{4}{\tiny{$\cdots$ LSB $\cdots$}} &
    \colorbitbox{lightgray}{1}{0} &
  \end{rightwordgroup}
  \\[1ex]
  \begin{rightwordgroup}{$q$}
    \bitbox{1}{0} & \bitbox{1}{1} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{0} & \bitbox{1}{1} &
    \bitbox{3}{\tiny $\cdots$} &
    \bitbox{1}{0} & \bitbox{1}{1} & \bitbox{1}{0} & \bitbox{1}{0} &
    \bitbox{1}{0} &
    \colorbitbox{lightgray}{1}{0} & \colorbitbox{lightgray}{1}{0} &
    \colorbitbox{lightgray}{1}{0} &
    \colorbitbox{lightgray}{4}{\tiny{$\cdots$ LSB $\cdots$}} &
    \colorbitbox{lightgray}{1}{0} &
  \end{rightwordgroup}
\end{bytefield}
\vfill

For example, in the case of a RSA key $1024$, the binary difference between $p$
and $q$ has to be greater than $2^{412}$, which means that, excluding corner-cases
where the remainder is involved, there must be at least one difference in the
top 100 most significant bits for the key to be considered safe.


\section{Thoughts about a parallel solution}

At first glance we might be willing to split the entire interval
$\{ \ceil{\sqrt{N}}, \ldots, N-1 \}$ in equal parts, one per each
node. However, this would not be any more efficient than the trial division
algorithm, and nevertheless during each single iteration, the computational
complexity is dominated by the square root $\dsqrt$ function, which belongs to
the class \bigO{\log^2 N}, as we saw in section ~\ref{sec:preq:sqrt}.
Computing separatedly $x^2$ would add an overhead of the same order of magnitude
\bigO{\log^2 N}, and thus result in a complete waste of resources.

%%As a result of this, we advice the use of a strictly limited number of
%%processors - like two or three - performing in parallel fermat's factorization
%%method over different intervals.

%%% Local Variables:
%%% TeX-master: "question_authority.tex"
%%% End: