\chapter{An Empirical Study}

Excluding Dixon's factorization method, all attacks analyzed so far exploit
some peculiarities of a candidate RSA public key $\angular{N, e}$ in order to
recover the private exponent.
In summary:
\begin{itemize}
  \item Pollard's $p-1$ attack works only if the predecessor of either of
    the two primes factorizing the public modulus is composed of very small
    primes;
  \item Williams' $p+1$ attack works under similar conditions: the predecessor
    or the successor of either of the two primes can be easily factorized;
  \item Fermat's factorization is valuable whenever the two primes $p$ and $q$
    are very close to each other;
  \item Pollard's $\rho$ method is best whenever one of the two primes is
    considerably smaller than the other.
\end{itemize}
Dixon's factorization method, being instead a general-purpose factorization
algorithm, can be employed to \emph{measure} the strength of an RSA
keypair: the more relations (satisfying \ref{eq:dixon:fermat_revisited}) are
found, the less resistant the keypair is assumed to be.

Given these hypotheses, it has been fairly easy to produce valid RSA candidates
that are exploitable using the above attacks, and to use them to assert the
correctness of the implementation.
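
As an illustration of such a crafted candidate, the following minimal sketch
builds a toy modulus out of two adjacent primes and recovers them with Fermat's
method. It is not the implementation tested in this work: the helper names
(\texttt{is\_prime}, \texttt{next\_prime}, \texttt{fermat\_factor}) and the toy
key size are assumptions made only for the sake of the example.

```python
from math import isqrt


def is_prime(n):
    """Trial division; adequate only for the toy sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime(n):
    """Smallest prime strictly greater than n (hypothetical helper)."""
    n += 1
    while not is_prime(n):
        n += 1
    return n


def fermat_factor(n):
    """Search a >= ceil(sqrt(n)) such that a^2 - n is a perfect square b^2.

    Then n = (a - b)(a + b). The modulus n is assumed odd and composite.
    """
    a = isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1


# Deliberately weak candidate: two consecutive primes, hence very close.
p = next_prime(10**6)
q = next_prime(p)
n = p * q
factors = fermat_factor(n)
```

With primes this close, the loop is expected to stop after very few
iterations; the same search on a well-formed key would not terminate in any
reasonable time.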

On top of that, there has been a chance to test the software under real
conditions: we chose to download the SSL keys (if any) of the top one million
most visited websites, and to survey them with the newly developed software.
This not only gave us the opportunity to survey the degree of security on which
the internet is grounded today, but also led to a deeper understanding of the
capabilities and limits of the most widespread cryptographic libraries.

\vfill
\section{To skim off the dataset}

What we found most scandalous was that more than
\strong{half} of the most visited websites do \strong{not} provide an SSL
connection over port 443, reserved for HTTPS according to IANA
\cite{iana:ports}.
To put it in numbers, we are talking about $533$ thousand websites either
unresolved or unreachable within $10$ seconds.
As a side note, many websites (like \texttt{baidu.com} or
\texttt{qq.com}) keep a TCP connection open without writing anything to the
channel, requiring us to combine non-blocking sockets with the
\texttt{select()} system call in order to drop any empty communication.
It would be interesting to investigate these facts further, asking ourselves how
many of those unsuccessful connections are actually intended by the server, and
how many are dropped for censorship reasons; there is enough room for another
project.
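
The idea of dropping silent peers can be sketched as follows. This is only an
illustration and not the code used for the survey: the function name
\texttt{recv\_or\_drop} and the buffer size are assumptions, and a local socket
pair stands in for a remote server so that the example runs without network
access.

```python
import select
import socket


def recv_or_drop(sock, timeout=10.0, bufsize=4096):
    """Return the first bytes written by the peer, or None if it stays silent.

    The socket is switched to non-blocking mode; select() bounds the time
    we are willing to wait before giving up on the connection.
    """
    sock.setblocking(False)
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        sock.close()  # the peer kept the channel open but wrote nothing
        return None
    return sock.recv(bufsize)


# Local demonstration: one peer never writes, the other sends a few bytes.
quiet_a, quiet_b = socket.socketpair()
chatty_a, chatty_b = socket.socketpair()
chatty_b.sendall(b"hello")

silent_result = recv_or_drop(quiet_a, timeout=0.2)
chatty_result = recv_or_drop(chatty_a, timeout=0.2)

for s in (quiet_b, chatty_a, chatty_b):
    s.close()
```

The default timeout mirrors the $10$ seconds mentioned above; the
demonstration uses a much shorter one only to finish quickly.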

Of the remaining $450{,}000$ keys, $21$ were using ciphers other than RSA. All
the others form the dataset on which we worked.

\section{To count}

Once all the valuable certificate information has been stored inside a database,
almost any query can be performed to get a statistically meaningful measure of
the extent to which some conditions are satisfied. What follows is a list of
commented examples that we believe are relevant for understanding
how badly the internet is configured today.
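
To give an idea of the kind of query involved, here is a minimal sketch that
counts public exponents and detects shared moduli. The database engine
(SQLite), the table name \texttt{certs}, its columns, and the three rows are
all hypothetical placeholders chosen for the example; they do not reflect the
actual schema or the collected data.

```python
import sqlite3

# In-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE certs (host TEXT, modulus TEXT, exponent INTEGER)")

# Made-up rows, for illustration only.
rows = [
    ("a.example", "c0ffee", 65537),
    ("b.example", "c0ffee", 65537),
    ("c.example", "decade", 3),
]
conn.executemany("INSERT INTO certs VALUES (?, ?, ?)", rows)

# How often does each public exponent occur?
e_count = conn.execute(
    "SELECT exponent, COUNT(*) FROM certs "
    "GROUP BY exponent ORDER BY COUNT(*) DESC"
).fetchall()

# Which moduli are shared by more than one host?
n_shared = conn.execute(
    "SELECT modulus, COUNT(*) FROM certs "
    "GROUP BY modulus HAVING COUNT(*) > 1"
).fetchall()

conn.close()
```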

\begin{figure}[H]
  \includegraphics[width=0.7\textwidth]{e_count.png}
\end{figure}

The most frequent value we see here, $65537$ (\texttt{0x10001} in hexadecimal),
is the fourth Fermat number and none other than the largest known prime of the
form $2^{2^n} + 1$. Due to its composition, it has been recommended by NIST as
the default public exponent, and successfully implemented in most software,
such as \openssl\!.
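
One plausible reading of ``its composition'' is the binary shape of the number.
The identities below are standard arithmetic, not a result of this survey:
\[
  F_4 = 2^{2^4} + 1 = 2^{16} + 1 = 65537 = \mathtt{10001}_{16}.
\]
Since only two bits of the exponent are set, a square-and-multiply evaluation
of $m^{65537} \bmod N$ needs sixteen squarings and a single multiplication:
\[
  m^{65537} \equiv \left(m^{2^{16}}\right) \cdot m \pmod{N}.
\]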

Sadly, a negligible but nonzero number of websites use low public exponents,
which can make the RSA key vulnerable to Coppersmith's attack. Unfortunately,
this topic goes beyond the scope of this research and hence has not been
analyzed further.

\begin{figure}[H]
  \includegraphics[width=0.7\textwidth]{n_count.png}
\end{figure}

What is interesting to see here is that an enormous portion of our dataset
shares the same public key, pushing the number of distinct keys down by one
order of magnitude with respect to what we expected. Reasons for this are
mostly practical: it is extremely
frequent to have blogs hosted on third-party services such as ``Blogspot'' or
``Wordpress'', which always provide the same X.509 certificate, as they belong
to a single organization.
Though improbable, it is even possible that there exists a minuscule portion of
different websites sharing the same public key, and therefore also the same
private key, due to a bad CSRNG. Such a case has already been
investigated in \cite{ron:whit}.

\begin{figure}[H]
  \includegraphics[width=0.6\textwidth]{localhost_certs.png}
\end{figure}

Here we go. A surprisingly substantial number of websites provide certificates
with dummy, wrong, or even testing information. Some even inject non-printable
bytes in the \emph{common name} field.
Some are certified by authorities, some by Chinese governmental entities.


\chapter{Conclusions \label{conclusions}}
\noindent
Every time we see a certificate, we get the idea that somebody is telling us the
connection is safe. There is some authority out there telling us what to do.
We should be thinking more about what these authorities are and what they are
doing.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "question_authority"
%%% End: