\chapter{An Empirical Study \label{chap:empirical_study}}

Excluding Dixon's factorization method, all attacks analyzed so far exploit
some peculiarities of a candidate RSA public key $\angular{N, e}$ in order to
recover the private exponent.
Summarizingly:
\begin{itemize}
  \item Pollard's $p-1$ attack works only if the predecessor of any of
    the two primes factorizing the public modulus is composed of very small
    prime powers;
  \item  Williams' $p+1$ attack works under similar conditions - on the
    predecessor or the successor of any of the two primes ;
  \item Fermat's factorization is valuable whenever the two primes $p$ and $q$
    are really close to each other;
  \item Pollard's $\rho$ method is best whenever one of the two primes is
    strictly lower than the other;
  \item Wiener's attack is guaranteed to work on small private exponents.
\end{itemize}
Dixon's factorization method instead, being a general-purpose factorization
algorithm, can be employed to \emph{measure} the strength of a RSA
keypair: the more relations (satisfying \ref{eq:dixon:fermat_revisited}) are
found, the less it is assumed to be resistant.

Given these hypothesis, it has been fairly easy to produce valid RSA candidate
keys that can be broken using the above attacks. They have been used to assert
the correctness of the implementation.

On the top of that, there has been a chance to test the software under real
conditions: we downloaded the SSL keys (if any) of the top one million visited
websites, and survey them with the just developed software. This not only gave
us the opportunity to survey the degree of security on which the internet is
grounded today, but also led to a deeper understanding of the capacities and limits of
the most widespread libraries offering crypto nowadays.

\vfill
\section{To skim off the dataset}

What has been most scandalous above all was to discover that more than
\strong{half} of the most visited websites do \strong{not} provide SSL
connection over port 443 - reserved for HTTPS according to IANA
\cite{iana:ports}.
To put it in numbers, we are talking about $533, 000$ websites either
unresolved or unreachable in $10$ seconds.
As a side note for this, many websites (like \texttt{baidu.com} or
\texttt{qq.com}) keep a TCP connection open without writing anything to the
channel, requiring us to adopt a combination of non-blocking socket with the
\texttt{select()} system call in order to drop any empty communication.
It would be interesting to investigate more on these facts, asking ourselves how
many of those unsuccessful connections are actually wanted from the server, and
how many dropped for censorship reasons; there is enough room for another
project.

Of the remaining $450,000$ keys, $21$ were using different ciphers than RSA. All
others represent the dataset upon which we worked on.

\section{To count}

Once all valuable certificate informations have been stored inside a database,
almost any query can be performed to get a statistically valuable measure of
degree of magnitude to which some conditions are satisfied. What follows now is
a list of commented examples that we believe are relevant parameters for
understanding of how badly internet is configured today.


\begin{figure}[H]
  \includegraphics[width=0.7\textwidth]{e_count.png}
\end{figure}

The most prolific number we see here, $65537$ in hexadecimal, is the fourth
Fermat number and no other than the largest known prime of the form $2^{2^n} +
1$. Due to its composition, it has been advised by NIST as default public
exponent, and successfully implemented in most software, such as \openssl\!.

Sadly, a negligible number of websites is using low public exponents,
which makes the RSA key vulnerable to Coppersmith's attack; though, this
topic goes beyond the scope of this research and hence has not been analyzed
further.

\begin{figure}[H]
  \includegraphics[width=0.7\textwidth]{n_count.png}
\end{figure}

What is interesting to see here is that an enormous portion of our dataset
shared the same public key, pushing down the number of expected keys of one
order of magnitude. Reasons for this are mostly practical: it is extremely
frequent to have blogs hosted on third-party services such as ``Blogspot'' or
``Wordpress'' which always provide the same X.509 certificate, as they belong to
an unique organization.
Though improbable, it is even possible that exists a millesimal portion of
different websites sharing the same public key due to a
bad cryptographically secure random number generator, and therefore also the
same private key. Such a case has been already investigated in \cite{ron:whit}.

\begin{figure}[H]
  \includegraphics[width=0.6\textwidth]{localhost_certs.png}
\end{figure}

Here we go. A suprisingly consistent number of websites provides certificates
filled with dummy, wrong, or even testing informations.\\
Some do have non-printable bytes in the \emph{common name} field.\\
Some are certified from authorities. \\
Some are even gonvernmental entities.

\begin{figure}[H]
  \includegraphics[width=0.9\textwidth]{bits_count.png}
\end{figure}

According to \cite{nist:keylen_transitions} \S 3, table $2$, all RSA keys of
bitlength less than $1024$ are to be considered deprecated at the end of $2013$
and shall no more be issued since the beginning of this year. Not differently
from the above results, the remark has been globally adopted, yet still with a
few exceptions: around a dozen of non-self-signed certificates with a 1024 RSA
key appears to have been issued in 2014.


\section{The proof and the concept}

At the time of this writing, we have collected the output of only two
mathematical tests performed in the university cluster.

\paragraph{Wiener.} The attack described in chapter \ref{chap:wiener} was the
first employed, being the fastest one above all others. Recalling the different
public exponents we probed and discussed in the preceeding section (all $\leq
65537$), we expected all private expenents to be $>  \rfrac{1}{3}\sqrt[4]{N}$
and therefore not vulnerable to this particular version of Wiener's attack.
Indeed, we found no weak keys with respect to this attack. Though, as
pointed out in \cite{20years} \S 3, there is still the possibilty that the
public keys we collected could be broken employing some variants of it.

\paragraph{GCD.} On the wave of \cite{ron:whit}, whe attempted also to perform
the $\gcd$ of every possible pair of dinstinct public modulus present in the
dataset. In contrast to our expectations, this test led to no prime factor
leaked, for any key pair. We have reasons to believe this depends on the
relatively small size of our dataset, with respect to the one used in
\cite{ron:whit}.


\chapter{Conclusions \label{conclusions}}

Everytime we surf the web, we share our communication channel with lots of
entities around the globe. End-to-end encryption protocols such as TLS can
provide the security  properties that we often take as granted, like
\emph{confidentiality}, \emph{integrity}, and \emph{authenticity}; though,
these holds only if we \emph{trust} the authorities certifying the end entity.

%% Wax Taylor - Que Sera
There is this mindless thinking that whenever we see that small lock icon in the
browser's url bar, somebody is telling us the connection is safe.
There is some authority out there telling what to do, and we should be thinking
more about what these authorities are and what they are doing.
This issue is no more a technical problem, but instead is becoming more and more
a social and political problem.
It is our responsability as citzens to do something about that.


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "question_authority"
%%% End: