Friday, January 13, 2017

RWC 2017 - Secure MPC at Google

This talk was given by Ben Kreuter and its focus was on the apparent disparity between what we research in academia and what is required in the real world, specifically in the field of multi-party computation (MPC). MPC allows multiple parties to compute some function on their combined input without any party revealing anything about its input to the other parties (other than what can be learnt from the output alone).

While significant work has been done on making MPC efficient in practice (for example, the work of Yehuda Lindell et al. on high-throughput MPC which was presented by Lindell in the preceding talk), the focus tends to be on generic protocols (e.g. general logic circuits) with strong security guarantees (e.g. malicious security), which invariably leads to large computational overhead. In practice, we usually require only specific protocols, which can therefore be optimised, and comparatively weak security guarantees.

In the real world, network cost is often the salient factor, rather than the raw speed of the protocol, since the parties involved in a computation typically communicate over shared networks (such as the Internet) and cannot count on the network's full capacity. The MPC done at Google runs amongst, for example, mobile phones, laptops and servers; this introduces issues like battery constraints and the possibility of a computation not completing. These considerations, firmly grounded in the real world, are important when developing MPC techniques in research.


Business applications

A large portion of Google's revenue is generated by advertising: the tech giant, well known for its aptitude for accurately determining users' desired search results even when queries are expressed ineloquently, specialises in delivering personalised adverts to its wide spectrum of users. The efficacy of an advert is generally measured by the proportion of its viewers who later become customers. Clearly this could be computed by businesses comparing their databases of customers' transactions with Google's databases of who has been shown which adverts. This, however, would be an invasion of privacy: instead, Google and the business can run an MPC protocol, more specifically a private set intersection protocol.

In a private set intersection protocol, the parties compute the size of the intersection of their input sets, or even some function of the elements in the intersection. So if the business and Google run a private set intersection protocol on their data, they can determine how effective the advertising was.

Roughly speaking, the MPC Google does in the real world is as follows. Google has a set $\{g_1,g_2,...,g_n\}$ of group elements which encodes the people who have been shown an advert for a certain product, and a business has a set $\{b_1,b_2,...,b_m\}$ of group elements which encodes the people who have bought that product. Google raises each of its elements to a secret power $G$ and sends the set $\{g_1^G,g_2^G,...,g_n^G\}$ to the business. The business does the same with its elements for some secret exponent $B$ to get $\{b_1^B,b_2^B,...,b_m^B\}$, and for each element of its set it encrypts, under Paillier encryption (which is additively homomorphic), a value $v_j$ encoding some other property of the sale (like the amount paid); it also computes the set $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$. The business sends Google the set of pairs $\{(b_1^B,P(v_1)),(b_2^B,P(v_2)),...,(b_m^B,P(v_m))\}$ along with $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$. Google then computes $\{b_1^{GB},b_2^{GB},...,b_m^{GB}\}$ and adds together all encrypted values $P(v_j)$ for which there exists some $i$ such that $g_i^{GB}=b_j^{GB}$; these indices $j$ correspond exactly to the intersection. It sends the resulting ciphertext back to the business, which decrypts and interprets the result.
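
To make this concrete, below is a minimal Python sketch of that flow. It is only an illustration, not Google's implementation: the group is the set of quadratic residues modulo a tiny safe prime, the hash into the group is heuristic, the Paillier parameters are toy-sized, and each sale carries a single value rather than a vector.

import hashlib
import secrets
from math import gcd, lcm

# Toy DDH group: quadratic residues modulo the safe prime P = 2*Q + 1.
P, Q = 3023, 1511

def hash_to_group(ident):
    # Heuristic hash of an identifier into the order-Q subgroup (by squaring).
    h = int.from_bytes(hashlib.sha256(ident.encode()).digest(), "big")
    return pow(h % (P - 1) + 1, 2, P)

# Toy Paillier: additively homomorphic, ciphertexts live modulo n^2.
p_, q_ = 10007, 10009
n, n2 = p_ * q_, (p_ * q_) ** 2
lam = lcm(p_ - 1, q_ - 1)
mu = pow(lam, -1, n)  # valid because we take g = n + 1

def paillier_enc(m):
    r = secrets.randbelow(n - 1) + 1
    while gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2  # (n+1)^m * r^n mod n^2

def paillier_dec(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# Inputs: Google saw these people view the ad; the business records sales.
google_ids = ["alice", "bob", "carol", "dave"]
business_sales = [("bob", 20), ("dave", 35), ("erin", 50)]

G = secrets.randbelow(Q - 1) + 1  # Google's secret exponent
B = secrets.randbelow(Q - 1) + 1  # the business's secret exponent

# Round 1: Google blinds its identifiers under G.
g_blinded = [pow(hash_to_group(x), G, P) for x in google_ids]

# Round 2: the business re-blinds Google's set under B, and sends its own
# identifiers blinded under B, each paired with an encryption of the amount.
g_double = {pow(x, B, P) for x in g_blinded}
b_pairs = [(pow(hash_to_group(x), B, P), paillier_enc(amt))
           for x, amt in business_sales]

# Round 3: Google raises each b_j^B to G and, for every match with its own
# doubly blinded set, homomorphically adds the associated ciphertext.
total = paillier_enc(0)
for b_blinded, ct in b_pairs:
    if pow(b_blinded, G, P) in g_double:
        total = total * ct % n2  # multiplying ciphertexts adds plaintexts

print(paillier_dec(total))  # 55: bob's and dave's purchases

In this sketch Google sees only blinded identifiers and an encrypted total, while the business learns just the aggregate value over the intersection.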

This protocol is very simple, and it is only passively secure (players are assumed to execute the protocol faithfully, but may try to learn things by inspecting their communication transcripts). An interesting, perhaps somewhat orthogonal, concern to how we approach research from an academic point of view is that we must be able to convey the security and efficiency of our protocols to the lawyers, managers and software engineers who will eventually be sanctioning, authorising or implementing them. "The lawyers are interesting because you can show them a proof, and two plus two equals four is a negotiable statement here... managers usually trust your expertise... and software engineers are the worst because they already assume [the protocol] is impossible."

An alternative solution using garbled circuits was explored in the recent past, but it turned out that their use required some subtle assumptions regarding the computation and communication which would have made the protocol impractical.

Future work would involve getting a (not too much more expensive) maliciously secure protocol and developing the use of the homomorphic encryption to allow different functions to be computed on the data in the intersection.

Consumer applications

The Android keyboard app by Google, Gboard, logs what a user types so that it can suggest words for auto-completion in the future. This data could be used for training machine learning models, and merging the results from many local models would enable the creation of predictive algorithms that work well for everyone. However, to do this, the server would need to receive a large dataset of words typed by each user's phone so that this processing could be done. Clearly there is an issue of privacy here; moreover, there is also potentially a differential privacy issue.

This is clearly a good situation in which to use MPC. Each party masks its data using a basic additive secret-sharing scheme: if each party has a vector to input, then for every coordinate, every pair of parties agrees on some random field element, which one party adds to that coordinate of its vector and the other subtracts. When the parties send their masked vectors to Google, the masks cancel upon addition.

In practice, they perform a key exchange at the beginning (one key for every possible pair of parties) and use a PRG to expand the shared keys, achieving the same effect with much smaller communication overhead, as in the sketch below. They also have a trick for dealing with device failures (which is important given the application).
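
Here is a minimal Python sketch of the idea, assuming each pair of parties has already agreed on a shared seed (standing in for the key exchange) that is expanded with a hash-based PRG; all parameters are illustrative toy choices.

import hashlib
import itertools
import secrets

PRIME = 2**61 - 1  # toy field for the additive sharing
DIM = 4            # length of each party's input vector

def prg(seed, dim):
    # Expand a shared seed into a mask vector (SHA-256 in counter mode).
    return [int.from_bytes(hashlib.sha256(seed + i.to_bytes(4, "big")).digest(),
                           "big") % PRIME for i in range(dim)]

n_parties = 3
inputs = [[secrets.randbelow(100) for _ in range(DIM)] for _ in range(n_parties)]

# One shared seed per pair of parties (the output of the key exchange).
seeds = {pair: secrets.token_bytes(16)
         for pair in itertools.combinations(range(n_parties), 2)}

def masked_input(party):
    vec = list(inputs[party])
    for (i, j), seed in seeds.items():
        if party in (i, j):
            mask = prg(seed, DIM)
            sign = 1 if party == i else -1  # one party adds, the other subtracts
            vec = [(v + sign * m) % PRIME for v, m in zip(vec, mask)]
    return vec

# The server sums the masked vectors; every mask appears once with each sign,
# so the masks cancel and only the aggregate of the true inputs remains.
aggregate = [sum(col) % PRIME for col in zip(*(masked_input(k)
                                               for k in range(n_parties)))]
assert aggregate == [sum(col) % PRIME for col in zip(*inputs)]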


This talk provided helpful and relevant insight into the importance of matching what we research with what we require in the real world, which is, after all, one of the main reasons for having conferences such as Real World Crypto. Many of the talks are available to watch online, and I would highly recommend doing so if interested.

Thursday, January 12, 2017

RWC 2017 - Is Password Insecurity Inevitable?

Fresh back from an enlightening trip across the pond, I wanted to write about one of my favourite talks, all about password (in)security, from this year's Real World Cryptography conference.

As we know:
  1. Passwords protect everything.
  2. Passwords are terrible.
But happily, Hugo Krawczyk from IBM Research spoke about some great new work to resolve these two seemingly incompatible statements. There were a lot of details in the talk that I'll have to miss out here (slides are available online). In particular, I'm going to focus on 'Part I: Take the burden of choosing and memorising passwords off humans'.

The basic idea - this isn't new - is to have the user memorise a single master password that they use to access a password store. The store then derives unique pseudorandom passwords for each service the user wants to access (Google, Facebook, etc.). The problem with this solution is that the password store becomes a single point of failure: if it is compromised, then an offline dictionary attack to find the master password will compromise all of the user's accounts at once.

Krawczyk et al. suggest an improvement: SPHINX, which amusingly stands for "a password Store that Perfectly Hides from Itself (No eXaggeration)". The first idea is for the password store not to keep hold of (even a hash of) the master password. Instead it has an independent secret key $k$, and any time the user wants to log in to a service $S$, they send the master password $pwd$ to the store; the store computes $PRF(k, pwd | S)$, and this is sent to $S$ as the user's password for $S$. This means that if the store is compromised, the master password and the account passwords can't be learned unless the user later communicates with the store. So this works well if the store is in local, offline hardware, where the user is unlikely to use the store after it is compromised by an attacker.
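
As a toy illustration, the derivation could look something like the following sketch, which instantiates the PRF with HMAC-SHA256 (my choice for the example; the talk did not prescribe a particular PRF) and truncates the output into something a login form will accept:

import base64
import hashlib
import hmac

def derive_service_password(k, master_pwd, service):
    # PRF(k, pwd | S), encoded and truncated into a usable password.
    tag = hmac.new(k, f"{master_pwd}|{service}".encode(), hashlib.sha256).digest()
    return base64.b64encode(tag)[:16].decode()

store_key = b"independent secret key held by the store"
print(derive_service_password(store_key, "my master password", "google.com"))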

However, the authors go further and replace the PRF with an oblivious PRF. This means the store computes an "encrypted" version of $PRF(k, pwd | S)$ from an "encrypted" $pwd|S$, so it doesn't learn the plaintext values of the master password or the service password. In practice this can be achieved by the user (i.e. the user's machine) hashing the string $pwd | S$ into an element $g$ of a Diffie-Hellman group, computing $h = g^r$, where $r$ is a fresh, random exponent, and sending $h$ to the password store. The store's secret key is an exponent $a$, so it computes $h^a$ and sends this back to the user. The user removes the blinding exponent $r$ (i.e. computes $(h^a)^{r^{-1}} = g^a$, where $r^{-1}$ is the inverse of $r$ modulo the group order) and the result is the unique password for $S$. Now even when the password store is compromised, and even if the user communicates with the store afterwards, the master password and the account passwords can't be learned.
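
The exchange can be sketched as follows, using exponentiation in a (far too small) Schnorr group in place of the elliptic-curve group a real implementation would use; the hash into the group is heuristic and all parameters are toy values.

import hashlib
import secrets

P, Q = 3023, 1511  # toy safe prime P = 2*Q + 1; the subgroup has order Q

def hash_to_group(s):
    h = int.from_bytes(hashlib.sha256(s.encode()).digest(), "big")
    return pow(h % (P - 1) + 1, 2, P)  # square into the order-Q subgroup

a = secrets.randbelow(Q - 1) + 1  # the store's long-term secret key

# User side: hash pwd|S into the group and blind it with a fresh exponent r.
pwd, service = "correct horse battery staple", "example.com"
g = hash_to_group(pwd + "|" + service)
r = secrets.randbelow(Q - 1) + 1
h = pow(g, r, P)  # the store only ever sees this blinded element

# Store side: apply the secret key to the blinded element.
h_a = pow(h, a, P)

# User side: strip the blinding to recover g^a, the password material for S.
r_inv = pow(r, -1, Q)  # inverse of r modulo the group order
service_pwd = pow(h_a, r_inv, P)
assert service_pwd == pow(g, a, P)

Because $r$ is fresh and uniform each time, the blinded element the store sees is a uniformly random group element, revealing nothing about $pwd$ or $S$.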

In principle an attacker could recover all the account passwords by compromising both the password store and a service $S$, learning the secret key $a$ and the service password $g^a$, computing $g = H(pwd|S)$ and performing an offline dictionary attack to find $pwd|S$. Then for any other service $S'$, the password can be computed via $H(pwd|S')^a$. But as long as $S$ follows good practice and only stores a hash $H'(g^a)$ of the service password, this attack fails: an offline dictionary attack to recover $g^a$ is infeasible as it's essentially a random group element.

There are no particularly expensive computations involved in using SPHINX; the communication between the user and SPHINX does not need to be secure (so it could be hosted online somewhere); and the store works regardless of what password protocol is used by the service, so it's extremely flexible. SPHINX therefore strikes me as both useful and practical, which is surely the definition of Real World Cryptography.

Monday, January 9, 2017

RWC 2017 - Post-quantum cryptography in the real-world

A new year takes off and, along with it, thousands of resolutions are formulated. Although I am not the right person to talk about them (my diet will begin next Monday), I wish to discuss a resolution that the cryptographic community as a whole has set for itself in 2017. Because that's what people do at Real World Crypto (RWC): they talk about new trends, topics that could be worth exploring during the new year, and directions for their research and interests. This year, for the first time at RWC, post-quantum cryptography (PQC) was given an entire session, a clear sign that times are changing and the moment has come to bring the discussion to the real world. The message is clear: even if quantum computers are not popping up in tomorrow's newspapers, we can't postpone any longer.

A very simple reason for this was given by Rene Peralta, of the NIST PQC team, during the overture of the session: standardisation takes time, up to seven years if we start right now, and full transition takes even longer. I found Rene's presentation to be neat and direct: our public-key cryptography fails against quantum computers and our symmetric one needs some (non-drastic) modifications. The resolution is to "start thinking about it this year, possibly by November 30th, 2017". However, a question arises quite naturally: are we ready?

The other three talks of the session tried to answer in the affirmative. Among the several PQC proposals found in theoretical papers, two made their way into RWC: the well-established lattice-based cryptography and the newborn isogeny-based cryptography, which nevertheless carries the pride and sympathy of ECC.

Lattices and funny names: NewHope and Frodo and Crystals
Lattice-based cryptography has three representatives in the run for PQC schemes. Valeria Nikolaenko showed two. The first is called NewHope and is a key agreement protocol based on the hardness of Ring-LWE. The latter is a problem very favourable to applications because it combines sound theoretical security (a worst-case to average-case reduction) with fast implementations, thanks to specific choices of parameters which allow for speed-ups in the computations: NewHope turns out to be even faster than ECC and RSA, but at the price of larger communication. However, there are some concerns about the security of LWE when the ring structure is added. Thus, Frodo ("take off the ring") is designed to achieve the same goal using only standard LWE. The drawback is a degradation in performance, since the tricks hinted at above cannot be used any more and keys are generally bigger.

The third lattice-based scheme was presented by Tancrede Lepoint and is a suite called Crystals. This is based on yet another kind of lattice: module lattices, for which a worst-case to average-case reduction is also known. These are less structured lattices (hence possibly calming down the detractors of ring structure) in which similar implementation speed-ups are possible: the timing is indeed comparable to NewHope's, while the communication is improved.

"Make elliptic curves great again"
Michael Naehrig presented a new proposal for PQC: do you remember those curves with plenty of small subgroups, on which the discrete logarithm problem is easy to solve? Now they come in handy again: supersingular curves are taken to be the nodes of a graph, whose edges are the isogenies (structure-preserving maps between curves) of small degree (2 and 3). In this new context, given two curves in the graph, it is difficult to come up with an isogeny linking the two. However, such a new approach doesn't really stand out against the other solutions: keys are small, but performance is not a pro (so to speak).

RWC 2017 - Erasing Secrets from RAM

One of my favourite talks from the Real World Crypto 2017 conference was given by Laurent Simon, on Erasing Secrets from RAM.

In short, it was found that in practice, many non-malicious programs handling keys and other sensitive data do not erase them from RAM correctly. This would give an attacker with access to all of a system's volatile memory and CPU state access to any unerased sensitive data.

It was thought that compiler optimisation played a part in the lack of erasure. Take the code below:

void sensitive_function(...) {

   u8 sensitive_buffer[KEY_MAX] = "\0";
   ...
   /* scrub the key material before the buffer goes out of scope */
   zeromem(sensitive_buffer, KEY_MAX);

}

The compiler may choose to remove the zeromem line, since sensitive_buffer is going out of scope anyway and the write is a dead store. This would leave sensitive data in RAM, unbeknownst to the programmer.

So, the paper presents a tool that allows developers to mark sensitive variables in their code, and then see (post-compilation) any potential leakage of sensitive data.

They call this tool Secretgrind; it is built on the popular Valgrind framework.

Anyway, as it turns out, the compiler optimisation problem mentioned above wasn't actually a problem in practice - they didn't once encounter this problem in all their testing. Instead, the majority of sensitive leaks were down to developers' mistakes; they had forgotten to erase sensitive variables on both the stack and the heap.

There were a few issues with IO APIs' caching optimisations, though - such as when you read a line from a PEM file using mmap: the whole file is often loaded into memory to save time. This is not immediately obvious, so when you go to delete the line from RAM, the rest of the file is still in memory!

Laurent concluded the talk by saying Secretgrind is still in development, and although he referred to it as a 'hack' (due to its fragility), he wishes for it to be used to "help you guys check your code".

Thursday, January 5, 2017

RWC 2017 - A Formal Security Analysis of the Signal Messaging Protocol

Real World Crypto 2017 kicked off yesterday in New York City. This afternoon, Luke Garratt presented his and his colleagues' work, A Formal Security Analysis of the Signal Messaging Protocol. The Signal protocol is used by the Facebook Messenger app, WhatsApp and the Signal app (to name but a few). It is therefore surprising that there had been no formal security analysis before now, and their work addresses this issue. The paper is motivated by the questions

What should Signal achieve?

and

Does it?


Let's first look at what kind of security we can hope for. We speak of forward secrecy, which ensures that past messages are not compromised even if the communication keys are compromised at some point in the future. This prevents an adversary from storing all the conversations and waiting until an attack is successful to then recover all the communication.

Post-compromise security pushes this even further: not only are past conversations protected, but so are future ones. The Signal protocol achieves this using a technique called ratcheting, in which session keys are updated with every message sent. Why is this useful? Well, it makes the adversary's life much harder. In order to attack a system with post-compromise security, an adversary must obtain the long-term keys, attack immediately and keep attacking in order to compromise future sessions. Compare this to forward secrecy alone, where an adversary could obtain a long-term key and wait for an interesting target before launching a man-in-the-middle attack (e.g. TLS-DHE), or to no forward secrecy at all, where they could simply store all ciphertext traffic until they obtain a long-term key and then decrypt everything (e.g. TLS-RSA). Times are hard.
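
To give a flavour of ratcheting, here is a minimal Python sketch of a symmetric KDF chain, assuming HMAC-SHA256 as the key derivation function; Signal's actual double ratchet additionally mixes fresh Diffie-Hellman shares into the chain, which is what heals the state after a compromise.

import hashlib
import hmac

def ratchet_step(chain_key):
    # Derive the next chain key and a one-off message key; the old chain
    # key is then discarded, so the chain only moves forwards.
    next_ck = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    msg_key = hmac.new(chain_key, b"message", hashlib.sha256).digest()
    return next_ck, msg_key

ck = hashlib.sha256(b"initial shared secret").digest()
for i in range(3):
    ck, mk = ratchet_step(ck)
    print(f"message {i} key: {mk.hex()[:16]}...")

Because each chain key is overwritten by a one-way function of itself, an attacker who steals the current state cannot run the chain backwards to recover earlier message keys.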

Their security model captures: 
  • Full network control by the adversary;
  • Perfect forward secrecy;
  • Post-compromise security;
  • Key compromise impersonation attacks;
  • Compromise of some (but not all) random numbers.
Their proof is too long to be featured in this blog post, but Luke promises it is tedious rather than complex. Their conclusion? So far, so good. 

In terms of limitations, they note that they have not analysed any implementation of the protocol, so this is a theoretical analysis only. They also assume an honest key distribution and have not (yet) considered multiple devices.