Wednesday, December 24, 2014

52 Things: Number 12: What is the elliptic curve group law?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the Mathematical Background section by introducing the Elliptic Curve Group Law...

The Elliptic Curve group law is a method by which a binary operation is defined on the set of rational points of an elliptic curve to form a group. Now, let's go through what that actually means, and what it's used for. Thanks to Dr Dan Page for providing the group law diagram.

An Elliptic Curve and its rational points

An Elliptic Curve is a cubic equation in two variables over some mathematical field. Elliptic curves can be written in various forms, but over most fields[1] they can be written in short Weierstrass form:
$$E : y^2 = x^3 + ax + b $$
For now we will assume that we are working in the field of real numbers, and ignore any complications that come from using finite fields. With some simple requirements on $a,b$ (specifically, that $27b^2\neq -4a^3$) this is an elliptic curve.

The set of points that will be elements of our group are going to be the rational points of the elliptic curve. This is simply the collection of points $(x,y)$ that satisfy the curve equation where both $x,y$ are rational. So, that's the set of points with $x,y\in \mathbb{Q}$ where $y^2=x^3+ax+b$. For reasons that will become clear, we also include a point at infinity[2].

Adding a Group Law to an elliptic curve

The simplest way to describe the relation we're going to add to the set of rational points is with a diagram:

Elliptic Curve (blue) with two points (P,Q) and their sum (P+Q) plotted, along with construction lines (red)
So, to add together $P$ and $Q$, we draw a line through $P$ and $Q$, and let $T=(T_x,T_y)$ be the third point at which this line intersects the curve. Then, $P+Q=(T_x,-T_y)$. To add a point to itself, we take the tangent at that point. Now, the surprising fact is that this operation defines a group, with the point at infinity as the neutral element.
Most of the requirements of being a group are easy to see geometrically[3]. For example, it is easy to find the inverse of an element. In the diagram above $(P+Q)+T=0$, because the line from $T$ to $(P+Q)$ has its third intersection at infinity, and so $(P+Q)=-T$. In fact, for any elliptic curve in short Weierstrass form, to negate a point we simply change the sign of its $y$-coordinate.

Is that all there is to it?

Pretty much, yes. The same method holds over finite fields, although in this case it tends to be simpler to think of the group operation as an algebraic construct rather than a geometric one, since elliptic curves over finite fields do not have such an intuitive structure. Also, we don't have to view curves in short Weierstrass form, since there are many different coordinate schemes and equations that represent the same curve. Indeed, some choices of curve and coordinate system assist us in doing certain types of computation.
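To make the algebraic view concrete, here is a minimal Python sketch of the affine addition formulas over a small prime field. The curve, prime and base point are illustrative choices of mine, not taken from any standard.

```python
# A minimal sketch of elliptic curve point addition over a prime field,
# using affine short Weierstrass coordinates. Curve and prime are toy choices.

p = 97          # a small prime field (illustrative)
a, b = 2, 3     # curve E: y^2 = x^3 + 2x + 3 over GF(97)
INF = None      # the point at infinity (the neutral element)

def add(P, Q):
    """Add two points on E using the chord-and-tangent rule."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF                                          # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p    # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p           # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p                         # negate the third intersection
    return (x3, y3)

def on_curve(P):
    if P is INF:
        return True
    x, y = P
    return (y * y - (x ** 3 + a * x + b)) % p == 0

P = (0, 10)                  # 10^2 = 100 = 3 mod 97, so (0, 10) lies on E
Q = add(P, P)                # doubling P
print(Q, on_curve(Q))        # the doubled point, plus a sanity check it lies on E
```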

What's that got to do with Cryptography?

It turns out that over certain finite fields the elliptic curve group has several nice properties for cryptographers. There are a surprisingly large number of curve and field pairs where it's not too costly to do group computations[4], but for which the various discrete log or DH problems (see last week's blog) are hard. Moreover, compared with using large multiplicative groups (e.g. RSA groups), the group elements we compute with are much smaller. Putting all these together, elliptic curves allow cryptographers to efficiently calculate ciphertexts that are much smaller than those created by alternative means without reducing security.





  1. Specifically, fields of characteristic not equal to 2,3. That is, fields where $2\neq0$ and $3\neq 0$. Unfortunately, this obviously means that the results we discuss won't hold in binary fields, but that is rather beyond the scope of this talk.
  2. Justification for this comes from considering the elliptic curve as a curve in projective space, but for now it suffices that such a point exists.
  3. Associativity is by far the most complicated to show. This diagram on wikipedia explains the concept behind the proof, although the details are rather involved.
  4. Even as I write this, I'm sure someone will question the validity of this claim, but it is true that compared to many groups that one could construct in which the required problems are sufficiently hard, point arithmetic on an elliptic curve is comparatively tractable.

Friday, December 19, 2014

Study Group: Efficient Smart Phone Forensics Based on Relevance Feedback

This week's study group was given by Panos on the subject of Efficient Smart Phone Forensics Based on Relevance Feedback[1].  

The Digital Forensics Procedure has 4 major steps: Device Securing, Data Extraction, Data Analysis and Data Representation. To improve the efficiency of data analysis, a sub-procedure known as forensic triage is used to mark out the most relevant information on the phone. In this paper, a ranking system called LIFTR is presented to prioritize the information recovered from Android phones, where the amount of data is huge compared to feature phones.


LIFTR is designed under the following assumptions:
  1. A physical image of the phone has been acquired.
  2. The data is not encrypted; it consists of filtered binary files (the output of the previous procedure).
  3. The data is formatted as lists of strings, e.g. the output of DEC0DE or the strings command.
LIFTR then takes the information parsed by a recovery engine as input, alongside optional suspect information, and outputs a ranked list of the most important NAND pages.

There are two major steps in the design of LIFTR:
  1. Initial Ranking. The input to LIFTR is a set of unranked pages. The initial ranking scores the pages according to a relevance metric, producing an ordered list for the next step.
  2. Relevance Feedback. After the initial ranking, LIFTR presents some possibly relevant fields of pages to the investigator and asks for them to be labelled, usually manually, as true or false positives. LIFTR then re-sorts the ranking using this labelling, assigning relevant pages higher ranks and irrelevant pages lower ones. This process can be repeated multiple times to improve performance; a toy sketch of the loop is given below.
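As a purely hypothetical illustration of that feedback loop, here is a small Python sketch; the scoring function and weights are my own toy choices, not the actual LIFTR relevance metric.

```python
# Hypothetical sketch of the two-step loop: score pages with an initial
# relevance metric, then re-rank after investigator feedback.

def initial_score(page, suspect_terms):
    # Toy metric: count occurrences of suspect-supplied terms in the page.
    return sum(page.count(t) for t in suspect_terms)

def rerank(pages, scores, feedback):
    # feedback maps a page index to True (relevant) or False (false positive);
    # boost confirmed pages, demote confirmed false positives.
    for i, label in feedback.items():
        scores[i] += 10 if label else -10
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)

pages = ["call 0777 alice", "system log entry", "alice met bob"]
suspect_terms = ["alice", "0777"]
scores = [initial_score(p, suspect_terms) for p in pages]
ranking = rerank(pages, scores, feedback={1: False})
print(ranking)   # page indices, most relevant first
```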

The experimental results show that LIFTR can efficiently mark out the relevant pages. It does not, however, guarantee that the target information will be included in its result, so all data should still be analyzed during an investigation. Nevertheless, LIFTR is an effective tool for improving the efficiency of the investigation.


[1] Saksham Varma, Robert J. Walls, Brian Lynn, and Brian Neil Levine. 2014. Efficient Smart Phone Forensics Based on Relevance Feedback. In Proceedings of the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices (SPSM '14). ACM, New York, NY, USA, 81-91. DOI=10.1145/2666620.2666628 http://doi.acm.org/10.1145/2666620.2666628

52 Things: Number 11: What are the DLP, CDH and DDH problems?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog is the second in the section on Mathematical Background and concerns how group operations can be used to design cryptographic primitives.

As you probably know by now, cryptography often relies on 'hard problems'. That is, we design cryptographic protocols whose security can be proven if we assume that the adversary is unable to solve a certain (mathematical) problem in a reasonable amount of time. This blog post introduces three such problems which are widely used in security proofs. Luckily for me, a) this is really just Group Theory, not Computer Science and b) just two days before writing this I attended a very decent guest lecture by fellow Bristol Crypto researcher Susan Thomson on this exact topic. (That is to say, any inaccuracies in the following can and should be blamed on her!)

The Discrete Logarithm Problem (DLP)

Okay, let $G$ be an abelian group. First we write the operation in $G$ multiplicatively. For any $g \in G$ and any integer $a>1$ let $g^a$ denote $g * g * ... * g$ with $a$ occurrences of $g$. The discrete logarithm problem (DLP) is:

Given $G$, $g$ and $h=g^a$, find $a$. 

Here $a$ is called the discrete logarithm of $h$ with base $g$, hence the name of the problem.

Is the discrete logarithm problem hard? Sometimes, maybe. As a counter-example, let $G$ be the integers under addition. So now it makes sense to write the group operation additively, not multiplicatively. So the same process of repeating the group operation with the same element $g$ is now written $g + g + ... + g$ with, say, $a$ summands and, since we're working in the integers, this sum, call it $h$, is just equal to the integer $ag$. Therefore $a$, the discrete logarithm of $h$ to the base $g$, can be found by just dividing $h$ by $g$. For example, if I say to you, "find the discrete logarithm to the base 3 of 18 in the integers under addition", you'd just write $3 + 3 + ... + 3 = 18$ with $a$ summands on the left, simplify this to $3a = 18$ and find that $a = 6$. I could change the group to the integers modulo $N$ for some integer $N>1$ (still under addition) but the problem wouldn't be much harder: one has to solve an equation like $ag \equiv h \: (\mathrm{mod} \: N)$, which is solved by performing the Extended Euclidean Algorithm to find $g^{-1}\: (\mathrm{mod} \: N)$ (if it exists), multiplying this by $h$ and reducing modulo $N$ to obtain $a$. All of this can be done in polynomial time - no good for building secure cryptographic primitives.

On the other hand, the DLP in a finite field of prime order viewed as a group under multiplication (after throwing out the zero element) or in elliptic curve groups (more on those next week) is believed to be hard. That is, we do not yet know of any polynomial time algorithms for finding discrete logarithms in these groups. As a concrete example, suppose I ask you, "find the discrete logarithm to the base 3 of 5 in the multiplicative group of the integers modulo 7". This means find an integer $a$ such that $3^a \equiv 5\: (\mathrm{mod} \: 7)$. Now we are in the multiplicative group, not the additive one, and so we really do have to 'take logarithms' somehow, not just multiply by the inverse of 3. In this case, since 7 is fairly small, we can find the answer by just trying all the possibilities one at a time until we find a solution:
  1. $3^1 = 3 \not\equiv 5\: (\mathrm{mod} \: 7)$
  2. $3^2 = 9 \equiv 2 \not\equiv 5 \: (\mathrm{mod} \: 7)$
  3. $3^3 = (3^2)\times3 \equiv 2\times3 = 6 \not\equiv 5\: (\mathrm{mod} \: 7)$
  4. $3^4 = (3^3)\times3 \equiv 6\times3 = 18 \equiv 4 \not\equiv 5\: (\mathrm{mod} \: 7)$
  5. $3^5 = (3^4)\times3 \equiv 4\times 3 = 12 \equiv 5\: (\mathrm{mod} \: 7)$
so $a = 5$. The way we 'bounce around' the integers modulo 7 by repeatedly multiplying by the base 3 (getting 3, 2, 6, 4 and then 5) should give you some intuition as to why DLP seems hard in this setting. If our prime was much larger than 7, say thousands of bits long in binary, even a computer would take a very long time to solve DLP this way (though there are better, sub-exponential time algorithms and it is not proven that no polynomial time algorithm exists for solving DLP in this kind of group).
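If you want to play with this yourself, here is a small brute-force search in Python that reproduces the hand computation above; it is of course only feasible because the modulus is tiny.

```python
# Minimal brute-force discrete logarithm search, mirroring the worked
# example: find a with 3^a = 5 (mod 7).

def dlog_bruteforce(g, h, p):
    """Return the smallest a >= 1 with g^a = h (mod p), stepping as in the text."""
    value = 1
    for a in range(1, p):
        value = (value * g) % p      # multiply by the base once per step
        if value == h:
            return a
    return None                      # no solution: h is not a power of g

print(dlog_bruteforce(3, 5, 7))      # 5, matching the hand computation
```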

The Computational Diffie-Hellman Problem (CDH)
A problem related to DLP is named after Whit Diffie and Martin Hellman, who devised a way for two parties to agree on a secret key over a public channel without revealing it:
  • Alice and Bob publicly agree on a cyclic group $G$ and generator $g$. 
  • Alice chooses a random secret integer $a$ and Bob chooses a random secret integer $b$. 
  • Alice computes $g^a$ and publicly sends this to Bob. Bob computes $g^b$ and publicly sends this to Alice. 
  • Alice and Bob both compute $g^{ab}=(g^a)^b=(g^b)^a$ by raising what they received from the other party to the power of their own secret integer. 
Now $g^{ab}$ is a secret key that can be used for symmetric encryption and decryption by Alice and Bob. But someone listening in to the exchange has in their possession $G$, $g$, $g^a$ and $g^b$. So secrecy of the key $g^{ab}$ depends on this problem, called the Computational Diffie-Hellman Problem (CDH):

Given $G$, $g$, $g^a$ and $g^b$, find $g^{ab}$.

CDH is clearly related to DLP, but which is harder? Well, if I can solve DLP then I can efficiently compute the secret integer $a$ from $g^a$ and then find $g^{ab}$ by raising $g^{b}$ to the power $a$ in the same way Alice does, therefore solving CDH. So anyone who can solve DLP can also solve CDH, meaning DLP is at least as hard as CDH.
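As a toy illustration of the exchange (and of what an eavesdropper does and doesn't see), here is a short Python sketch; the tiny prime and base are my own illustrative choices, far too small for real use.

```python
# Toy Diffie-Hellman exchange, only to make the objects above concrete.

import secrets

p = 2039          # a small prime (illustrative only)
g = 7             # a base for the toy example

a = secrets.randbelow(p - 2) + 1     # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1     # Bob's secret exponent

A = pow(g, a, p)  # Alice sends g^a over the public channel
B = pow(g, b, p)  # Bob sends g^b over the public channel

k_alice = pow(B, a, p)   # (g^b)^a
k_bob   = pow(A, b, p)   # (g^a)^b
assert k_alice == k_bob  # both hold g^{ab}; an eavesdropper sees only p, g, A, B
```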

The Decisional Diffie-Hellman Problem (DDH)
This is another 'discrete logarithm' style problem used to prove indistinguishability properties. Say Alice and Bob perform the Diffie-Hellman key agreement protocol as above so that $G$, $g$, $g^a$ and $g^b$ are all public and $g^{ab}$ is the shared secret key. Intuitively, the Decisional Diffie-Hellman Problem (DDH) asks whether an adversary can distinguish Alice and Bob's secret key $g^{ab}$ from a random group element of $G$. Formally:

Given  $G$, $g$, $g^a$, $g^b$ and $T_x$ such that $T_0$ is a random element of $G$, $T_1 = g^{ab}$ and $x$ is chosen uniformly at random from $ \lbrace 0,1 \rbrace $, find $x$.

If an adversary can solve DDH (i.e. output the correct value of $x$ with probability non-negligibly greater than $\frac{1}{2}$), then $G$, $g$, $g^a$ and $g^b$ must leak some information about the secret key $g^{ab}$ that distinguishes it from a random group element, even if it can't be computed directly. What should be clear is that if the adversary can solve the computational Diffie-Hellman problem, then they can actually compute $g^{ab}$ and hence trivially distinguish this element from a random group element, thereby solving the decisional Diffie-Hellman problem. So anyone who can solve CDH can also solve DDH, meaning CDH is at least as hard as DDH.

These are the three problems we wanted to talk about, and we've given a sketch proof of their ordering in terms of hardness: DLP is at least as hard as CDH, which is at least as hard as DDH. As we've seen, DLP is sometimes easy, making CDH and DDH easy too. So the choice of group $G$ and generator $g$ is very important when doing cryptography!

Wednesday, December 17, 2014

Study Group: Attack on XLS

This week's study group, delivered by Guy, featured a paper by Mridul Nandi attacking the XLS domain completion method. This attack was initially given at DIAC in August, and was formally presented earlier this month at Asiacrypt 2014 in Taiwan. The idea of a domain completion method is to preserve length in an encryption scheme, so that the ciphertext is of the same length as the message. More formally, this means turning an encryption scheme that works on blocks of size n into a cipher that works on inputs of size mn. This contrasts with the strategy of padding (for example to the block length of a block cipher), which incurs ciphertext expansion and is used in many applications and standards. A number of approaches have appeared in the literature including ciphertext stealing, hash-then-counter (XCB, HCTR, HCH) and applying a block cipher twice to create a one-time pad for the final block (EME, TET, HEH) - but these methods are not applicable to generic blockciphers and lack formal security analysis.

At FSE 2007, Ristenpart and Rogaway presented XLS (eXtended Latin Squares), a domain completion method that inherits the assumed strong pseudorandom permutation property of the underlying blockcipher. The method is conceptually straightforward, utilising a (linear) mixing permutation and bit-flipping and the security proof is, as you would expect from the authors, relatively easy to follow despite its numerous intricacies. No fewer than six teams have incorporated XLS into their CAESAR submissions.

Despite this acquired confidence, Nandi presents an attack to demonstrate that XLS is in fact not a strong pseudorandom permutation (SPRP), giving an attack that makes just three encryption/decryption oracle queries and succeeds with probability 1/2. In addition, the author shows that replacing the mixing function defined in the original paper with some other stronger linear permutation(s) does not help, and gives a distinguishing attack (with success probability 1/4) against this generalised version of XLS.

Nandi does not precisely pinpoint any incorrect assertion in the proof, nor does he give any indication that XLS might be fixable, and it will be interesting to see how the authors of the original paper (and the teams relying on XLS in their CAESAR submissions) respond in the coming months.

Friday, December 12, 2014

52 Things: Number 10: What is the difference between the RSA and strong-RSA problem?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog post introduces the RSA and Strong-RSA problems and highlights the differences between the two.


Cryptography relies heavily on the assumption that certain mathematical problems are hard to solve in a realistic amount of time. In Public-Key (Asymmetric) Cryptography, which is what we'll be focusing on in this blog post, we use the assumed existence of One-Way functions, i.e. functions that are easy to compute one way but difficult to invert. We use problems from algorithmic number theory to produce these functions.

Factoring

The first difficult problem from number theory to talk about is factoring. Given a composite integer $N$, the factoring problem is to find positive integers $p,q$ (both greater than 1) such that $N = pq$. Although on the face of it this seems like a very simple problem, it is in fact a very tough, well studied problem. It can be solved in time exponential in the bit length of $N$ by checking all the numbers $p = 2, \ldots, \sqrt{N}$. However, exponential time is not fast enough, and no polynomial time algorithm has been developed to solve the factoring problem, despite many years of research. Clearly there are examples of $N$ for which this is very easy to solve, for example whenever $N$ is even. Therefore, when starting to think about using this in a cryptographic construction we consider $N$ to be very large and the product of two large primes $p,q$.
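As a quick illustration of that exponential-time baseline, here is trial division in a few lines of Python; the toy modulus is an illustrative choice.

```python
# Minimal trial-division factoriser: try candidate divisors up to sqrt(N).

import math

def trial_division(N):
    """Return a non-trivial factor of N, or None if N is prime."""
    for p in range(2, math.isqrt(N) + 1):
        if N % p == 0:
            return p
    return None

N = 3233                      # 61 * 53, a toy RSA-style modulus
p = trial_division(N)
print(p, N // p)              # 53 61 -- instant here, hopeless for a 2048-bit N
```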

The RSA Problem

In RSA public-key encryption [1], Alice encrypts a plaintext $M$ using Bob's public key $(n,e)$ to obtain a ciphertext $C = M^e (\textrm{mod } n)$, where $n$ is the product of two large primes and $e \geq 3$ is an odd integer that is coprime to the order of $\mathbb{Z}_n^{*}$, the group of invertible elements of $\mathbb{Z}_n$. Bob knows the private key $(n,d)$ where $de = 1 (\textrm{ mod } (p-1)(q-1))$, meaning he can compute $M = C^d (\textrm{mod } n)$. An adversary can eavesdrop on $C$ and may know the public key $(n, e)$; the RSA problem is to recover $M$ from just this information. The best known general approach is to factor $n$ and then compute $d$ as Bob does, so the RSA problem is no harder than integer factorisation, but it is still believed to be a very hard problem provided a suitable $n$ is chosen.
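Here is a toy run of textbook RSA in Python with tiny parameters, purely to make the objects above concrete; real moduli are of course thousands of bits long.

```python
# Toy textbook RSA with tiny primes (illustration only).

p, q = 61, 53
n = p * q                     # 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # odd public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent: d*e = 1 mod phi

M = 65                        # a plaintext in Z_n
C = pow(M, e, n)              # encryption: C = M^e mod n
assert pow(C, d, n) == M      # decryption recovers M
```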

The Strong RSA Assumption

The strong RSA assumption differs from the RSA assumption in that the adversary can choose the (odd) public exponent $e \geq 3$. That is, given $n$ and a ciphertext $C$, the adversary's task is to find any plaintext $M$ and exponent $e \geq 3$ such that $C = M^e (\textrm{mod } n)$. This is at least as easy as the RSA problem, meaning that the assumption that it is still hard is, unsurprisingly, a stronger assumption. The RSA problem is now over a quarter of a century old, and public key encryption schemes have been developed that derive their strength fully from the RSA problem.



[1] - http://people.csail.mit.edu/rivest/RivestKaliski-RSAProblem.pdf


Thursday, December 11, 2014

Password-Protected Secret Sharing

Hugo Krawczyk opened the final day presenting a paper on password-protected secret sharing. He began with a motivating example of protecting a high value secret such as a Bitcoin wallet. Typical storage methods using password authentication at a server are vulnerable to a single point of failure - an attacker just needs to break into one server, and can then mount an offline dictionary attack to guess the user's password and recover the key. One natural cryptographic solution to this problem is to secret share the wallet key across multiple servers (this approach is taken in the commercial solution offered by Dyadic Security), ensuring that an attacker must break into several servers to recover the key. The problem now is that in order to use the Bitcoin wallet, you must authenticate yourself to each of your servers - if this is done with a single password then there are now many points of failure for the system! An offline dictionary attack can now be performed against any one of the servers. Of course, one could choose different, strong passwords for each server, but this is unlikely to be practical for most users.

The notion of password-protected secret sharing (PPSS) gets around this. A PPSS scheme allows a user to secret share a secret among n servers (with threshold t) and store just a single password for authentication. Roughly speaking, the security guarantee is that any attacker breaking into up to t servers and obtaining all of their secret information (shares, long-term keys, password files etc) cannot learn anything about the secret or the password. This implies that the adversary's only hope is to try and guess the password in an online attack: offline attacks with fewer than t+1 servers are useless.

The main contribution of the paper is a new PPSS scheme, which only requires authenticated channels between the servers (except for a one-time setup phase) and is very efficient: authentication is done in one round of communication (2 messages), requiring 2t + 3 exponentiations for the client and 2 for each server. Previous schemes were much less efficient and often required additional assumptions such as a PKI.

The main building block of their construction is an Oblivious PRF (OPRF). Here a server holds a PRF key $k$, the user holds the input $x$ and obtains the output $f_k(x)$, whilst the server learns nothing about the input. This can be done very efficiently with a 'hashed Diffie-Hellman' technique, where the output is defined by $f_k (x) = H(H(x)^k)$ and can be computed with a standard DH exchange requiring just 2 exponentiations per party.

For the PPSS scheme, each server $S_i$ has an OPRF key $k_i$. The user chooses a password $p$, secret shares their secret $s$ into shares $s_i$ and runs the OPRF with each server to obtain $r_i = f_{k_i}(p)$. They then encrypt $s_i$ as $c_i = s_i \oplus r_i$, store $c_i$ at server $S_i$, erase all intermediate information and store the password $p$. To reconstruct, the user retrieves $c_i$ from $S_i$ and runs the OPRF to recover the shares $s_i$. The protocol is nice and simple to describe, but there are many subtle issues that arise with defining and proving security for the OPRF used in the construction: see the paper for details.
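As a rough sketch of the pieces above, here is some Python that evaluates the hashed Diffie-Hellman OPRF in one place and uses its output to mask a share. In the real protocol the user and server compute $f_k(p)$ jointly, with the user blinding the input so the server never sees the password; the group and hash choices below are illustrative assumptions of mine, not those of the paper.

```python
# Sketch of f_k(x) = H(H(x)^k) and the share encryption c_i = s_i XOR f_{k_i}(p).
# Evaluated non-interactively here purely for illustration.

import hashlib, secrets

p_mod = 2**127 - 1      # illustrative prime modulus (not a standards-grade group)

def H1(x: bytes) -> int:
    # Hash the input into the group (assumed never to hit 0 for this sketch).
    return int.from_bytes(hashlib.sha256(b"H1" + x).digest(), "big") % p_mod

def H2(y: int) -> bytes:
    return hashlib.sha256(b"H2" + y.to_bytes(32, "big")).digest()

def oprf(k: int, x: bytes) -> bytes:
    """f_k(x) = H2(H1(x)^k mod p_mod), computed in one place for illustration."""
    return H2(pow(H1(x), k, p_mod))

# Server S_i holds an OPRF key k_i; the user stores c_i = s_i XOR f_{k_i}(password).
k_i = secrets.randbelow(p_mod - 1) + 1
s_i = secrets.token_bytes(32)               # the user's i-th secret share
password = b"correct horse battery staple"

r_i = oprf(k_i, password)
c_i = bytes(a ^ b for a, b in zip(s_i, r_i))        # stored at server S_i

# Reconstruction: re-run the OPRF on the password and XOR the stored value back.
assert bytes(a ^ b for a, b in zip(c_i, oprf(k_i, password))) == s_i
```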

Another application of PPSS is for threshold PAKE (password authenticated key exchange), which allows a user to securely exchange keys with any subset of n servers using a single password, provided no more than t servers are corrupted.  They prove a generic composition theorem showing that PPSS can be combined with key exchange to give threshold PAKE in just one round, which was never achieved by any prior work.

Wednesday, December 10, 2014

The Legal Infrastructure Around Information Security in Asia

The second invited talk at Asiacrypt strikingly stood out. It was given by Helaine Leggat, an Australian lawyer and information security professional, and it touched upon cryptography only through a few mentions of hash functions. The speaker started by elaborating on her motivation to get involved with legal aspects of information security, those being the advent of international cyber warfare and mass surveillance as well as the following shifts in power: from the West to the East, from nation states to the society, and from real assets to information assets. All those are facilitated by the ease of communication that the internet provides.

For the rest of her talk, Helaine highlighted various elements of the legal framework concerning information security, with a focus on Asia and Oceania. On the international level, there are model laws by United Nations Commission on International Trade Law (UNCITRAL) such as the UNCITRAL Model Law on Electronic Commerce from 1996 and the UNCITRAL Model Law on Electronic Signatures from 2001. Both serve as templates for national law, that is, they are not applicable in their own right. Furthermore, there are conventions such as the United Nations Convention on the Use of Electronic Communications in International Contracts from 2005 and the Convention on Cybercrime by the Council of Europe, which has been adopted by various countries in the Asia-Pacific.

For the national level, the speaker mostly elaborated on laws governing the relationship between governments and citizens with respect to access to information. There are two aspects of this relationship: citizens' access to government information, and the governments' access to information of their citizens. The former is regulated under freedom of information acts in many countries. Those laws usually contain exceptions for state secrets.

Even more controversial is the legislation on privacy. It can be seen as being at the core of the trade-off between freedom and protection. Even though there is a lot of commercial interest in personal data, it seems that nation states lead a war on privacy by developing the most sophisticated malware. Moreover, Helaine mentioned historic privacy legislation that distinguishes between telecommunication and broadcasting, and between telephones and computers. In the age of convergence in the form of smart devices, this makes little sense.

Finally, the speaker turned her attention to a more informal kind of legislation such as standards and best practice codes. It is noteworthy that, despite their informal nature, standards are sometimes referred to by laws, for example India's Information Technology Act or New Zealand's Privacy Act of 1993, which refers to an ISO standard on privacy.

In the Q&A, Adi Shamir brought up the recent case where the US government asked Microsoft to disclose personal data that is stored in Ireland. Microsoft argues that this data is protected under EU privacy law. In her answer, Helaine pointed to the contradiction between the PATRIOT act and EU law, and to the fact that different parts of the world have different attitudes towards privacy. This makes corporations operating in Europe care more about their customers' privacy there, at least superficially.

Tuesday, December 9, 2014

Large-scale Computation and Exploitation of RC4 Biases

The excellent first invited talk at Asiacrypt 2014 in Kaohsiung, Taiwan was given by Kenny Paterson (@kennyog) from Royal Holloway, entitled "Big Bias Hunting in Amazonia: Large-scale Computation and Exploitation of RC4 Biases". Kenny outlined the progress of himself and his co-authors in analysing the single and double byte biases in the RC4 keystream in detail, and using the resulting information to construct attacks on cryptographic protocols such as TLS and WPA/TKIP.

Keystream bias and plaintext recovery

The fundamental flaw with RC4 is that (at least) the first 256 bytes of the keystream are not uniformly distributed: instead of the probability of each byte taking each of the 256 possible values 0x00,..,0xFF being 1/256, the actual probabilities exhibit small differences (biases) away from this number. I find that a visual representation of these biases gives the best indication of the nature of the problem---check out the graphs here.

These biases are problematic in situations where a fixed plaintext (think cookie, or password) is repeatedly encrypted under new RC4 keys. In an ideal world, the distribution of the resulting ciphertexts for this plaintext would be uniform, but because the keystream is biased, it isn't. This allows an adversary who sees enough of these ciphertexts to use a fairly simple Bayesian analysis to learn about portions of the plaintext. Protocols sitting in this repeated plaintext context include TLS and WPA/TKIP.

The talk went into detail on how these biases can be exploited to attack these two protocols, but for the purposes of this blog I'm going to stick to discussing how these biases were estimated. For more on the actual attacks, have a look at the RHUL homepage, or a couple of blog posts from Matt Green here and here, or our own blog post here.

Generating the biases

The final section (and motivation for the title) of Kenny's talk outlined how the keystream distribution information was estimated. Estimating the distribution is conceptually simple: generate lots and lots of random 128-bit RC4 keys, record the first 256 bytes of the keystream generated using each key, and tally the resulting values of each byte (or pair of bytes in the case of double-byte biases) to estimate the probability of each byte value occurring.
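A scaled-down sketch of that estimation loop in Python looks something like the following; the trial count is of course a toy figure compared to the 2^46 to 2^48 keystreams used in the actual computation.

```python
# Estimate per-position keystream byte frequencies from random RC4 keys.

import secrets

def rc4_keystream(key, length=256):
    # Standard RC4: key scheduling (KSA) followed by keystream generation (PRGA).
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = []
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return out

TRIALS = 20000                                    # toy sample size
counts = [[0] * 256 for _ in range(256)]          # counts[position][byte value]
for _ in range(TRIALS):
    ks = rc4_keystream(secrets.token_bytes(16))   # fresh random 128-bit key
    for pos, val in enumerate(ks):
        counts[pos][val] += 1

# The well-known bias of the second keystream byte towards 0x00 already shows up:
print(counts[1][0] / TRIALS, "vs uniform", 1 / 256)
```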

The key here is the "lots and lots" part of the above paragraph. The biases are small---the uniform probability would be 0.00390625, and from a rough glance at the estimated distributions a typical deviation might be as small as 0.00001. As a consequence, to reduce the variance in the estimation process and to estimate the distributions with sufficient precision to allow a plaintext recovery attack, a very large number of random keys is needed.

How many? In the case of estimating single-byte biases under the WPA/TKIP protocol, the number of keystreams required was 2^48, and for the double-byte biases 2^46 keystreams. Running this kind of computation using standard desktop-level resources is infeasible (unless you have a lot of time on your hands), and so the authors turned to the computational resources available through cloud computing services.

The authors used Amazon's EC2 compute cluster to deploy 128 nodes, each containing Intel Xeon E5-2680 v2 processors clocked at 2.8 GHz, giving a grand total of 4096 Ivy Bridge cores, totalling 8192 virtual cores in the presence of hyperthreading. Kenny gave some interesting anecdotes about issues with managing this amount of hardware, including having to ask Amazon to manually disable controls stopping users from spinning up more than 10 nodes at a time, and maxing out the capacity of the US West data centre.

As well as the computation required to estimate the distribution, storing the distribution is also a challenge---as always, data I/O is lurking in the background ready to pounce just as you think the difficult part is over. The single-byte distribution requires 32 GiB of storage (which I'd assume to be double-precision floating point values), and the double-byte requires a massive 8 TiB. If you're using a remote compute service, then transferring that kind of data onto local storage for further analysis requires significant effort.

Finally, the cost of the computational effort is interesting---clearly, renting compute cycles is cheaper than purchasing the necessary hardware yourself. The total cost was 43K USD; given the total compute time of 63 virtual core years, we can arrive at a cost of around 7 cents per virtual core hour.

The take-home message

For me, a few points stood out as particularly noteworthy. The maxim "attacks always get better" is definitely applicable here: from the initial observation that plaintext recovery might be possible, we've arrived at the point at which widely-used secure protocols can be considered to be broken when used in tandem with RC4. The results from the team at RHUL and elsewhere also demonstrate how you should keep pushing further with investigations into vulnerabilities, and that it is certainly worth investing the time and resources necessary to fully experiment with the nature of the vulnerability.

Secondly, we again return to the problems that arise at the point of intersection between applied cryptography and real-world deployments. Kenny talked about the "inertia" of implementations of protocols: from the high of 50%+ usage of RC4 in TLS, we're still seeing a usage of 33%. Having the agility to rapidly patch/update all the necessary hardware, firmware and software to fully eliminate the usage of a broken ciphersuite is clearly very difficult. Moti Yung made a good point about the relative seriousness of these cryptographic flaws---whilst a plaintext recovery attack requiring 2^20-30 TLS sessions is clearly bad and necessitates a fix, the kind of software flaw vulnerabilities we've recently observed in almost all of the most popular TLS/networking stacks are far more impactful.

Monday, December 8, 2014

Bivariate Polynomials Modulo Composites and their Applications

So this week is Asiacrypt 2014 in Kaohsiung, Taiwan, and as such this will be one of a series of blog posts appearing over the course of the next few days. Unusually for me I have not chosen a paper on theoretical leakage resilience or anything close to what I usually work on. I have chosen to blog about a talk from the second session "New Proposals" entitled "Bivariate Polynomials Modulo Composites and their Applications" by Dan Boneh and Henry Corrigan-Gibbs, with Henry giving the talk.

So before I start talking about the presentation I just want to recap on RSA and its related problems. Given (equal size) primes $p,q$ we define $N=p\cdot q$; the underlying problem is then, given $N$, to recover $p,q$, and we assume that this is hard. Certainly we believe that factoring integers is hard, and years of using RSA at a large scale suggest that this does (currently) appear to be a hard problem.

Using this we can define a class of problems, each defined by a function $f$, where given $f(x)\mod{N}$ the goal is to find $x$. If $f(x)=x^2$ we get a problem similar to quadratic residuosity, while if $f(x)=x^e$ we get RSA, and if $f$ is random such that $f(x)=0\mod{N}$ then the problem is equivalent to factoring $N$.

The question that arose in this presentation is: what if the function $f$ is replaced with a bivariate polynomial instead, can we get security and does this have any use? The three notions of security that were looked into were one-wayness, second preimage resistance and collision resistance.

The trick used by the authors is to look at $f(x,y)=c$ over $\mathbb{Q}$ in order to understand $f(x,y)\mod{N}$, since if the problem is easy to solve over $\mathbb{Q}$ then it is easy to solve modulo $N$ (loosely speaking, you just convert the answers as you would expect). Currently this is the only known method for solving these problems, but it does raise the question of whether it is the only method.

One-wayness: it can be shown that (over $\mathbb{Q}$) $f(x,y)$ cannot be one-way if the curve $f(x,y)=c$ has genus 0; it is currently unknown how hard it is (in general) to invert for higher genus.

Second preimage resistance: for $f$ to be second preimage resistant, $f(x,y)=c$ over $\mathbb{Q}$ must have no non-trivial rational mapping to itself. If there is such a mapping then it can be used to easily break the second preimage resistance property.

Collision resistance: for $f\mod{N}$ to be collision resistant, we look for $f$ that is injective over the rationals. If $f(x,y)$ is not injective then $f(x,y)\mod{N}$ is not collision resistant, but it is an open question whether $f(x,y)$ being injective automatically implies that $f(x,y)\mod{N}$ is collision resistant.

Before we even consider collision resistant functions, we have to ask if there even exist low degree injective polynomials over the rationals. Well, it turns out that this is an open question in number theory! On the bright side, $f_z(x,y)=x^7+3\cdot y^7$ has been conjectured to be injective over the rationals for 15 years now (can this be proved in the generic ring model?). From this point forward the authors use the assumption that $f_z$ is in fact collision resistant and ask what constructions they can get from this.

A commitment scheme
Commit(m): choose a random $r$ modulo $N$ and return $c=f_z(m,r)\mod{N}$.
Open(c,m,r): check that $c=f_z(m,r)\mod{N}$.
Hiding follows from the fact that $m$ is blinded by a random $r$, while binding follows from the fact that breaking binding would break the assumption that $f_z$ is collision resistant.
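A direct (and heavily simplified) transcription of this scheme into Python might look as follows, with a toy modulus standing in for a real RSA modulus:

```python
# Commitment scheme built on f_z(x, y) = x^7 + 3*y^7 mod N (toy parameters).

import secrets

N = 3233                          # toy N = 61 * 53; a real N is an RSA modulus

def f_z(x, y):
    return (pow(x, 7, N) + 3 * pow(y, 7, N)) % N

def commit(m):
    r = secrets.randbelow(N)      # random blinding value modulo N
    return f_z(m, r), r           # commitment c and opening information r

def open_commit(c, m, r):
    return c == f_z(m, r)

c, r = commit(42)
print(open_commit(c, 42, r))      # True
```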

They also describe how to make a chameleon hash, and they have a really novel application in which, using succinct zero knowledge, you can prove that 3 Pedersen commitments open to values which form the commitment (and message) of one of their own commitment scheme values. While a zero-knowledge "proof that a commitment is of a commitment triple" sounds like a strange use, it is actually used in anonymous bitcoin.

To conclude, they show that using bivariate polynomials with the RSA problem leads to some interesting schemes, and that it is certainly worthy of further study.

Sunday, December 7, 2014

52 Things: Number 9: What are Shannon's definitions of entropy and information?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog post tries to introduce some fundamental concepts in Information Theory: What are Shannon's definitions of entropy and information?


Information Theory was founded by Claude E. Shannon in 1948. It was originally developed to study signal processing, but its applications have been broadened to various disciplines over the decades. This blog post is intended to briefly introduce two fundamental concepts, entropy and information. If you are interested in more detail, I personally suggest [1].


Entropy

Entropy is a measure of the uncertainty[3] of one or more variables.

Assume we are investigating the first web page people visit when they open a web browser. We use samples from two groups of people: 4 cryptographers from Bristol Cryptogroup and 4 passengers grabbed at Bristol coach station. Let's make a more ambitious assumption that the cryptographers always visit http://bristolcrypto.blogspot.co.uk/ first when they open a web browser, while the passengers each visit a different portal.

Now let's evaluate the uncertainty of their answers: apparently, the answers from the cryptographers are quite certain (low uncertainty), whilst it can hardly be guessed what answer a passenger will give (high uncertainty). In other words, we say the answers in the cryptographers' group have a low entropy, while those in the passengers' group have a high entropy.

So one of the most remarkable inventions of Shannon's is his definition of entropy (Shannon entropy):

\begin{equation}

H = -{\sum}_{i}{p_i \log_b{p_i}}

\end{equation}

where $p_i$ is the probability that a message (an answer in the previous example) appears. In computer science, we usually use $b = 2$ (bits).

If we compute the entropy in our example, we will have:

\begin{eqnarray}
H_{cryptographer} = -1 \cdot \log_2{1} = 0\\
H_{passenger} = -{\sum}_{i=1}^{4}{{1 \over 4} \log_2{1 \over 4}} = 2
\end{eqnarray}

So the passengers' answers do have a higher entropy than the cryptographers'!
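If you want to check these numbers yourself, here is a few lines of Python implementing the formula above (base 2, as in the text):

```python
# Shannon entropy of a discrete distribution, in bits.

import math

def entropy(probs):
    """H = -sum_i p_i log2 p_i, skipping zero-probability terms."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Cryptographers: a single answer with probability 1.
print(entropy([1.0]))                    # 0.0

# Passengers: four equally likely portals.
print(entropy([0.25, 0.25, 0.25, 0.25])) # 2.0
```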

Information

Formally, the definition of Shannon's information is given in [2] as:

    " Information is a measure of one's freedom of choice when one selects a message."

To explain this, let's make a minor modification to our previous example. Let's grab another 4 passengers from Bristol train station and assume their answers are also random portals, just like those of the passengers at the coach station.

Here is the question: Given an answer $y$, can you tell which group the answer is from?

We can instantly tell that the answer is from our cryptographer group if $y$ is http://bristolcrypto.blogspot.co.uk/. However, we will be struggling if $y$ is a portal; therefore, we could say the message of http://bristolcrypto.blogspot.co.uk/ contains more information (less freedom) than a portal (more freedom).

So how does it relate to entropy?

Extending the definition of entropy, we define the Conditional Entropy as:

\begin{equation} H(Y|X) = \sum_{x \in X}{p(x)H(Y|X = x)} \end{equation}

which averages the entropy of $Y$ given $X = x$ over the possible values $x$ of $X$. To make this more explicit: since entropy is the uncertainty of a variable, the conditional entropy defined above is in fact the remaining uncertainty of $Y$ when given the "clue" (condition) $X$.

Observation: consider two variables $X$ and $Y$. If $X$ contains only minimal information about $Y$, then being given an exact value of $X$ should not help us much in deducing the value of $Y$; that is, it does not obviously reduce the uncertainty of $Y$. On the other hand, if $X$ contains essential information about $Y$, then the entropy of $Y$ is expected to be low when $X$ is given. Therefore, the conditional entropy can be viewed as a rational measure of the information $X$ has about $Y$!

Another important measurement is called Mutual Information. It is a measure of the mutual dependency between two variables. One way to define it is as the loss of entropy (uncertainty) when a condition is given:

\begin{equation}  I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) \end{equation}
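As a small worked continuation of the earlier example (sticking with the original two groups of four and picking a person uniformly at random, which is my own illustrative assumption), here $X$ is the group a person belongs to and $Y$ is their answer:

```python
# Conditional entropy and mutual information for the web-page example.

import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Y overall: the blog with probability 1/2, or one of four portals with 1/8 each.
H_Y = entropy([0.5, 0.125, 0.125, 0.125, 0.125])                  # 2.0 bits

# H(Y|X) = sum_x p(x) H(Y|X=x): 0 bits for cryptographers, 2 bits for passengers.
H_Y_given_X = 0.5 * entropy([1.0]) + 0.5 * entropy([0.25] * 4)    # 1.0 bit

I_XY = H_Y - H_Y_given_X
print(H_Y, H_Y_given_X, I_XY)    # 2.0 1.0 1.0
```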



Cryptographic Example
The concepts of information theory are widely used in cryptography. A classic example is to view a cryptographic process as a channel with the plaintext and ciphertext being the input and output respectively. Research in side-channel analysis also benefits from information theory.



References

[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory
    2nd Edition. Wiley-Interscience, 2 edition, July 2006.


[2] S. Vajda, Claude E. Shannon, and Warren Weaver. The mathematical
    theory of communication. The Mathematical Gazette, 34(310):312+,
    December 1950.

[3] http://en.wikipedia.org/wiki/Entropy_%28information_theory%29

Monday, December 1, 2014

52 Things: Number 8: How does interaction help in computation, and what is the class IP?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog post concentrates on the power of interaction in computation and the Complexity Class IP.

To answer the questions, we first give a brief introduction to interactive proof systems. As we know, zero-knowledge proofs currently play an important role in the study of cryptographic protocols. This concept was introduced by Goldwasser, Micali and Rackoff in [1]. Such proofs have the fascinating property of revealing nothing other than the validity of the assertions being proven. In order to achieve this, Goldwasser, Micali and Rackoff developed interactive proof systems by adding two ingredients to classical proof systems. The first ingredient is randomisation: the verification of proofs can be probabilistic and the verifier is allowed to err with small probability. The second one is interaction: the static proof is replaced by a dynamic prover who interacts with the verifier to convince it that the assertion is true. Combining classical proof systems with these two ingredients has a huge effect: the set of languages having interactive proof systems forms a large complexity class, IP.

What is a proof?
Loosely speaking, a proof is a way for a party to convince another party that a certain assertion is true. The two parties involved in a proof system are called a prover and a verifier.

Classical proofs
A classical mathematical proof is a fixed sequence of statements, which can all be written down by the prover; the verifier can then easily check the validity of the assertion. There is no interaction between the prover and the verifier.

Any proof system should have the following properties: 
  1. Efficiency: the verification procedure can be carried out efficiently. 
  2. Soundness: for any false assertion, invalid proofs are hard to find. 
  3. Completeness: for any true assertion, there exists a proof. 

Recall that the complexity class NP can be viewed as the class of languages whose members all have certificates that can be easily checked (while non-members do not have such certificates). Hence NP is exactly the class of languages with classical proofs. 

Interactive Proofs
In an interactive proof system, the prover and verifier are allowed to interact with each other by exchanging messages. Before introducing the concept of interactive proofs, we first give an example (which can be found in [2]) to show how interaction helps in computation. 

Example: Graph Isomorphism and Graph Non-isomorphism.
Two graphs G and H are called isomorphic if the nodes of G can be reordered so that it is identical to H. We define the language:

ISO = {<G,H>|G and H are isomorphic graphs}

It is clear that ISO is in NP. Even though the number of nodes can be very large, membership of ISO can be easily verified given the reordering as a certificate.

Then we consider the complement problem of Graph Isomorphism, namely Graph Non-isomorphism. We define the language:

NOISO = {<G,H>|G and H are not isomorphic graphs}

And the question is, using a classic proof, how can a prover convince a verifier that two graphs are not isomorphic? We don't know how to provide short proofs that two graphs are not isomorphic, and the verifier can't check every possibility in polynomial time. Thus, we don't know how to prove that NOISO is in NP. Nevertheless, using the following interactive protocol, the prover can convince the verifier that the given two graphs are not isomorphic. 

Both the prover and the verifier have a pair of graphs $(G_1, G_2)$ as common input. The verifier chooses a random bit $b \in \{0,1\}$ and a random permutation $\pi$. It then applies $\pi$ to $G_b$ to obtain a graph $H$, which it sends to the prover. Next, upon receiving $H$, the prover sends $b' \in \{0,1\}$ to the verifier. Finally, the verifier accepts if and only if $b'=b$. 

The idea behind the protocol is that if the given graphs $(G_1,G_2)$ are not isomorphic, then the prover should be able to identify whether $H$ comes from $G_1$ or $G_2$. However, if the input graphs are isomorphic, then even with unlimited computational power the best the prover can do is make $b'$ a random guess. In this case, the verifier accepts with probability at most $\frac{1}{2}$.
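To make the protocol concrete, here is a toy Python simulation in which the prover's unbounded power is replaced by brute force over all node permutations; the two small graphs are illustrative choices of mine.

```python
# Toy simulation of the Graph Non-isomorphism protocol on 4-node graphs.

import itertools, random

def apply_perm(edges, perm):
    """Relabel an edge set (frozensets of node pairs) according to a node permutation."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)

def isomorphic(g, h, n):
    """Brute-force isomorphism test -- fine for the prover on toy graphs."""
    return any(apply_perm(g, p) == h for p in itertools.permutations(range(n)))

n = 4
G1 = frozenset({frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))})  # a path
G2 = frozenset({frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 0))})  # a triangle plus an isolated node

def run_round():
    # Verifier: pick a random bit and permutation, send H = pi(G_b) to the prover.
    b = random.randrange(2)
    pi = list(range(n))
    random.shuffle(pi)
    H = apply_perm(G1 if b == 0 else G2, tuple(pi))
    # Prover: decide which input graph H came from (possible because G1 !~ G2).
    b_prime = 0 if isomorphic(G1, H, n) else 1
    return b_prime == b

print(all(run_round() for _ in range(20)))   # True: the prover always convinces the verifier
```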

From the above example, we conclude that, as far as we know, membership of NOISO cannot be proved to the verifier via a classic proof, whereas it can be proved via an interactive proof (protocol). We can see there is some power in interaction. 

Now we give the formal definition of interactive proofs and the complexity class IP

Interactive proof system: A pair of interactive machines $(P,V)$ is called an interactive proof system for a language L if machine V is polynomial-time and the following conditions hold:

  • Completeness: For every $x \in L$,
$Pr[<P,V>(x)=1] \geq \frac{2}{3}$


  • Soundness: For every $x \notin L$,
$Pr[<P,V>(x)=1] \leq \frac{1}{3}$

The class IP: The class IP consists of all languages having interactive proof systems. 

By the definition, it is obvious that any language in BPP is in IP. And if we restrict the interaction to a single message from the prover to the verifier, then we have that NP is in IP. Actually, IP is a surprisingly large class: in 1992, Shamir showed that IP = PSPACE [3]. 

In addition, notice that in the protocol above the verifier's coin tosses are kept private. Allowing the prover access to the verifier's random string leads to the model of interactive proofs with public coins, which is related to the similar complexity class AM [4]. 

[1] http://dl.acm.org/citation.cfm?id=63434
[2] http://www.amazon.co.uk/Introduction-Theory-Computation-Michael-Sipser/dp/0619217642
[3] http://dl.acm.org/citation.cfm?doid=146585.146609
[4] http://en.wikipedia.org/wiki/Arthur%E2%80%93Merlin_protocol