Review of  the
Symposium on Security & Privacy,
Claremont Hotel Resort and Spa, Berkeley, CA
May 17-20, 2009

Review by Martin Szydlowski
May 29, 2009

Introduction, by Sven Dietrich, IEEE Cipher Associate Editor

About 260 participants attended the 30th Oakland conference this year to listen to the presentations, to mingle in the hallways, at the reception or poster session, and perhaps for some to skip a talk or two to go to the spa (we know who you are ;)

We are looking forward to the 30th anniversary next year ("anniversary" implies zero-based counting) at the 31st Oakland conference with lots of special events. In the meantime, please enjoy this year's write-up by Martin Szydlowski (UCSB), complemented by additional Q&A capture from Matthias Vallentin (ICIR) and some fill-ins here and there by yours truly.

Conference notes by Martin Szydlowski

Day 1

First Session: Attacks and Defenses
Chair: Tadayoshi Kohno

Wirelessly Pickpocketing a Mifare Classic Card (Best Practical Paper Award) Flavio D. Garcia, Peter van Rossum, Roel Verdult, Ronny Wichers Schreur (Radboud University Nijmegen)

Peter von Rossum presented. He showed how the Mifare classic card, the first card up from a non-crypto RFID card in the Mifare family, was broken. He also showed the timeline of the attack and mentioned the legal tidbits with the manufacturer who tried to halt the publication of this attack. He pointed out that these security problems don't exist at the higher level of the Mifare card (AES rather than an LFSR), but those are more expensive of course. The Mifare classic is in use as transportation fare card, e.g. the Charlie card (Boston), Oyster card (UK), and in the Netherlands, and in some ski resorts in France. At lunch Peter showed a whole collection of such cards, some plastic, some paper, including peeled ones where you can see the RFID chip exposed with its antenna.

Plaintext Recovery Attacks Against SSH
Martin R. Albrecht, Kenneth G. Paterson, Gaven J. Watson (Royal Holloway, University of London)

This talk showed an attack on the Binary Packet Protocol in SSH, causing leaks of plaintext bits. Recent versions of OpenSSH are fixed, and the SSH developers were given some time to fix the flaw. Premise: they can recover 14 bits of plaintext from an arbitrary block of ciphertext with probability 2(-14) and 32 bits of plaintext from an arbitrary block of ciphertext with probability 2(-18).

Exploiting Unix File-System Races via Algorithmic Complexity Attacks
by Xiang Cai, Yuwei Gui, Rob Johnson (Stony Brook University)

This talk showed the breaking of two file-system race condition defense mechanisms. The speaker argued that these attacks were applicable to multiple Unix operating systems and affect most Unix file system races.

One person asked whether a true "atomic" statement in the kernel would resolve the problem, but the speaker was not sure that was doable.

Second Session: Information Security
Chair: Patrick Traynor

The first talk of this session was titled "Practical Mitigations for Timing-Based Side-Channel Attacks on Modern x86 Processors".

As motivation, the speaker explained the idea of side channel attacks in general and then timing-based attacks in particular. As example, he show a snippet of source code from a modular exponentiation algorithm, where, depending on the input, a computationally expensive branch might be taken that will introduce a measurable delay on the execution time of this code segment. He went on explaining, that side-channels can be introduced in every step of the creation of a cryptographic algorithm -- when the algorithm is devised by a cryptographer, implemented by a programmer, compiled and executed on a particular processor. The work presented here focuses on eliminating side-channels during the compilation step by introducing a "side-channel aware" compiler. After explaining the basic workings of a compiler, He explained their approach to eliminating control-flow that leads to run-time differences. A simple and complete solution for this problem is possible on processor architectures that support conditional execution, however, on modern Intel processors only assignment operations (e.g. MOV) support conditional execution while arithmetic instructions do not. To work around this issues, he introduced several techniques to transform code that eliminate timing differences and presented experimental results that showed the effectiveness of the presented solution. He singled out two Intel micro-code features that are problematic to handle: First, the latency of the DIV instruction is dependend on the operands and side-effects (division by zero) need to be taken into account. There are workable solutions, however, they incur a considerable overhead. The second problematic feature is pessimistic load bypassing, which he will try to solve in future work.

The first question asked was if it would be feasible to just compute the worst-case time and introduce a wait in the algorithm. The answer was that the worst-case time is hard to compute and the mechanisms for the wait would be very hard to implement. Asked if a randomized delay would help, he said he doesn't think so. The last question asked was about dead code elimination in the processor and how it affects the presented solution. He answered that with current processors, this does not affect the conditional MOV instruction.

The second talk of this session was titled "Non-Interference for a Practical DIFC-Based Operating System", presented by Maxwell Krohn.

Krohn started by explaining the possible approaches to "Decentralized Information Flow Controls" (DIFC), one being language-based and compile-time the other OS-based and run-time. The focus of Krohn's talk was a proof of security for a DIFC-based OS. He explained how DIFC is enforced through security labels attached to processes and the two approaches used to expose the functionality to applications: implicit "floating" labels and labels that have to be set explicitly by the requesting application. Implicit labels bear an intrinsic risk of information leaks, therefore, Krohn's work focused on proving the security of explicit labels by showing that the property of non-interference (i.e., that one group of processes can in no way interfere with another group of processes) is valid at all times. They modeled the system using the Communicating Sequential Processes (CSP) process algebra and Krohn briefly outlined the proof, referring to the paper for details. He concluded by saying that provable security in a DIFC-based OS is possible.

Q: If processes can change their own levels, how does that affect the model?
A: If a high process adds more tags, still a high proces.


Third Session: Malicious Code
Chair: Ulfar Erlingsson


The first talk of this session was "Native Client: A Sandbox for Portable, Untrusted x86 Native Code".

This paper has won the best paper award this year. The speaker started by outlining the problems with current web applications, namely the performance limits when using built-in browser methods (JS) and the security implications of browser plug-ins. Then he presented Native Client (NaCl), a safe execution environment for untrusted code that is portable to all browsers/operating systems, offers almost native performance and is small and simple. He explicitly stated that NaCl will not execute existing web-app code, instead, a recompilation for NaCl will be necessary. To demonstrate the performance of NaCl, he demoed Quake, a 1996 3D action game running inside a browser window. To facilitate this performance, NaCl has support for many modern CPU features like SSE. The system is open, and the authors provide a gcc toolchain.

The idea behind NaCl is following: One browser plug-in runs all the apps in a process jail that allows only limited interaction with the OS. The safety is comparable to JavaScript, while the performance benefits from native machine code. The threats that need to be mitigated when allowing native code are the use of privileged syscalls and unusual control flow and read/write instructions. Some other threats, like infinite loops, memory leaks and covered channels are not addressed by NaCl in the current version. To ensure control flow integrity, the instruction set needs to be restricted and some instructions need to be restructured. This allows for a reliable disassembly that is required to verify the safeness of the code. Code that fails the disassembly step is automatically deemed unsafe. To constrain data and instruction references x86 segmentation is used which allows for integrity checks without any overhead. The NaCl runtime environment features a subset of POSIX APIs, an inter-module-communication interface, multimedia support and RPCs. Google conducted a public safety test of NaCl, exposing several weaknesses that were fixed in subsequent releases. The biggest attack surface was definitely the browser integration.

There were several questions after the talk: The first was about the validation step in the sandbox and if it's possible to subvert it with a hacked compiler. He answered that the parsing is pretty robust. The second question was about the possibility of self-modifying code. According to the speaker, this is possible within the constraints of the sandbox. Another question was about the complexity of applications under NaCl. He answered that for now, the apps are not allowed network or filesystem access.

Q: What about Adobe Flash player?
A: It has its own network stack and would require significant surgery to incorporate.

The second talk of the session was "Automatic Reverse Engineering of Malware Emulators" (Best Student Paper Award).

In the beginning, the speaker explained the reasons behind and the evolution of malware obfuscation. The overall trend goes towards finer-grained obfuscation, with instruction-level obfuscation being current state of the art. With this method, the malware logic is transformed into an arbitrary byte-code language and executed through an emulator packaged into the same executable. Contrary to previous methods, where a decryption routine would unpack the malicious code to memory and then execute it, malware emulation decodes the instructions one by one as they are fetched and executed. Hence, there is no point in time at which the whole malicious binary is in memory in native machine code form. Of course, the emulator and byte-code can be modified on propagation producing an infinite number of variants of the same code. He went on explaining that static analysis does not work at all on this type of obfuscation, while recent dynamic analysis techniques like multi-path exploration fail to scale well under these conditions. Then he presented their approach to the problem: Automatically identify and reverse-engineer the emulator routines to retrieve a decrypted (transformed back to x86 machine code) version of the malware payload. No prior knowledge of the byte-code or the emulator workings is required, instead the authors rely on the fact that an emulator must contain a CPU-like fetch-decode-execute main loop and an associated program counter (PC). He explained briefly the techniques they use to find the location of the PC and the fetch-decode-execute code and highlighted some implementation details. Finally, he showed the results of several synthetic test, where the authors obfuscated a binary and then tried to analyze it, and test on real-world emulated malware samples. In both tests, the system was able to identify the emulator routine and produce a control-flow graph for it.

The first question was about any further steps that malware authors could take to obfuscate their code. He answered that, as far as granularity goes, instruction-level obfuscation is the end. Another question concerned the applicability of this technique to other types of obfuscation and the answer was that other types of obfuscation require different approaches. The next asker wanted to know if any "innocent" programs have been identified as malware emulators to which He answered that code parsing network traffic exhibits similar behavior and, hence, might cause false positives. The last question was if the system is capable of extracting other semantics beside control-flow from the emulator and he said that will be included in future work.

Q: In this arms race, what's the next step of the attackers?
A: Emulation is the finest granularity of obfuscation, but there is potential for variation.
Q: Do you look at anything beyond Control Flow Graph (CFG) extraction?
A: No.

The last talk of this session was "Prospex: Protocol Specification Extraction", presented by Gilbert Wondracek.

As motivation, Gilbert explained the usefulness of stateful network protocol specifications for applications like fuzz testing, deep packet inspection and comparison of different implementations of the same protocol. Manual reverse-engineering of protocol specifications is a tedious and time-consuming process (e.g., the Microsoft SMB protocol was manually reverse-engineered over many years for the samba project), while the current automatic solutions are only able to deduce the message format and not the underlying state machine. In order to obtain the state machine, Prospex performs four steps: First, network sessions are being analyzed by recording execution traces of applications using the protocol. Dynamic data tainting is used to track the flow of received data through the application. In the second step, message format inference is performed using a Multiple Sequence Alignment algorithm. Then the message specifications are clustered according to 3 features (input, execution and output similarity). The result is a generalized format for each message type. In the final step, a state machine is inferred by first creating a labeled augmented prefix tree acceptor (APTA) and then merging states to create a minimal, yet expressive, DFA that will accept all the observed traces. To evaluate the system, the authors tested in on four real-world network protocols, three text-based (SMTP, SIP, Agobot) and one binary (SMB) and were able to create state machines for all of them. To asses the quality of the generated state machines they measured the parsing success of control datasets, compared it with a reference (manually generated) state machine and pitted it against other similar algorithms. The results in all three test were positive, and Prospex outperformed other existing algorithms in both recall and precision. Gilbert then showed an application example: By using the extracted state machines to create input for a stateful fuzzer, they were able to identify two (previously known) vulnerabilities in Samba and Asterisk. At the end Gilbert mentioned some limitations of the current system: While their implementation only inferred the "server-side" state machine, it is thinkable to do this for bidirectional communication. On the other hand, encrypted network traffic poses a significant difficulty, and the quality of the state machine is directly influenced by the quality of the training data set.

The first question was about how easy it would be to confuse the system and cause a state explosion. The second asker wanted to know if invalid sessions were included in the training set, to which the answer was no. The last question was about the importance of the different similarity scores for detection, to which Gilbert answered that the scores were weighted according to a simple, fixed scheme and did not require tuning for different protocols.

Q: If an attacker creates a program with zillions of states, does this approach collapse?
A: Correct.
Q: Would it be possible to identify the message types, e.g. within SIP?
A: No.

Fourth Session: Information Leaks
Chair: Radu Sion


The first talk of this session was "Quantifying Information Leaks in Outbound Web Traffic".

The speaker started by explaining the threats of data leaks which can occur due to outsider (hacker) or insider activity. In any case, the goal of the attacker is to steal information and avoid detection while transferring it over the network, and to that end, he might use different techniques to obfuscate the data and hide it in legitimate network traffic. To mitigate this threat, the network traffic must be examined and the leaks identified and quantified. There are both on-line (Intrusion Detection) and off-line (Forensics) application cases. In order to make identification of possible leaks easier, he suggested to eliminate all network traffic which cannot be used to leak information and in his approach he focuses on HTTP requests which is the most likely vector in secured enterprise environments. In order to achieve that, the authors propose to compare the actual request with the "expected" request and strip away everything (using edit distance comparison) that matches the expectation, then the rest will be an upper bound on the amount of possibly leaked information. The expected data consists of fields that can be extracted from the protocol specification (e.g. fixed headers), from previous requests (User-Agent, Referer) and from processed HTML pages that originated the request (form fields, links). While the protocol specification is available, and previous requests can be extracted directly from network traffic, HTML pages need to be parsed and contained JavaScript code needs to be executed to obtain the DOM tree of the document as seen by the browser. his proposed system does all that - it uses SpiderMonkey, the Mozilla JS interpreter, to obtain the final DOM tree of the document and extracts all links from there. The authors evaluated their system on constructed and real-world browsing traces. The synthetic test was performed on five different web sites, and only for one, a social networking site, not all links could be retrieved. This was due to the fact that that site created new links on user input events, which were not simulated by the system. The test on real data showed that the presented technique can narrow down the amount of data possibly leaked far more precisely than other methods. However, the performance of the system right now is only suitable for off-line (forensic) analysis, it would need substantial improvements to work with an IDS. He explained that the main bottlenecks now are JavaScript execution and edit distance computation.

The first question concerned the feasibility of using the system on-line for IDS purposes and the answer was, that, as stated in the talk, the performance is not sufficient for this application yet. The second asker wanted to know if the authors have successfully discovered any leaks with their system. He answered that this is not the goal of this project, which is only about isolating the places where leaks might occur.

The second talk of this session was "Automatic Discovery and Quantification of Information Leaks".

In this talk, the speaker presented an automated approach to determine the amount of information from a program input that can be determined by observing the program output. For example, a failed password check reveals that the input was not, indeed, the password. The presented approach is a combination of program verification and information theory and the authors have a working prototype implementation working on programs written in C. The system, called DisQuant, consists of two parts: The first part, Disco, is responsible for partitioning the "secrets" space - the finer the partitioning, the more secrets are revealed. The second part, Quant, quantifies information leaks using combinatorial methods. To implement the system, the authors used off-the-shelf model-checking software and other tools. He used an electronic auction as example how the system identifies information leaks. Finally, he explained that given enough computation time the system can be arbitrarily precise, however for reasonable performance some precision must be sacrificed and that, by under-approximating and over-approximating DisQuant can give guarantees on the bounds of possible leaks.

Asked how bad the performance really was, he replied that for now the system only works on small contrived examples and not on real-world software. The second asker wanted to know if the system might be useful for forensics or reverse engineering and the answer was no, since more information is required.

Another person wanted to know if DisQuant could be used to determine information leaks in crypto systems, to which he answered that the system would definitely over-approximate the leaks. Another question was if the system will give recommendations for improvements to which the answer was plain no.

Q: What about larger programs?
A: At the moment, compute precise program size. Scaling-Up might help
Q: Rob Johnson: what about encryption algorithms?
A: Overappromixation what an attacker can achieve
Q: Does this tool recommend how to refactor code?
A: No, it's a pure analysis tool and does not give any recommendations.

The last talk of the day was "CLAMP: Practical Prevention of Large-Scale Data Leaks", presented by Bryan Parno.

The speaker started be reminding everyone that the amount of data leaks can only increase due to the ever-growing amount of information available online. He identified an architectural flaw in current web services that largely contributes to possible leaks: Web applications consist of a rather secure server platform running usually poorly audited and insecure scripts and application code that have access to all the information stored in the database back-end. Parno argued that the complexity of server hardening or using secure frameworks (e.g. DIFC) hinders adoption. He proposed an alternative solution to this problem: Give each user her own web server with access limited only to that users data. In this way, when an attacker compromises a server, he will fail to obtain any data except his own. He proposed to implement such a system the following way. The user authentication code is taken out of the Web App and into an Authenticator module which is compact and easy to secure. After authentication, the Dispatcher will spawn a web server instance for that user, and the Restrictor will create database views for that server instance that only contain data relevant to that user. For this to work, a set of configuration files needs to be created for each application by hand. Although this seems hard and time-consuming, in practice it is easy to identify the pieces that need to be isolated, since developers will generally create human-readable code and database schemas. He demonstrated the last point showing real-world database schemas where the relevant tables and columns were easily identifiable. He suggested several methods how to implement such a system (e.g., JVM, SELinux) and that they used Xen for the prototype system. Xen has support for "flash" cloning of virtual machines which is still unstable but offers great performance and minimal overhead. For the Restrictor, a SQL proxy was used. He gave a few examples of real-world web apps they have converted to CLAMP and how long it took to achieve that, with most apps ready for use in several hours. Finally, he showed some performance statistics that showed that, while there is noticeable drop in throughput, this can be mostly attributed to the VM cloning process and advancements in that field should mitigate the effects.

The first question was what happens if the webserver is compromised before authentication. He replied that this is not possible since authentication takes place before the web server is spawned. Asked how it is possible to determine what data user want to see, he answered that all that users are supposed to see can be deducted from the application code. The next questions concerned the mechanisms used to restrict database access and the reply was database views. Another question concerned transactions where data from multiple users is involved, for example a funds transfer. He replied that this action should be separated from the web server, with the user only able to issue the request and a separate application handling the actual transaction. Asked what role he sees for the web server, the speaker answered that the web server should be responsible for presenting the data while authentication and database access should be handled externally. The next question was about legitimate uses of aggregated user data, to which the answer was again that views should be used for that. Finally, one audience member commented that views are already the best practice for database access and the speaker agreed, stating that their system will create per-user views on-the-fly and lift the burden from the developer.

Q: If attacker compromises Web server before authentation, a user could masquerade as any user, right?
A: No, the User Authentication (UA) module is external and not part of the webserver.
Q: How to adapt to data that is only visible to a limited set of people?
A: It should be able to encode these restrictions in data access policies.
Q: Query Restrictor: what about SQL code provided by the user?
A: Use database views.
Q: How to handle a banking application where multiple data is touched at the same time?
A: Don't handle the sensitive tasks in the webserver, instead use an external application.
Q: Rob Johnson: what's the webserver's role given that the application is now spread over many different instances? Does it still have a role?
A: Webserver renders data and content.
Q: How to handle aggregate data?
A: Database views again.

Day 2
Fifth Session: Privacy
Chair: George Danezis

The first talk was "De-anonymizing Social Networks", presented by Arvind Narayanan.

The talk started with Arvind giving examples of different social networks and the information they may contain. The examples ranged from AT&T's telephone call graphs and doctor-patient-networks to high school romantic and sex networks. On-line social networks have grown in popularity in recent years and the amount of personal information contained it them is coveted by data-mining researchers and advertisers, among others. To protect user privacy such graphs are usually anonymized before publication.

Arvind then outlined the key contributions of this work: First, he and his co-authors devised a privacy framework to model the storage and distribution of private information and background data, then they developed a de-anonymizing attack on social network graphs and finally they experimentally validated the attack on real-world data.

The presented model states that to obtain privacy, node attributes, edge attributes and edge existence in a social graph need to be protected and further, that anonymity alone is necessary but insufficient to guarantee privacy.

The attack Arvind presented is based on the following assumptions: The graph is anonymized an published and the adversary has access to large-scale background knowledge. Arvind explained that the overlapping of membership in different social networks is a key to the success of the attack and that by means of graph aggregation, where the anonymized target graph containing sensitive information is compared to a non-anonymized but insensitive publicly available graph (auxillary graph). The process is not straightforward, since the auxiliary graph is generally not isomorphic to the target graph and also is noisy, imprecise and large.

The attack works in two stages: First several seed nodes need to be identified in the target graph. Some detailed background knowledge is required to find these nodes, acquired, for example, through phishing or similar attacks. Once a sufficient number of seeds has been identified and mapped to both graph, the propagation phase begins. Using only the degree and number of common neighbors of matched nodes, the target and auxiliary graphs are mapped through an iterative process. What Arvind's team found was that the edge overlap in real-world graphs is around 15%, therefore, they devised many heuristics to improve the matching. Arvind did not go into detail on the heuristics and instead moved over to the evaluation part. The algorithm was tested on graphs collected from flickr and Twitter and had an overlap of 15%. The team anonymized one of those graphs, leaving 160 seeds and they were able to recover (de-anonymize) 31% of the target graph.

The first asker wanted to know if the algorithm would be still effective if everybody had the same number of friends (i.e., all nodes had the same degree), to what Arvind answered that the number of common neighbors is the more important criterion. Another audience member wanted to know if this means anonymity is unachievable on the Internet and the answer was that with the amount of (high-dimensional) data already present on the web it is almost impossible. The final question concerned the difference of the presented work to previous contributions and Arvind explained that they are the first that use a completely passive approach leveraging globally available background information.

The second talk was "Privacy Weaknesses in Biometric Sketches", presented by Koen Simoens.

To motivate his work, Koen explained the move to biometric security that in recent times has been considered as a better alternative to passwords or other authentication tokens. The problem with biometric data, however, is that usually it cannot be reproduced with 100% accuracy, so methods like simple hashes that allow comparisons but are not reversible do not work. Instead a noise-tolerant algorithm to compare similarity is required. The state-of-the-art solution are "fuzzy sketches" and Koen explained the process of creating and authenticating using them. The threat model he presented is the assumption of an inside attacker who, having access to the sketches could potentially be able to combine related sketches and identify the owner. The main contribution of this work, according to Koen, is the definition of security notions for distinguishability and reversibility attacks on biometric sketches and the evaluation of two sketch algorithms under these aspects. Then he explained the fuzzy commitment scheme and the inherent information leakage therein, which can be leveraged for de-anonymization or data correlation (attribute data in different databases to same user) attacks. Koen then briefly explained the details of the attack on the distinguishability of fuzzy sketch schemes and showed that related sketches are always identified correctly while unrelated sketches have a noticeable false positive rate. Finally, he outlined an attack that will reverse the original input from related sketches, which requires solving a linear system.

The first question was about the dataset used for evaluation and Koen replied that they used data from around 100 members of their department, each giving multiple samples. Another question concerned the technical details of the sketch format, in particular the offset that was used in the attack. The answer was that the offset is random for every sketch but stored within, otherwise a successful comparison would not be possible.

Q: Where did you get the data and how large is it?
A: From an industrial partner, 100-200 individuals with multiple samples.
Q: There exists related work in passport schemes, does it apply there as well?
A: Don't know too much about these schemes, it might apply as well.

The final talk of this session was "The Mastermind Attack on Genomic Data", presented by Michael Goodrich.

He started by explaining the motivation behind genome comparisons (e.g., paternity tests), and the two types of comparisons performed: Aligned matching and sequence alignment. Mechanisms like Secure Multiparty Computations exist to perform genome matching in a completely private manner, however, repeated comparisons will leak information. Then Michael explained the mechanics of the Mastermind game and how repeated genome matching can be modeled in similar terms. Since a genome comparison will show which portions of the genome do match and which don't, repeated comparisons with different patterns can reveal the whole sequence that is being compared to. The problem of Mastermind is provably NP-complete, however, this gives a false sense of security since strategies exist that make is solvable in polynomial time. Michael then briefly explained the origins and characteristics of human DNA and then presented the algorithms he devised that allow recovery of the reference string through multiple comparisons, and that are not strictly limited to DNA comparisons. When aligned matching is performed, he uses a divide-and-conquer strategy that iteratively splits the string in smaller pieces, until each part can be matched. While delegating the proof to the paper, he showed that the worst-case bound for the number of queries is indeed polynomial. Then he demonstrated the algorithm for multiple sequence alignment that uses frequency analysis and also proved that the worst-case number of tries is within polynomial bounds. He the further suggested that these bounds could be improved leveraging knowledge of the human genome structure, which for most parts is not a random string. In the end, Michael presented an evaluation performed on real-world DNA (1000 random mitichondrial DNS samples). Using aligned matching, it was possible to guess the whole sequence with about 300 comparisons, while sequence alignment required under 1000 comparisons and explained that the only defense against this attack would be to disallow indexing of the genome.

The next question was if it is possible to retrieve a whole genome database with this method and Michael said that since it is an adaptive attack, it can only reconstruct one sequence at a time. Another audience member suggested that using more info from Biology to combine individual base pairs into sequences could improve the efficiency, to which the answer was indeed, it could.

Q: Peter Neumann: digrams, trigrams would give better results?
A: Yes.
Q: With a small number of queries, is it possible to extract a large amount of data of many people?
A: Not going to be effective.
Q: Each DNA symbol does not appear independently, so why don't use sequences of symbols?
A: Yes, attacks would be even more effective that way.

Session Seven: Formal Foundations
Chair: Vitaly Shmatikov

The first talk was "A Logic of Secure Systems and its Application to Trusted Computing", presented by Deepak Garg.

Deepak started by explaining the main parts of modeling secure systems design. A secure system has certain security properties and the adversary has certain capabilities. To asses the security of a system, often an informal approach is used, where known attacks are attempted against the system, but this does not prove the security against unknown attacks. A logic-based analysis defines a system with properties and an adversary with capabilities and reasoning is used to prove the security under these assumptions. When a proof can be obtained, the system is secure, otherwise the process gives hints on improving the system. Deepak then introduced LS2, a language they devised to reason about the security of systems. Their adversary model includes full local and network access but no ability to break cryptographic protocols - this enables him to carry our many common attacks. He then demonstrated the use of LS2 to verify security properties of Static Root of Trust Measurements (SRTM), a protocol to remotely verify the integrity of a Trusted Computing Platform, and found two new and one previously known weakness. Deepak then dove into details how SRTM is modeled with LS2 and how a proof is obtained or weaknesses discovered.

The first question referred to the SSH attack described in a previous session and if it could be predicted with LS2. Deepak replied that this is out of the scope of their current work. The next asker wanted to know what their analysis of TPM resulted in to which the reply was that they have uncovered some weaknesses in the SRTM protocol.

Q: Would the SSH attack we saw earlier be detectable?
A: Out of scope of our work.
Q: What's the result of your model?
A: A precise characterisation of the properties of PCR.
Q: TPM 1.2 has dynamic root of trust?
A: Correctness proof available. We have accounted for that version and a version that breaks SRTM.

The second talk was "Formally Certifying the Security of Digital Signature Schemes", presented by Santiago Zanella-Beguelin.

(No notes for this talk)

The last talk of the session was "An Epistemic Approach to Coercion-Resistance for Electronic Voting Protocols", presented by Tomasz Truderung.

He opened by explaining that e-voting protocols are the most complex protocol since they exhibit contradictory requirements. He explained the mechanics used in current e-voting systems and the need for a voter-verifiable paper trail. However, this opens up the possibility for voter coercion that does not exists with traditional voting methods, since now the coercer has the means (the paper trail) to verify that the voter adhered to his wishes. This type of coercion is difficult to define formally since cryptographic definitions are secure but hard to use while symbolic definitions are easier to use but have weaker security. The goal of Tomasz' work is to create a general, intuitive and simple definition of coercion resistance. He then explained the roles in a coercion system - the voter, the coercer and the (honest) environment and defined coercion-resistance as a strategy that will achieve the voters goal while being indistinguishable from the strategy required to achieve the coercers goal. In order to make the proof work, some outcomes of the election (e.g. the coercers desired candidate gets zero votes) must be excluded. Tomasz then dove briefly in to a more formal definition of the model and then stated that the system also works for multiple coerced voters since that can be reduced to a one-voter problem. He continued with the evaluation of three e-voting systems and singled out CIVITAS as an example. With the presented model, Tomasz was able to prove that CIVITAS is coercion-resistant after the registration phase, however, coercion is possible before registration is completed.

Asked if their model can handle invalid votes and intentionally spoiled elections, Tomasz responded that this was not considered. Another audience member pointed out that the slides lacked some logic details, and Tomasz suggested to talk off-line. The final question was what makes the approach epistemic. The answer was that they did not take probabilities into account.

Seventh Session: Network Security
Chair: Jonathon Griffin

The first talk of this session was "Sphinx: A Compact and Provably Secure Mix Format", presented by Ian Goldberg.

Ian addressed the problem of sending messages without revealing the senders identity. Using one trusted relay is not enough, you require a path of relays. While onion routing (e.g., TOR) can do that it works in real-time, which might be required for certain applications but is vulnerable to traffic analysis. Mix-nodes, in contrast, use batch processing of messages that can defeat traffic analysis and are excellent for asynchronous communication like e-mails, blogging, etc. The focus of Ian's talk is on the message format used by Mix-nodes, since it can leak information if not implemented correctly. A desired property of a Mix protocol is the ability to reply to a message without knowing who sent it and to ensure anonymity a reply must be indistinguishable from a regular message. Ian then gave a historical overview of Mix protocols and their strengths and weaknesses, and then introduced Sphinx, a new compact mix protocol with indistinguishable replies and provable security which none of the previous protocols have achieved. Ian presented the message format used by Sphinx in an illustrative diagram and then outlined the proofs for security, indistinguishability and compactness of Sphinx.

The first question concerned the integrity checks of the payload, to which Ian replied that they are built in and changes to the encrypted payload would randomize the contained data. The next asker wanted to know if there is room for improvement, and the answer was that the only possible improvement is a further reduction of overhead.

Q: Rob Johnson: Is the integrity only protected by a large block cipher?
A: (Yes)
Q: Security analysis in the Random Oracle Model?
A: (Yes)
Q: Does your work represent the last iteration since we have all desired properties?
A: Reducing the size is still possible.
Q: How were the ECC coordinates represented?
A: Only the x-coordinate was used.

"DSybil: Optimal Sybil-Resistance for Recommendation Systems", presented by Haifeng Yu.

The problem Haifeng outlined is that on-line recommendation systems as used by Amazon, Youtube etc. can be easily subverted by influencing users or by using a Sybil attack - creating a large amount of fake identities that will alter the recommendation.

To defend against Sybil attacks, one could tie on-line identities in a stronger fashion to real identities but that would raise many privacy concerns. Resource challenges (e.g. CAPTCHAS) can be tackled by botnets and social network based defense has proven to be insufficient. Recommendation-based systems have a very low tolerance to Sybil attacks. To mitigate this problem, Haifeng proposed to use an ancient idea - trust history and presented DSybil, an recommendation system based on feedback and trust with provable guarantees on the worst-case number of Sybils required to subvert the system. DSybil has been tested on a 1-year trace file from In order to make the system workable, several subtle aspects need to be take into account, for example, how to assign initial trust, how to grow it, how to prevent participants from gaining trust for free and how to filter out non-helpful votes. To achieve that, the team leverages the typical voting behavior of honest users. So far, the system only supports good-bad choices but it is thinkable to extend it to a 5-star rating system or similar. The system is modeled as following: Only "good" votes a considered, since negative recommendations are no help and only one vote per object per identity is permitted. Then Haifeng explained the voting algorithm and how users can gain or loose trust. One important part is that each user gets a "personalized" recommendation based on guides - users with high trust that have similar taste, i.e. voting history. It is necessary for the algorithm that the guides cover a large portion of good objects while the number of guides does not need to be big. Haifeng presented an evaluation performed on a one-year Digg trace containing 0.5 million users and showed that three guides are sufficient to cover 60% of good objects and after removing the top three, the next five are good enough. Trying an attack with 10 billion Sybils vs. 1000 honest voters, new users only received bad recommendation 12% of the time and that number dropped to 5% after one week of using the system.

The first question was which assumptions were made about the attacker and Haifeng responded that the attacker can know everything, including the algorithm and still be unsuccessful. The next asker wanted to clarify if each user has different guides, which is indeed the case. The last question was if the guides reputation can be destroyed by always voting against them, to which the answer was that this will affect only the users trust to her guide and have no global effects.

Q: What assumptions are you making about attacker nodes? Are they allowed to look at the first 100 users?
A: The attacker is allowed to have knowledge about everything.
Q: For different users, do guides have to be different?
A: Yes.
Q: Is trust only relative to Alice?
A: Correct.
Q: In practice, does Alice need to provide feedback to the system?
A: Correct.

Eighth Session: Physical Security
Chair: Farinaz Koushanfar

The first talk of this session was "Fingerprinting Blank Paper Using Commodity Scanners", presented by William Clarkson.

William explained that every sheet of paper is unique, due to the material and the manufacturing process. Hence, it is possible to devise a sort of Biometrics for paper. This can be useful to, for example, verify the authenticity of documents, tickets or money. The desirable properties of any proposed scheme should be accuracy, resilience to material wear, non-modification, cost efficiency and security against forgery. William proposed the following scheme to obtain a document fingerprint: The texture of the paper is measured using an of-the-shelf scanner that scans the sheet at 1200dpi with 16bpp. By adjusting the contrast of the resulting image, the paper texture is revealed. Four scans (each rotated by 90 degrees from the previous) are required to compute the surface normals. Then 100 randomly selected patches of 8x8 pixels are used to compute a 3200 byte feature vector. The random seed, together with some error correction bits and a fuzzy hash of the feature vector are stored as the sheet fingerprint. To verify the documents, the same steps are repeated and the fuzzy hash is compared to the one from the fingerprint. William presented an evaluation which showed that each sheet can be recognized with a 3% error rate. After scribbling on the paper with a pen, the error rate rises to 9%. When the paper has been printed on, assuming an ink coverage of 15%, the error rate climbs to 18%. Soaking the sheet in water and ironing it or crumbling it yields a comparable error rate. As one implication that has both positive and negative side-effects, William singled out (traditional ballot) voting: On the positive side, it is possible to verify that the ballots that are counted are the same that have been given out, on the other hand, it destroys anonymity by allowing a connection between ballot and voter. A positive example for e-voting is that the paper trail can be tied to the electronic vote.

The first asker wanted to know how the technique works on glossy paper and William answered that due to specular reflections the measurements are less precise. Asked if it is possible to use a scanner different from the one the fingerprint was created on for verification to which the answer was that the error rates remain the same. Another audience member wanted to know if this will work when pieces of paper are missing. William replied that that should also work within certain bounds. The next question was if a mosaic attack - piecing one sheet of paper from several different sources - could be successful. The answer was yes, it could possibly fool the system but would definitely fail human inspection. The next two questions were about the recognition rate on pages printed with 100% covering rate - in this case the system fails and although William's team did some attempts of scanning through the ink they were not very successful. The final question was if they tried already pre-printed images, and the answer was that they did not experiment with that.

Q: What happends with glossy paper?
A: Idea should work in principle, but it is more difficult.
Q: What is the manufacturer difference when creating the fingerprint on one scanner and verifying on another?
A: 10-12% difference across different scanner manufacturers.
Q: What about schemes where only a part of the paper is available? What about sampling patches from many documents and clipping them together?
A: We believe this is quite difficult and noticable when a human is in the loop.
Q: What about papers where the majority is covered with ink?
A: Fully colored papers are problematic because the surface changes and thus the error rate.

The last presentation of the day was "Tempest in a Teapot: Compromising Reflections Revisited", presented by Markus Duehrmuth.

Markus started by referencing previous work where his team was able to read contents of a monitor without a direct line-of-sight on the screen, using reflections in stationary glossy objects, for example, a teapot. Then he claimed that removing all such objects still does not guarantee security since it is possible to retrieve some information from reflections in the eye and even from diffuse surfaces like clothes or a wall. First, he described the challenges of reading the reflection from a human eye the target is smaller and moving, which makes it difficult to obtain a sharp image. Hence, a technique borrow from astronomy called image deconvolution is used to compensate for out-of-focus blur due to the small depth-of-field and motion blur caused by the exposure time. Through deconvolution, a sharp image can be restored from a blurry image and a point-spread-function (PSF). The challenging part is obtaining the PSF and there are two possibilities: An "in-field" measurement obtained while taking the actual image that would exploit some stationary point light source or an off-line measuremen under controlled conditions. While the first method is good for measuring motion blur it exhibits substantial noise. The second method is accurate but does not account for motion blur, however, it generally produces better results then in-field measurement. Then Markus demonstrated some images reconstructed from reflections on the eye using the different methods. He then went on to explain how images can be extracted from reflections on diffuse surfaces. It is necessary to have two images - one image of an empty screen and one with the text - with this method it is possible to barely reconstruct letters of 10cm height - far bigger than a regular on-screen font. The good news is, however, that it is not possible to do it any better, and hence, removing all glossy object from the room should make it impossible to spy on you without direct line-of-sight. Markus also gave a theoretical proof of this assertion through analysis of diffuse deconvolution. As addition he showed that using these new techniques on stationary glossy objects makes the screen readable from a great distance. Finally, Markus mentioned some possible countermeasures with only one, a notch filter, showing promising results but at a high price. The first question concerned the application of this technique to small devices like cell phones to which the reply was that such devices are suited neither for capturing the image nor for being spied upon. The next asker wanted to know if any information can be gained from IR emissions of the screen and Markus answered that they did not investigate that. Asked if glasses make it easier to reconstruct the image he answered that it should definitely be easier, however, they only experimented with glasses on the table and not on a persons head. The final question was if the method works on text in a reasonably-sized font or printed on a piece of paper to which the reply was that this works only on stationary objects like teapots.

Short Talks Session

The first two talks were brief announcements for the Cloud Computing Security Workshop 2009 and the Trusted Infrastructure Workshop.

* "On Voting Machine Design for Verification and Testability" by Cynthia Sturton (UC Berkeley). She briefly explained that testing of voting machines only guarantees the absence of certain bugs, while formal methods are difficult since it is hard to formalize user expectation. Cynthia then proposed a hybrid approach where the design is formally verified and then tested by humans.

* "Using Java for a Forensically Secure Encryption" by Cedric Jeannot (Louisville). He mentioned a case study where it is possible to retrieve a temporary file from an encrypted file system and suggested to apply security principles in the context of forensics. Jave is perfect for that since it's sandboxed, designed with security in mind and available on all platforms. Cedric's goal is to have Java Apps leave no traces on the system. There are three categories of traces that need to be considered: output, disk and memory and so far, he has solved the first two.

* "Bringing Geographic Diversity to IEEE S&P" and was a not entirely serious talk about introducing a West Oakland conference. * "Membership-concealing overlay network" by Eugene Vassermann (Minnesota). He proposed an overlay network where participants do not disclose their membership, which is an orthogonal problem to anonymity and censorship resistance. The reasons for concealing ones membership in an overlay network might be that the software is illegal in your region. Eugene then listed his design goals to withstand attacks against current systems like passive harvesting, bootstrapping and celebrity attacks. At the end he mentioned that he has an initial DHT based design.

* "Acoustic Side-Channel Attacks on Printers" by Michael Backes (Saarland). He showed that is is possible to recover the printed text from the sound made by a dot-matrix printer. Such printers are still widely used by doctors and banking institutions and certain laws in Germany require narcotics prescriptions to be printed using this technology only. Michael's team created a training set of dictionary words and extracted features using HMMs using off-the-shelf hard and software. They were able to recover 63% of printed words and almost 70% after some post-processing. They have also conducted a successful in-field test in a noisy doctors office - however since the relevant noises are in the 20-40kHz range, human voices do not interfere with the process.

* "Secure Web Templating" by Adrian Mettler (UC Berkeley). He motivated his talk by listing the numerous templating languages used for web development, like PHP, JSP or ASP and that it is easy to write buggy code using them since different escaping functions have to be used on different elements. He suggested a structure-aware templating language that is sensitive to context and auto-escapes certain variables accordingly.

* "A Physics for Digital Information Systems" by, Fred Cohen (California Sciences Institute). He suggested to define physical approximations for information processing to teach to CS students and outlined some differences between classical physics and the information world.

* "Encoding Information Flow Types Using Authorization Logic Constructs" by Limin Jia (UPenn).

* "The Security Through Environmentalism Theorem" Steven Greenwald ( His goal was to increase security awareness of executives by stating and proving that "bad computer security harms the environment". The proposed proof goes in the line of: bad security causes longer runtime of machines due to maintenance which causes more C02 production which is harmful to the environment.

* "A Functionality for Symmetric Encryption in Simulation-Based Security: Joint State Theorems and Applications" by Ralf Kuesters (U of Trier).

* Darleen Fisher from the National Science Foundation presented the "NSF Future Internet Design Program" and invited interested researchers to participate. The program has three phases, first the design of network architecture elements, then a full scale "future Internet" design and finally evaluation.

* "Optical Scanning of Paper Election Ballots" by Arel Cordero (UC Berkeley). He explained that current voting systems are composed of many parts that are mostly proprietary. He proposed an independent system that would only require to scan the ballots and then, using only computer vision (and with no prior knowledge of the ballot) extract all information necessary to correctly count the ballot. Asked why this method shold be more trustworthy than existing system, Arel responded that this system is completely independent and can be used for verification.

* "WOMBAT: Worldwide Observatory for Malicious Behavior and Attack Threats" by Sotiris Ioannidis (FORTH). He explained the goal of WOMBAT: Building a network of network sensors to observe attacks as they are happening. Recorded traffic is than fed to different analysis tool (Anti-virus, dynamic analysis) to enrich the data, identify the root cause of the attack and build even better sensors. He showed a list of institutions already participating in WOMBAT and invited interested companies to join. All partners of WOMBAT get open access to all the collected data.

* "Application of 3-D Integration to hardware trust" by Ted Huffmire (Naval Postdoc School). He presented a novel 2-layer design for TPM chips that separates the computational plane from the control plane making the design supposedly more resistant to attacks and tampering attempts.

* The final talk of the day was about the "Security 2.0 Strategic Initiative" by JR Rao (IBM Research).

Day 3
Ninth Session: Web Security
Chair: Sam King

The first talk of the day was "Blueprint: Robust Prevention of Cross-site Scripting Attacks for Existing Browsers", presented by Mike Ter Louw.

Mike opened by reminding everyone that Cross-site Scripting (XSS) attacks are still wide-spread and showed some recent examples that affected the New York Times and Twitter. He briefly explained the mechanics of a XSS attack and pointed out that the main cause for XSS vulnerabilities is poor sanitation of user input. Mike presented the main goal of his work to be creating a robust defense against XSS that would still allow users to format their input using HTML markup while preventing the execution of scripts injected by malicious users. In order to successfully do that, browsers must not execute code in parts of the page that should contain only data. Todays browsers have no provisions to enforce that policy and would require extensive modifications to support it. The most popular defense today is server-side sanitation of "dangerous" HTML constructs, however, since different browsers interpret the same HTML differently there is a "parsing gap" that allows for possibly harmful markup to remain unsanitized. Mike proposes a solution to the parsing gap - to enforce the way the server parses HTML in the browser. To realize this, a trusted script inserted into the page, will act as parser for the remaining scripts in the page. To prevent the browser from executing the other scripts, they are translated into a safe alphabet that will prompt the browser to treat it as text. Mike then presented a short example how a regular web page containing scripts is instrumented with Blueprint. Then, he explained other possible XSS attack vectors apart from scripts, namely Cascading Style Sheets (CSS) and Uniform Resource Identifiers (URI). The Internet Explorer implementation of CSS supports an "expression" syntax that allows execution of JavaScript code in style element properties. Blueprint is able to mitigate this attack by embedding a trusted script whenever the browser request a style property. URI-based XSS attacks are possible due to the "javascript:" scheme that will execute the script when the resource of that URI is requested. Unfortunately, there is no browser API to enforce an URI whitelisting scheme but Mike suggested several workarounds employed by Blueprint. Then he showed an evaluation of Blueprint under the aspects of effectiveness, compatibility and performance. Blueprint has been tested on 8 of the most popular browsers, constituting a 96% market share. To test different attacks, the freely available XSS cheat sheet containing 94 XSS attacks has been used. Without Blueprint, all attacks work on all browsers. Two popular web applications, WordPress and MediaWiki have been instrumented with Blueprint and apart from the imagemap tag used by MediaWiki that was not in Blueprints whitelist, the success rate was 100%. The overhead induced by Blueprint (latency and memory use) is noticeable, however, it could be improved by removing the now obsoleted built-in sanitation routines from the web apps. The impact on user experience is less pronounced.

The first question was if it is even possible to insert executable code using MediaWiki's markup language. Mike replied that it does not matter because untrusted markup will result in untrusted HTML code. The next audience member wanted to know how they determined the set of safe API calls that Blueprint uses and the answer was that they composed a minimal set using experimental analysis under the aspect that the API calls will not induce the browser to parse additional code. Another person wanted to know if they have used any undocumented features of the API, however the answer was not clear to me. The last question was why Blueprint does not run on the server side only to which the answer was that the current approach makes sure that no client-side differences can introduce vulnerabilities.

The second talk of the session was "Pretty-Bad-Proxy: An Overlooked Adversary in Browsers' HTTPS Deployments", presented by Shuo Chen.

As motivation, Shuo questioned the security of HTTPS and if implementations on different browsers are consistent. He then introduced a new class of browser vulnerabilities that are caused by the proxy capabilities of these browsers. He noted that the disclosed vulnerabilities are being address already. Shuo introduced the "Pretty-Bad-Proxy" adversary, that is capable of breaking the end-to-end HTTPS security guarantees without breaking any cryptographic scheme. The vulnerability lies not in the HTTPS protocol itself, but in it's integration in the browser. Specifically, the attack targets the rendering modules which lie above the HTTP/HTTPS layer. Then Shuo explained and demoed the different attack schemes possible with PBP. The first attack involves injecting HTTP 4XX or 5XX error messages that, although they are sent unencrypted, are rendered in the context of the HTTPS-encrypted web site and allow XSS-style script injections through iframes. The second attack involves HTTP redirect (3XX) messages. Since a script that is referenced by a URI in a web page will execute in the context of that page even when loaded from a different server, it is possible for PBP to redirect a script request to any malicious URL. Another attack involves pages that are designed for HTTP but can also be transferred through HTTPS - they usually will contain resources refered to by HTTP URIs and browsers have security mechanisms that warn the user if non-HTTPS content is loaded in a HTTPS page. However, these safeguards only work for the top-level frame of the document, so these things go unnoticed when contained in an iframe. This allows the attacker to inject into unencrypted traffic an iframe pointing to a HTTPS-protected portion of the website and steal information from there. The final attack Shuo demonstrated allows, using only static HTML, to render arbitrary content in the browser while displaying a valid security certificate for any website. It works by letting the browser download the certificate and cache it, then trigger a refresh and inject an error page (HTTP 5XX) containing malicious content, that will be display in the browser as valid HTTPS page bearing the correct certificate. According to Shuo, all these attacks are highly feasible since proxies are used everywhere and when their integrity is compromised so is the security of HTTPS. Additionally, when the attacker has physical access to the victims local network, he can simply sniff and inject the traffic without the need of compromising the proxy machine. The tests conducted showed that the vulnerability is present in all networks and with all proxy configuration. Ever since they discovered the vulnerability, Shuo's team was cooperating with browser vendors to mend these vulnerabilities and he presented a table detailing which vulnerabilities have been fixed by vendors up until now. For most vulnerabilities fixes already exist, the hardest one to solve though seems to be the HTTP-Intended-but-HTTPS-Loadable (HPIHSL) problem. To mitigate the vulnerabilities until all are fixed, Shuo suggest not relying on session layer security but to use encryption at a lower level, for example IPSec, WPA or VPN. In the future, Shuo's team will try to find if oder protocols exhibit similar flaws and to conclude, he stated that HTTPS is not flawed by itself, however, it is made vulnerable through bad deployment in browsers.

The first question was if a manually configured proxy will mitigate the attack to which Shuo answered that then the sniff & inject attack will still work. The next question was how the browser vendors are fixing the issue and the reply was that now, all non-200 OK responses are discarded before a SSL handshake is completed. Another audience member asked if there are any server-side fixes to which the answer was that this issue is easier to handle on the client-side. The last asker wanted to know which browsers have proxy detection enabled in their default configuration and the answer was all but FireFox, however, the latter will try to determine if behind proxy upon installation and base the default setting on that.

Q: Ian Goldberg: How does this work with manual proxy configuration?
A: You need to hijack the TCP connection.
Q: Can we address these issues on the server side?
A: Server is not in the position to determine whether a proxy exists.
Q: David Wagner: which browsers have automatic proxy detection turned on by default?
A: IE has, Firefox has not.

The last talk of this session was "Secure Content Sniffing for Web Browsers, or How to Stop Papers from Reviewing Themselves", presented by Juan Caballero.

He started by explaining the Content Sniffing Algorithm (CSA) that browsers use to determine the MIME-type of received data. This algorithm is necessary since some servers tend to send false or missing MIME-types, while the browser needs the correct type to choose the appropriate handler for the content. The danger lies in the possibility that the browser might interpret plain text or another file type as HTML which in turn means executing all the scripts that are contained within. Juan then outlined the Content Sniffing XSS Attack, in which an image with embedded HTML and JavaScript is uploaded to a Wikipedia page. The image is recognized by the Wiki content filter as image, however, the content sniffing algorithm in the browser will discover the embedded HTML, change the MIME type and execute the contained script. Then he presented their contribution, which is extracting models of browser content filters to automatically craft attacks and upload filters to counter them. Since some of the browsers are only available in binary form, they developed a novel string-enhanced white-box exploration technique. To find attacks they model the website's content filter as boolean predicate, the browsers CSA as multi-classifier and then use a query solver. With this technique they explored the content filters for MediaWiki and HotCRP with CSA's from all major browsers. Juan took a quick detour to highlight the string-enhanced white-box model extraction which increases the coverage per time of path exploration for string-heavy programs. He then demonstrated two attacks that the system was able to discover and explained that the main problem lies in the varying signatures used to determine the content type. Especially in Internet Explorer and Safari, the HTML signatures are far too permissive in that they allow leading junk before valid HTML. To mitigate the problem, disabling CSA is not a solution since this would break approximately one percent of HTTP responses, which is a lot considered the amount of content server through HTTP. Instead, a trade-off that exchanges slightly reduced compatibility for increased security needs to be devised. For once, stricter restrictions on permissible content types can be enforced, and on the client side there is the option of avoiding privilege escalation (never upgrade non-executable to executable) and using prefix-disjoint signatures. To evaluate the compatibility of this solution the team implemented a content filter adhering to these principles and tested it on the Google database containing several billion pages. They deduced that most pages sending an invalid content type are sniffed correctly by their algorithm. The CSA they proposed has been fully implemented in the Google Chrome browser and partially in Internet Explorer 8 and is standardized in HTML 5.

The first question concerned the handling of text/plain MIME types in Chrome to which the answer was that Chrome does not treat text/plain differently from text/html. The second question was if the CSA is also used on images embedded in the web page using the IMG tag to which the answer was yes.

Q: What happens for an image that is embedded into an html page vs directly typed in the url?
A: CSA is still performed, independent of the location.

Tenth Session: Humans and Secrets
Chair: Michael Backes

The first talk of this session was "It's No Secret. Measuring the Security and Reliability of Authentication via "Secret" Questions", presented by Stuart Schechter.

Stuart opened by reminding everyone of the recent wide-published story about the compromised e-mail account of Vice-Presidential candidate Sarah Palin. Then he showed a study from 1990 about the guessability of secret questions by a persons close friends or relatives. The authors of that study used 20 questions based on facts or preferences and it turns out that the correct answer can be guesses almost one third of the time, while one out of five participants forgot the answers to his question within three months. Stuart then examined the security questions used by the top four webmail providers (GMail, Hotmail, Yahoo and AOL) and conducted a study of his own where people together with their significant others or acquaintances where asked to guess the other persons answer. They used several incentives to make people answer the questions truthfully, like they would do by signing up for a real service. After 3-6 months a follow-up study was conducted to measure the recall of the participants, again using incentives to motivate them. In brief, the results were that on average 20% have forgotten their answers while 22% where guessable by people close to them. The exact numbers for different types of questions can be seen in the paper. The collected data also allowed Stuart's team to conduct a statistical guessing attack which was successful 13% of the time. By showing some more statistics, Stuart reinforced his point that there are no good `security' questions. Some services offer the possibility for users to write their own questions, however, most people choose really poor questions with the number of possible answers often less than 5 while the some sites allow five attempts at answering the question correctly. Self-devised question are also easily guessable by acquaintances and on the other hand, the questions themselves might reveal too much personal information. Stuart proposed a scheme to prevent guessing attacks by dynamically updating the number of tries when the answers are related and reducing them when they seem random, however the implementation of that is not trivial. The alternative of e-mail based authentication is not viable for e-mail providers since not many people have reliable alternative addresses. Further suggestions of alternative backup authentication schemes went in the direction of printed shared secrets, SMS-based authentication or social network-based authentication.

Asked what he considered secret enough, Stuart answered that a baseline is hard to find and depends on the application. Another asker wanted to know if people in the study tried to research the answer on the web, to which the answer was that the initial group of test subjects did not try to research the answer, so the following groups have been informed about that option. The next question was if the participants answered the questions truthfully, to which Stuart replied that the answers were manually inspected and for the most part the answers were honest. The next question concerned people re-using their password on multiple sites and if that changes anything and the answer was that even then people forget the answers and get compromised. The final question was if the study showed that security questions are less secure than passwords to which the answer was a definitive yes.

Q: What would be secret enough in terms of percentage?
A: Enough depends on your own use. For example, you might want different security for your throw-away email account vs. a secure service that stores data in the cloud.
Q: Do you look into how these questions can be researched?
A: We told people to look at Facebook and search engines.
Q: Did you find that the answer had to do something with the questions?
A: Just by eyeballing over the data, most answers are indeed related to the questions.
Q: Did this study give insight about the relative strenght of secrect questsions vs. passwords?
A: As one would expect, secret questions are significantly less secure.

The last talk of the conference was "Password Cracking Using Probabilistic Context-Free Grammars", presented by Matt Weir.

The goal of Matt's work was to build a better password cracker to aid law enforcement in a forensic setting. The process of cracking a hashed password to create candidate passwords, hash them and compare to the hashed version. The most resource intensive part is the hashing, however, by using more educated guesses to create the passwords, the number of required hash operations can be reduced. Matt proposes to create a grammar based on previously discovered passwords that will reflect the way people usually choose and construct their passwords. The grammar would allow to rank and order the guesses and apply several mangling rules at once. Matt then mention the current state-of-the-art password cracker. John the Ripper, to which his approach will be compared. His method is based on a dictionary and advanced mangling rules and works in two stages. First, the system is trained on publicly available password lists and password structure rules are inferred, splitting the passwords in letters, digits and symbols and recording the length of each class. Then probabilities for each terminal value are learned. By following the grammar rules and combining probabilities, a score for each password can be computed. Of course the grammar can be adapted to a specific language or social group. The function that chooses the next password to try has to fulfill several criteria: Discard duplicates, return passwords in probability order, be memory and time efficient and allow distributed cracking. Matt mentioned that John the Ripper word mangling rules can be modeled entirely with this grammar and are only a small subset of what is possible with this system. From the training data they had, Matt's team generated 1589 base rules which together with a dictionary can be expanded to 34 trillion passwords. Then Matt explained the cracking process and how the list of passwords to try is generated. In a comparison with John the Ripper over the MySpace password dataset this system wins in all aspects (time, number of cracked passwords, etc.). In the end, Matt briefly mentioned related work and emphasized that this system is the first that can devise word-mangling rules automatically. Asked if people use more secure passwords for banking sites compared to, for example, social networking sites, Matt answered that they did not posses bank password lists to try their system on. The last question concerned the reaction of the Institutional Review Board to this research and Matt said that it took a while to convince the IRB and promises were made to minimize the privacy breaches, for example, user names obtained with the passwords were discarded. Q: Could you observe that banking passwords are more secure than those of social networking sites like MySpace?
A: We didn't have a list of banking passwords, but (hashed) CC data.
Q: Did you run into any IRB issues? Laws? Ethical problems?
A: Hopefully not, but there are indeed a lot of ethical issues. For example, we stripped all user information in our lists to prevent identity disclosure.