Review of the
Web 2.0 Security and Privacy Workshop,
Claremont Hotel, Berkeley, CA
May 20, 2010

Review by Sruthi Bandhakavi
July 22, 2010

The workshop started with opening remarks given by Larry Koved, the workshop co-chair.

Keynote: Jeremiah Grossman was the keynote speaker. He talked about his experience as CTO of WhiteHat Security, where he meets with enterprises on a day-to-day basis to discuss web security challenges. He said that ten years ago he could not get anyone to listen to him about web security, which is very different from the atmosphere now.

In his keynote speech, Jeremiah talked about the WhiteHat website security statistics report on the topic "Which Web programming languages are most secure?". The statistical analysis covered nearly 1700 websites under WhiteHat Sentinel management from 2006 to 2010, some of which required human testing, and about 24,000 uniquely identified vulnerabilities. They examined several questions: which classes of attacks each programming language is most prone to, how the languages fare against popular vulnerabilities, and whether all of the languages behave similarly in terms of security. Jeremiah presented several interesting statistics about the different programming languages. For example, Perl has the largest attack surface (inputs, POST data, cookies), while .NET fares twice as well as Perl on this metric. 73-88% of all websites had at least one serious vulnerability at some point, and 53-75% of all websites currently have at least one serious vulnerability. He also presented statistics about the average time it takes developers to fix a vulnerability; in his experience, syntactic issues are fixed more quickly than business-logic bugs. He concluded the talk by saying that for companies to take web application security seriously, security should be opt in, not opt out, and there should be legal and financial liabilities that force companies to prioritize security. In the discussion session, an audience member asked why companies that hire the services of vulnerability scanners like WhiteHat Security don't do anything when bugs are found. Jeremiah said that for most companies, revenue generation is more important than security, especially when they don't know whether actual attacks are happening. Another reason could be that the developers don't know how to fix the bugs. WhiteHat also runs training programs for employees and provides consulting to the companies.


Session 1: Privacy (Session Chair: Ben Adida)

"Enforcing User Privacy in Web Applications using Erlang" by Ioannis Papagiannis, Matteo Migliavacca, David Eyers, Briand Shand, Jean Bacon, and Peter Pietzuch and presented by Ioannis presented. Ioannis motivated their work by presenting an example use case of a micro blogging application with several publisher and subscribers and a centralized dispatcher. The dispatcher's task is to process the different messages and control the flow of messages so that only authorized publishers and subscribers get the messages. The motivating example shows how information flow control tagging can be used to ensure that the dispatcher enforces the privacy policy. Ioannis also show how Erlang's features like isolation, asynchronous message passing and scalability can be used to practically enforcing the privacy requirements. During the discussion, an audience member asked if their work was to show that a language with strong guarantees can be used to enforce security properties? Sruthi Bandhakavi asked how the authors dealt with label creep. If a subscriber is a publisher of information and then wants to send the information sent to the subscriber back to publisher indirectly, then the label could quickly become too complex. Ioannis replied that this is a limitation of their work, the policies and tagging of information should be designed carefully to prevent label creep. Leo Meyerovich asked how their system interacts with the database and whether the tags are stored along with the data. Ioannis replied that it is indeed the case that the types need to be stored otherwise we would lose the flow information. Kapil Singh asked where the policy checks are done. Ioannis replied that the checks are done at the server side where the policy is enforced.

"RT @IWantPrivacy: Widespread Violation of Privacy Settings in the Twitter Social Network" by Brendan Meeder, Jennifer Tam, and Patrick Gage Kelley and presented by Brendan. His talk was about how privacy is violated by re-tweeting in Twitter. In the discussion session, an audience member asked how retreating is different from sending a video to a friend and the friend posting it online. David Evans asked if there is any check to verify that the people who are tagged as originators of the tweet are actually the originators? Brendan replied that the original author's privacy is violated during re-tweets since the original user's policy about tweeting is not checked during re-tweets. The next version of Twitter introduces a mechanism where official retweets are displayed differently. Charlie Reis asked if the people who are re-tweeting know that the content they are re-tweeting is protected. Brendan said that they do. Travis made an observation that in case of non-official re-tweets, people could believe that there is a legitimate tweet from the genuine person. Brendan said that it is not possible to do it. Andy Steingruebl said that there is no way of knowing what the user expects, Twitter accounts can be made private to prevent follower spam, but the people might want to publicize their tweets (which account protection does not allow). Brendan said that Twitter needs to understand what the user expectations are and then use this information to provide appropriate account protection. Leo Meyerovich asked if the authors have any intuition about privacy violations in this case, for example the number of people whose how was broken into due to the privacy violation through Twitter. Brendan said that he does not have the information to answer the question and it is also tough to find this kind of information. Leo also asked why Twitter cannot fix this technically since it has information about public and private accounts. Brendan replied that Twitter does not want to solve it in a centralized manner, it wants to be hands off.

"Feasibility and Real-World Implications of Web Browser History Detection" by Artur Janc and Lukasz Olejnik and presented by Artur. Artur talked about attacks on user privacy using css :visited pseudo class, which could be used to inspect users' web browsing history. He gave some background of this issue. He also presented an analysis of what can be detected and the performance of this detection mechanism. He gave insights on how a history detection system could be built. He also presented current work and countermeasures. In the discussion session, Sergio Maffeis asked whether in their attack technique they need to send a lot of data to the client(a list of all the interesting websites that could be visited) and consequently won't this require a lot of upload capacity to the client. Artur replied that the attack may not consist of 20K webpages, the attacker just needs to send some important websites. In their paper they show that network performance and the data transfer performance are comparable. An audience member asked why the page load performance shown in the presentation decreases over time. Artur replied that the browsers become slow when loading a large page to build a DOM tree. An audience member asked why in the graph with number of users visiting the top 5K websites, a quarter of the users visited none of the top websites. Artur said that they might visit the websites in the private browsing mode or there could have been a script error. Sergio asked if a malicious user can use this technique for a mass attack if he finds a website with a large user page. Is there any good example of a mass attack? Artur replied that one example could be if the attacker finds out that certain website it very popular, he could add features to the website that target the users. An audience member asked if there were any client side mitigation techniques for this attack to make the client understand that the user is being attacked. Artur said that he was not aware of this. It could be possible to have a client side mitigation technique but he thinks that it is easier and better to solve the problem. An audience member asked how the authors advertised their study to the different users. Artur said that they advertised the study on reddit and then only interested people came to the website where the attack was hosted.

The morning session was followed by lunch and an invited talk by Kurt Opsahl from the Electronic Frontier Foundation on "Social Networking Privacy Issues". Kurt started his talk by outlining why privacy is important and showed how Facebook's privacy policies have evolved over time; he argued that the measures Facebook has taken are still not enough. At the end of the talk he proposed a Bill of Privacy Rights for social network users, comprising the right to informed decision-making, the right to control, and the right to leave. An interesting discussion ensued after the talk. An audience member commented that even with a good user interface, sharing contexts are sufficiently subjective that people don't agree on them. Kurt agreed and said that defining an appropriate sharing context is extremely difficult; however, care can be taken to define contexts so that people can see that there are limitations, and technical solutions can be provided to prevent individual users from making mistakes. If users limit information to friends, there is no way to guarantee that the friends don't share this information further in a different context; this is a hard problem. Kapil Singh commented that once users are on a social networking site, opting out of the site doesn't give any advantage. For example, in Facebook, opting out of using an application does not make any difference, since the application already had access to all the user's data. Kurt said that Facebook had an old rule that third-party applications must delete the data within 24 hours of getting it, but that rule has since been repealed. He agreed that this is a difficult problem. David Evans commented that in this context privacy policies are entirely meaningless. Most social networking sites give detailed messages about changed options and give the user the option to leave the site. Although this is misleading in some sense, because the user's data is already known to the website, there are some legal regulations that prevent websites from breaking their promises about private information. Charlie Reis asked whether there are any promising technical directions for solving the problem of deleting data. Kurt said that the problem of deleting data is challenging and that deactivating an account does not mean the information is deleted; he does not know of any current technical solutions to the problem. An audience member commented that it is a losing battle to make something online disappear completely and that any guarantee is impossible. Kurt replied that even though a guarantee is not possible, even a small effort could have a positive effect on privacy.

Session 2: Mobile Web (Session Chair: Charlie Reis)

"What You See is What They Get: Protecting users from unwanted use of microphones, cameras, and other sensors" by Jon Howell and Stuart Schechter. Jon presented. The talk was about privacy considerations in in-built cameras and microphones. Jon discussed the effect of permission to access sensor data given at one point of time on some other point of time where the data could be transmitted to unauthorized users. Jon described their proposed solution where a widget called sensor-access widget gives feedback about what the sensors are doing at each point of time with respect to each application that is running. For example, if the webcam is active and an application is accessing it, the live video is displayed on the screen along with the application. They also built a policy called SWAID which is used to control the feedback mechanism using the sensor-access widget mechanism. Sruthi asked if this won't be too restrictive if the image has to be shown for every application that is being displayed on the screen or a web page, for example in case of mashups. Jon said that mashups are a hairy case. An audience member asked how big the image should be and also if there is more than one kind of sensor information (camera, microphone, geo-location, accelerometer, etc.) how that information is displayed. Jon replied that image size and also handling different kinds of sensors simultaneously needs to be thought about. Kapil Singh asked if the authors were trusting the application to enforce the SWAID policy. Jon replied that the policy is enforced by a TCB like a browser, Operating System, etc. Leo Meyerovich asked how the authors handle the mashup or delegation case. Jon replied that this is a big can of worms where one doesn't know which application owns which chunk of the display. This a difficult problem to make progress on. David Evans commented that the most obvious alternative to protect the video streams is to have a lens cover and he asked how the proposed approach compare to that approach. Jon said that it is an excellent approach but does not work when there are multiple applications of which some are allowed access to the video streams. Dominique Devriese asked whether the authors have implemented the proposed techniques. Jon replied that they implemented it and studied the appropriateness of the approach. An audience member asked Jon to contrast the SWAID policy with the policy where the user is asked to opt in for each application that requires the information. Jon said that the opt in policy assumes that the user always wants to allow or disallow the information flow. This is not the case in SWAID.

Position paper: "Empowering Browser Security for Mobile Devices Using Smart CDNs" by Ben Livshits and David Molnar. David presented. The talk was to generate a discussion about improving security in mobile browsers. This area is different from implementing changes to the desktop web browsers because it is difficult to push changes to the mobile clients -- not everybody upgrades at the same time. One solution is to add the security primitives to CDNs in the middle tier so that the computation on the mobile devices is secure. There are two main research directions for this to work. First, we need to think about what kind of security services we can provide. Real time phish tracking, URL reputation, XSS filters, etc. are some examples of security services that could be provided by smart CDNs. Second, what if the middle-tier is not trustworthy? There are multiple vendors and operators and multiple web applications. How do these work together and what are the privacy considerations? In the discussion session, Leo Meyerovich commented about the role of the corporate world. For example, in Microsoft it is not allowed to use arbitrary internet connections. There could be differences between managed IT networks versus home networks. Arvind Narayanan and Charlie Reis asked in what way the deployment in the middle-tier is different from the deployment at the client. David replied saying that this is the way the the industry is going. 40% of content goes through CDN. Given this, it is imperative to ask what kind of services we can provide. If we get an opportunity to change the website to integrate better, it will be useful. An audience member asked how people could trust the middle-tier to render pages because the middle-tier is like an ISP which people are less likely to trust. David cited the example of Opera Mini, which people are voluntarily using because of power and performance benefits. The people however don't understand the implications of using opera mini. David Evans asked if there was a situation where people trust the middle-tier but not the client. David gave the example of coffee shops which cache most urls; maybe the people trust AT&T but not the local Starbucks. An audience member asked for the exact definition of trustworthiness. David replied that that there are several perspectives of trustworthiness depending on who one asks; the consumer's perspective could be different from the content provider's. The challenge is to find the different perspectives about problems. Susan Landau commented that this is not just a technical issue, legal issues could also be involved. The issues vary depending on the region also. David agreed and said that this could also be a social issue. Middle-tiers can span countries, so it is important to consider regional differences too. An audience member commented that some distinction needs to be kept between forcing the clients to use this versus the ISP offering this as a service. The ISP should not be able to force the user to use a middle-tier component. An audience member asked if the authors were thinking of examples transcoding videos to the resolution of the code. David agreed and said that one such example is iPad. We security people should think about how to do this the right way. Android uses third party libraries to render graphics etc. but these seem to be out of date. The CDN could be used to patch libraries for errors like buffer overflows.

Session 3: Measuring Security (Session Chair: Adam Barth)

"Hunting Cross-Site Scripting Attacks in the Network" by Elias Athanasopoulos, Antonis Krithinakis, and Evangelos P. Markatos. Elias presented. In this talk Elias presented xHunter, a tool to detect suspicious URLs. The main idea of the tool is to identify all the URLs that contain JavaScript. xHunter scans a URL for fragments that produce a valid JavaScript syntax tree and assigns weights to any URL that contains a fragment that produces a valid JavaScript syntax tree with a high depth. The weights are assigned using certain heuristics like reversibility, giving more weight to certain nodes in the parse tree etc. In the discussion David Molnar asked if the performance of xHunter will improve with hardware support. Elias said he could not comment on that at the moment. Dominique Devriese asked how the tool compares to intrusion detection. Elias replied that their tool is like Snort but is not based on static signatures. An audience member asked how essential the reversibility heuristic is. Elias replied that the reversibility heuristic could be used to subvert the tool. An audience member asked how the tool compares to browser based XSS filters. Elias replied that their big aim is to have something that can possibly run on the network. If one creates a browser-based filter, it would be good to compare xHunter's accuracy and speed with it. Currently they are not looking at host based systems. Adam Barth asked what prevents JavaScript from being able to be parsed in reverse. Elias replied that this heuristic is true for any language; it is hard to parse a language in reverse order because the syntax is not well defined. Phu Phung asked how they deal with document.write, where character by character is combined. Elias replied that document.write itself gets a high score. An audience member asked what the highest score is. Elias replied that the score is about 6. This setup was in order to come up with less false positives and false negatives. An audience member commented that scanning the URLs does not tell if XSS exploit works or not and it is not even a measure of whether the attack can really happen or not. Elias replied that maybe there is more value in attacking but trying to exploit a website is also bad. Position paper: "Critical Vulnerability in Browser Security Metrics" by Mustafa Acer and Collin Jackson. Mustafa presented. The talk was about which metrics to use to evaluate browser security. A widely used metric is distribution of the number of known vulnerabilities. Mustafa contends that this metric is meaningless and actively harmful because it ignores patch deployment, discourages disclosure and ignores plug-in vulnerabilities. The authors propose a new vulnerability metric: the percentage of users who have at least one un-patched critical or high severity vulnerability on an average day. The authors collect live statistics using this metric at the website browserstats.appspot.com. Jon Howell pointed out that there is a small disparity in the graphs where the old metrics scale to 100%, while the new metrics generated by the authors don't. Mustafa replied that this is a disadvantage of using their metric because evaluation is done on a per browser basis. However, if the results are normalized, a similar trend will be seen. Jon Howell also commented that the flash player vulnerabilities are not orthogonal to the browser version because whether flash is vulnerable or not depends on what browser is underneath it. 

Position paper: "Critical Vulnerability in Browser Security Metrics" by Mustafa Acer and Collin Jackson. Mustafa presented. The talk was about which metrics to use to evaluate browser security. A widely used metric is the distribution of the number of known vulnerabilities. Mustafa contends that this metric is meaningless and actively harmful because it ignores patch deployment, discourages disclosure, and ignores plug-in vulnerabilities. The authors propose a new vulnerability metric: the percentage of users who have at least one unpatched critical or high-severity vulnerability on an average day. The authors collect live statistics using this metric at the website browserstats.appspot.com. Jon Howell pointed out that there is a small disparity in the graphs: the old metrics scale to 100%, while the new metrics generated by the authors don't. Mustafa replied that this is a disadvantage of their metric, because the evaluation is done on a per-browser basis; however, if the results are normalized, a similar trend is seen. Jon Howell also commented that Flash player vulnerabilities are not orthogonal to the browser version, because whether Flash is vulnerable depends on which browser is underneath it. Brandon Sterne replied that the metric considers the versions of the Flash player that the users of that particular browser have. Adrian Mettler commented that this metric depends on how quickly users update their browsers and on the users' technical expertise; for example, the websites used to collect these metrics might attract users with more technical expertise. Is it fair to blame Firefox for users who don't update their browsers? Mustafa replied that this metric prompts a discussion about which updating mechanisms are good and which are bad. A browser that has been around longer than Chrome might have more vulnerable Flash players among its users, and the mechanism a browser uses to push updates also contributes: in Firefox the mechanism is not automated, while Chrome updates silently. Artur Janc asked how the authors collect vulnerability and usage data. Mustafa replied that usage data is collected by showing a JavaScript advertisement on web pages, while vulnerability data is collected by maintaining a manual database of vulnerabilities for each browser and plugin. The authors have an automatic solution for Firefox, but this does not work for other browsers; there is no standard API for collecting all the vulnerabilities a browser has, vendors don't provide this information, and such an API would be a valuable tool. Artur Janc asked about zero-day vulnerabilities known to the vendor. Mustafa said that the metric itself has temporal implications: there is a spike in the risk score when a browser update is released, and if the spike decays in a short period of time, that is taken as a good sign for the browser's security. David Evans asked whether the metric is skewed positively toward new browsers like Chrome versus older browsers like Firefox. Mustafa replied that there are not many users running old versions of Firefox, such as Firefox 2.0, that might change the risk score; most browsers in the data are the latest versions, so old data points don't affect the results. However, the data might be biased because collection happens only at a certain time of day; they do need more samples. Devdatta Akhawe asked whether, if Chrome keeps its security bugs secret and discloses them only after a patch while Firefox always keeps its bugs public and takes less time to patch, the authors' approach would give more weight to Chrome over Firefox. Mustafa replied that the old metrics already have the same problem; this metric is an improvement over the old one but does not solve everything.
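
As a toy illustration of the proposed metric (with hypothetical data, not the authors' dataset or code):

```typescript
// Hypothetical computation of "percentage of users with at least one unpatched
// critical/high-severity vulnerability on a given day". All data is invented.
interface UserAgentObservation {
  browser: string;          // e.g. "Firefox 3.6.3"
  plugins: string[];        // e.g. ["Flash 10.0.45.2"]
}

// Hypothetical vulnerability database: component version -> number of known
// unpatched critical/high vulnerabilities on the day of measurement.
const unpatchedVulns: Record<string, number> = {
  "Firefox 3.6.3": 0,
  "Firefox 3.5.9": 2,
  "Flash 10.0.45.2": 1,
};

function atRisk(user: UserAgentObservation): boolean {
  const components = [user.browser, ...user.plugins];
  return components.some((c) => (unpatchedVulns[c] ?? 0) > 0);
}

function riskScore(users: UserAgentObservation[]): number {
  const vulnerable = users.filter(atRisk).length;
  return (100 * vulnerable) / users.length;
}

const sample: UserAgentObservation[] = [
  { browser: "Firefox 3.6.3", plugins: ["Flash 10.0.45.2"] },
  { browser: "Firefox 3.5.9", plugins: [] },
  { browser: "Firefox 3.6.3", plugins: [] },
];
console.log(`${riskScore(sample).toFixed(1)}% of users at risk`); // 66.7%
```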

Session 4: Usage of Existing Browser APIs (Session Chair: Helen Wang)

"Busting Framebusting: a Study of Clickjacking Vulnerabilities at Popular Sites" by Gustav Rydstedt, Elie Burzstein, Dan Boneh, and Collin Jackson. Gustav presented. This paper consists of an extensive survey of the framebusting code present in 500 popular websites obtained from alexa. In this talk Gustav introduced the term frame busting and said that it is used by websites to prevent clickjacking attacks. The authors found that almost all the framebusting code in the wild is broken and gave several examples of such code and how it could be broken. Some browsers have recently introduced options like X-Frames-Options(IE8) and Content security policy(Firefox 3.7), which could be used to solve this problem. Gustav ended the talk by saying that mobile versions of the websites don't do any framebusting and therefore this makes them highly vulnerable to attacks. Devdatta Akhawe asked if Twitter wasn't already doing proper framebusting. Gustav replied that Twitter does regular framebusting but also has three to four backup mechanisms. However, reflective XSS filter will still kill Twitter. Colin Jackson said that the authors reported these vulnerabilities to Twitter and they were fixed as a result of the authors' suggestions. An audience member asked how the websites would behave if the referrer header doesn't exist and what are the failure behaviors. Mustafa replied that if the websites fail the check, the websites can try to framebust the main page.

Talk: "The Emperor's New APIs: On the (In)Secure Usage of New Client Side Primitives" by Steve Hanna, Richard Shin, Devdatta Akhawe, Prateek Saxena, Arman Boehm, and Dawn Song. Steve presented. The web landscape is changing. Users are demanding more functionality from the web applications and expect the web applications to perform similar to desktop applications. To this end, certain new primitives like postMessage, localStorage have been introduced. In this talk Steve discussed the how secure these two client-primitives are and provided examples of several attacks on these primitives. One fix for postMessage was to provide an origin whitelist in the content security policy. Steve also proposed enhancements to the design of new primates by using Economy of Liabilities as the guiding principle. Steve also suggested enhancements to postMessage and localStorage. Steve said that the browser vendors currently switched off the usage of postMessage primitive till the vulnerabilities could be fixed. An audience member asked since the fragment identifier approach is more vulnerable than postMessage, what is the advantage of switching off postMessage? Steve replied that even though fragment identifier was vulnerable, the application developers might be doing other checks on it, which is not the case for postMessage. Dominique Devriese commented that another guiding principle should be that the applications should be secure by default. Steve agreed and said that this is part of their design too. Dominique also asked what would happen if the whitelist had a *. Steve said that he would recommend broadcast over multicast and therefore * is fine. Dominique asked how the authors reverse engineer code. Steve replied that they use a tool called Kudzu to run the applications and collect the path constraints which can be used to reverse engineer code.

Position paper: "Why Aren't HTTP-only Cookies More Widely Deployed?" by Yuchen Zhou and David Evans. Yuchen presented. In this talk Yuchen talked about why http-only primitive in the cookies is not widely deployed. "http-only" primitive prevents cookies being read via document.cookie. In their survey of the top 50 sites on alexa.com, only 50% of the sites use http-only. There are also two known attacks to circumvent http-only. The authors hypothesize that http-only cookie provides modest benefits but have some deployment costs and therefore it has not be widely deployed. However, it is better to use http-only than not. Adrian Mettler asked whether using http-only would provide a false sense of security and it some people prefer to focus on other diverse solutions. Yuchen agreed that http-only does not offer complete protection. Jeremiah Grossman commented that http-only prevents long term session theft. The application developers don't use it because they don't know about it. Kapil Singh commented that 40% of the websites chain their authentication cookies to be secure. So they don't feel the need to use http-only.

Session 5: Next Generation Browser APIs (Session Chair: Thomas Roessler)

Position paper: "No Web Site Left Behind: Are We Making Web Security Only for the Elite?" by Terri Oda and Anil Somayaji. Terri presented. Most web programmers come from artistic or non-programming backgrounds who want to include a lot of functionality in their site and usually do so by cut and paste techniques. For such people, understanding web security and protecting their sites is an uphill task. In this talk Terri urges security professionals to think about providing security in a visual way to facilitate easy adoption of security by these people. Another proposed solution is to separate site design from security so that relevant people can handle security or security can be outsourced. Thomas Roessler asked whether Terri was referring to the several patches provided for the different vulnerabilities. Terri agreed and said that a lot of stuff is very overwhelming. Devdatta Akhawe commented that not many people care if small websites get XSSed. Terri did not agree to this and said that getting into website defacement is becoming worthwhile. Lots of attackers are interested in sending spam, SEO, evading blacklists, etc. all of which can utilize smaller sites. An audience member commented that the use cases of small websites don't require sophisticated services. Terri said that it may be more appropriate to use a limited set of services in that case. Adrian Mettler asked what the applications needed since they seem to be simple. Terri said that the small website developers want flashy stuff and they copy and paste code from different places. As an example, some e-commerce websites were compromised to relay spam and therefore indirectly affected everyone.

Position paper: "The Need for Coherent Web Security Policy Framework(s)" by Jeff Hodges and Andy Steingruebl.
Andy presented. In this talk Andy emphasized the need for an integrated effort to create standards for implementing web security primitives. Currently, web security is implemented piecemeal, differs across browsers, and is spread throughout application code. Andy proposed a unified mechanism in which security is expressed as easily configurable declarations rather than code. Leo Meyerovich wondered how effective this mechanism would be if the JavaScript is exposed to all users and policies are implemented through a special mechanism. In answer, Andy cited the example of Microsoft deployment wizards that step through the generation of code; we need something similar for web application development. Andy said that providing security via programming constructs is wrong and that it needs to be configuration-driven. An audience member asked how to ensure that any new mechanism is uniformly adopted by every implementation. Andy said that for any new security mechanism, there should be a way to create a configuration file from which the implementation can generate the required headers or code. David Evans commented that a lot of this can be implemented without extra configuration. Brandon Sterne asked where the right venue for the unification effort is. Andy said that standards bodies like the IETF have been working on it, but he warned that getting any new approach adopted will be extremely hard.
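
A hypothetical illustration of the configuration-driven approach Andy advocates (not from the paper): a small declarative policy that a deployment tool could translate into the corresponding security headers, instead of scattering checks through application code.

```typescript
// Hypothetical declarative security policy; the names are invented for illustration.
interface SecurityPolicyConfig {
  framing: "deny" | "sameorigin";     // translated to the X-Frame-Options header
  contentSecurityPolicy?: string;     // translated to a CSP header where supported
}

// A deployment tool or framework would apply these headers uniformly,
// rather than each developer wiring up framebusting scripts by hand.
function toHeaders(config: SecurityPolicyConfig): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Frame-Options": config.framing === "deny" ? "DENY" : "SAMEORIGIN",
  };
  if (config.contentSecurityPolicy) {
    headers["Content-Security-Policy"] = config.contentSecurityPolicy;
  }
  return headers;
}

console.log(toHeaders({ framing: "sameorigin", contentSecurityPolicy: "default-src 'self'" }));
```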

Position paper: "Secure Cooperative Sharing of JavaScript, Browser, and Physical Resources" by Leo A. Meyerovich, David Zhu, and Benjamin Livshits and presented by Leo. In this talk Leo underlined the need for introducing new primitives for sharing between different web applications in a mash-up. He argued that there is a need to create a new mashup manifesto where it is understood that sharing requires control, sharing must be natural and sharing must be cheap. Leo also presented a few such primitives that they propose in the paper. Prateek Saxena asked how the proposed primitives compare to the approach of Google Caja. Leo said that security in Caja is based on source rewriting which is fragile. Additionally Caja is based on the notion of object-view and it cannot talk about browser or physical resources. Thomas Roessler asked if the authors want to support the use case of travel between arbitrary websites. Leo agreed. Thomas Roessler commented that Caja outsources JavaScript to JavaScript compilers and uses mechanisms that stop short of JavaScript rewriting. Ideally one would like to take inspiration from such mechanisms. Leo said that they looked at static analysis of JavaScript code. However natural JavaScript precludes object sharing. We need to find minimal changes in JavaScript to be comfortable with gadget sharing without rewriting.