The 2001 IEEE Symposium on Security and Privacy
Oakland, CA, USA, May 13-16, 2001
Review by Mary Ellen Zurko
July 10, 2001
The 2001 IEEE Symposium on Security and Privacy (www.ieee-security.org/TC/sp2001.html) was held May 13-16, 2001 at The Claremont Resort in Oakland, California, USA.
Program Co-Chair, Roger Needham, gave some background on the program. There were 113 submissions. They received 3 reviews each, assigned on a bidding basis. PC members said what they would like to review, and Program Co-Chair, Martin Abadi, did the assignments. Roger sees the anonymity of authors as a privilege rather than a duty. Authors can sacrifice it to make the reviewing job easier (particularly in terms of references).
The first session, Tamper-resistance and Cryptography, was chaired by Martin Abadi (Intertrust).
The first paper was "Cryptographic Security for Mobile Code" by Joy
Algesheimer, Christian Cachin, Jan Camenisch, and Gunter Karjoth (IBM Research).
Their work is motivated by the need to protect mobile code and the originator from attacks. Their privacy example is an auction or negotiation between a vendor and a client with a browser. In that case, both parties need to know the outcome of the transaction. Their model includes a mobile agent starts at an originator and goes to multiple hosts via a path that is not preselected. Each participant sends and receives a single message. Computation is with encrypted functions over encrypted data. Existing work on homomorphic public-key encryption schemes and protocols that use them do not address the need to provide output information to the host. There are also trusted hardware solutions, which the authors consider expensive, not very flexible, and requiring trust in all parties involved. This work builds on Yao's encrypted circuit method. In addition to the construction and evaluation of the circuit, that work includes an oblivious transfer guarantee, where the receiver gets information about one of two messages and the sender does not know which message the receiver got. There is still a need for an output for the host. Their proposed solution is to extend this work with a generic Secure Computation Service, which has none of the stated drawbacks of the trusted hardware solution. This partially trusted service provider T does not collude with any hosts Hi or the originator O. It gains no information for itself, and actively provides a service for confidentiality. Similar service include anonymizers, CAs, and trust audit agencies. The realization involves one round of interaction with T. O constructs an encrypted circuit for C computing the functions over the inputs needed by host H and O and the separately encrypted input tokens that H must choose between. H sends its choice to T. T decrypts the appropriate set of tokens and returns the cleartext to H. H uses the cleartext to evaluate the encrypted circuit. The circuits/computations must have unique ids, so that T never decrypts more than one set of tokens for each id. Adding robustness is possible. Practically, this only works for small parts of a program, such as a composition function or negotiation/auction strategy. As a concrete example, if the circuit is realized with AES with 128-bit keys, computing the maximum of two n-bit numbers will need less that n kilobytes of storage and about 100n AES operations. Their conclusions are that mobile code security is not possible without assuming trust in a 3rd party, it is possible using a generic on line service T, that T learns nothing about the computation, and that it could be practical for small functions.
One question was how to get some assurance that T is trustworthy. Christian suggested you may perform some computations or check some subset or sample. The questioner instead wanted assurance that T is not leaking information. Christian does not know if that can be solved. Another question brought out that authenticating T is built into the model, with the assumption that everyone has to have a public key (and a connected trust infrastructure). Another attendee asked how their work exploits the fact that it is mobile code (rather than a multi party computation). The answer was that the model for communication is different. Everyone has their own security considerations and they are linked by a single outgoing and incoming message each.
The next paper was "Networked Cryptographic Devices Resilient to
Capture" by Philip MackKenzie, Michael Reiter (Bell Labs, Lucent). Mike
While cryptographic algorithms and protocols are standardized and strong, the end points which may hold private keys are still subject to viruses, theft and reverse engineering. They are in some sense are laughable and the weak link. In a 2001 CSI/FBI survey, the top two causes of quantifiable losses were due to viruses (35%) and laptop theft (27%). Private data is usually encrypted under a password but passwords are notoriously easy to break. For example, a study by Klein found 25% of the targeted passwords with a dictionary search. The authors are interested in a software only approach to this problem. The device cannot sign without testing the password at the server, requiring online dictionary attacks. The server is untrusted. an attacker who captures the server itself and a user's password cannot impersonate the device. Like the previous paper, this is a service provider model. For an attacker that captures the device, the goals is that the best they can do send password guesses to the server. If a strong attacker captures both, it can forge a signature only if it succeeds in an offline dictionary attack against the password. Device initialization does not involve the server. The client generates a token which is encrypted with the server's public key, which when decrypted allows the client to extract its private key if it has the right password. The token also includes some random numbers that are saved on the device. The private key and password necessary to create this initial token are then discarded. The device generates an authenticator based on the input password, encrypts it, and sends it to the server. If the password check passes on the server side, it sends back information that allows the device to recover the private key. There is some protection against DOS in the protocol (not presented). The system can be extended to support key disabling, even from someone who steals the device and knows the password. This provides defense against the disgruntled employee scenario. It is not possible with the generic protocol, since the device extracts the private key and it can be stored for future use. This is complementary to public key revocation, which requires the checking of a CRL. Using function sharing to share the function of encryption and signing between the device and server, the key can be mathematically disabled. The server's share of the computation is returned with the device's signing key. Mike showed examples with RSA signatures and referenced the ElGamal decryption example in the paper. There is an error in the proceedings; the authors have issued a tech report through dimacs.rutgers.edu/ with an update. The work includes a non-interactive zero knowledge proof that the server computed its share correctly. Otherwise, the server could change the encryption of the message in an undetected way. The work does not roll over to DSA directly. Their two party DSA signature generating protocol will appear in CRYPTO. It is more computationally intensive that the work described here. There are simulation proofs showing the protocol is provably secure. The summary stated that the work achieves quantifiable (and very high) security using standard passwords or even PINs. It can achieve better security than was previously possible, but it needs network connectivity to perform cryptographic operations.
Questions challenged the efficacy of this approach against viruses, since they can look at a great deal of residual information, including what got typed in and the private key that is returned. The class of viruses this works against is the sort that sucks the on-disk information off the machine and sends it somewhere. Residual information protection is outside the threat model. Someone asked if the server is completely untrusted, why bother to have public keys, since the server must be trusted for revocation. The sense that the server is untrusted is that if you take the status quo and involve the server you get key disabling. So if the server does not honor the revocation, you don't lose anything you had before use of the system. Another attendee said that since fixed passwords are bad idea in the first place, isn't this overendowing them by building a straw herring and shooting it in the foot (I am not making this up). Mike's response is that fixed passwords are still around, and with this you get dramatically greater security. They've thought about augmenting the system with one time passwords. Some cases can be done, and some are awkward to do. An attendee pointed out the tradeoff is against a new DOS; you can generate signatures that no one can verify.
The next paper was
"Protection of Keys Against Modification Attack" by Wai-wa Fung, Mordecai Golin,
Jim Gray (Hong Kong UST). Jim presented (he has been at RSA for about 3 years).
He pointed out that it's really Wai-wa's work. The key is recorded in the EEPROM of a smartcard. The EEPROM cannot (easily) be read but it can be easily modified. An attacker can also operate the smartcard. This work is inspired by the Anderson and Kuhn attacks. Microprobing needles can write individual bits in the EEPROM and are cheap. The attacker can try to run the card and see what happens. An attack for DES relies on parity bit checks. Setting a particular bit to 1 succeeds if it was previously 1, and causes the parity check to fail if it was 0. This attack is generalized by running the device once to obtain the correct output, then making modifications and comparing the new output with the correct one. The modification attack exploits the fact that the key is stored bare in the device. Perfect protection is impossible. The goal is that it is hard for clever outsiders, with not much money, but lots of time and intelligence, to get the key. They considered hiding the key in a random location, which transfers the problem to finding the address of the key. The cards must be mass produced, and the location would be stored in the EEPROM. This work separates the physical representation of the key P from the actual cryptographic key K. P is stored in EEPROM. A decoding function maps P to K. A possible protection scheme introduces redundancy, so there is more than one possible P for each K. P is decoded with a majority vote. It can still be broken by a modification attack in O(length P) time. For any protection scheme if the wiring function W and the decoding function D are known and deterministic, then an attacker can break it in O(length of P) probes. Hidden wiring and secret decoding functions and randomized decoding functions were not considered. The first idea was that P is a permutation of K and the wiring inverts the permutation. This can be broken in O(n) probes where n is length of K. They then considered cascading M permutations. When M = 2, the wiring undoes both permutations. All decoding functions must receive matching inputs before a key is output. This is attacked by getting P, then using a brute force method to get the wiring in O(n squared). If a protection scheme uses M different cascaded permutations, the brute force technique requires only O(n to the m) time for the attack (polynomial time complexity). A single card, instead of the whole batch, is needed to be compromised. Cards are normally manufactured in batches. The encoding needs to hide the bit occurrences of 0's and 1's in K. It blinds the key before using it. Blinding conceals the number of 1's and 0's in the key. Temporary keys are generated at programming time. There are the same number of extra keys as permutations. After reversing the permutations, it XORs all the physical keys together, so they get cancelled out. For more redundancy, P can be lengthened. You want to have a roughly even number of 1's and 0's in the blinded keys. An analysis with a range of numbers of 1's and 0's is in the paper.
Questioning brought up that the scenario is that there are small batches for a given wiring (about 2 thousand). Someone asked if they compared the strength with an error correction code to represent P, so that a certain number of bits is needed to change P. They hadn't.
The session called Intrusion and Anomaly Detection, I was chaired by Marc Darcier (IBM Zurich Research Center).
The paper "Data Mining Methods for Detection of New Malicious
Executables" by Matthew Schultz, Eleazar Eskin, Erez Zadok, and Sal Stolfo
(Columbia University) was presented by Eleazar.
Eight to ten new malicious executables are created every day and traditional detection methods are vulnerable. Their idea is to use data mining to detect new malicious executables. Traditional signature based methods can not generalize to new executables. Virus scanners with heuristic components are hand crafted by experts, so they are expensive, and their rules are not generalizable to other platforms. The data mining approach finds patterns in benign and malicious executables and uses these patterns to classify new binaries. The advantages are it is automatic and generalizes to new executables. The model requires less frequent updates. It produces a probabilistic output. The prediction algorithm extracts features from the suspect binary and computes the probability of it being malicious given its features and a model of malicious and benign executables. The modeling algorithm extracts features from binaries labeled malicious or benign and computes statistics over those features. Given a set of features F and a class C, the goal is to compute P(C|F). They use Bayes Rule, since it's a difficult quantity to compute directly. It's easiest to compute the probability of the class itself P(C). They use the naive Bayes independence assumption that each feature occurs independently. This is not realistic but it works in practice. They compute the most likely class and output whichever probability is larger. Each feature contributes to the final outcome. Discriminative features are present in one of the classes and have a large impact on classification. The data mining features include OS specific features. For Win32, this includes DLL and resource information obtained from the executable headers using objdump.exe. The OS independent features they use include the strings extracted directly from the binary. Their experimental data set consisted of 3265 malicious programs and 1001 benign ones. 244 of these were in Portable Executable (PE) format. They did a 5-fold cross validation to simulate new executables. 4/5ths of the data was used to generate the model (training) and 1/5th was used to test the model (test set). This was repeated 5 times. Their baseline was a signature scanner. It extracted all signatures from malicious executables and chose a signature for each malicious executable that did not occur in the benign executables. It labeled each executable malicious if it contained a malicious signature. The approach was based on a standard anti-virus signature approach. The detection rate for the signature based approach was 33.96%, for their Bayesian approach it as 96%. There were 0 false positives with the signature approach vs. 6.01% with The Bayesian. They calculate that tuning the algorithm to a 1% false positive rate would produce a 75% detection rate, while a 98% detection rate would produce a 12% false positive rate. The results are only preliminary; this is a smaller data set than the 50K known existing malicious programs. The data set is all stand alone programs; it does not include infected programs, and it is not realistic data that is found in the wild. They're looking for help finding a larger data set. They also hope to work on more efficient algorithms.
One attendee conjectured that the false alarm rate was strongly influenced by the data. Someone asked what the difference between pattern recognition and data mining is. The response is that it's a philosophical question. The presenter's is on machine learning, but it's called data mining in security. Someone suggested that trying to implicitly extract behaviors is problematic. The rule sets created are human readable, but they don't make sense (they're not something you'd think of). They just perform statistically well. Someone pointed out that there could be problems with a poor selection of features, or a class of programs that don't do anything malicious only because our OSes don't force you to do something malicious. The response was that this method can find slightly different clones inspired by something like the IL*VEYOU virus. A question brought out that they are pruning feature set if the number is close for both classes. Another question brought out that they are looking at thousands and thousands of features; string extraction in particular produces a lot. There was some discussion about labeling broad sub classes of viruses. Someone pointed out that a moderately smart virus could behave normally for a while with an embedded encrypted program, then attack after running a number of times.
paper was "Evaluation of Intrusion Detectors: A Decision Theory Approach" by
John Gaffnew (Lockheed Martin) and Jacob Ulvila (Decision Science Associates,
Inc). Jacob presented.
They present a comprehensive way to think about the evaluation of Intrusion Detection Systems (IDSs): the expected cost metric. It can also be used to compare IDSs with each other, against having no detector at all, and against a detection goal. It can also choose the best operating point for an IDS when the system is tunable, and be used to adjust the operation of an IDS in response to a changing environment. Some methods used or proposed for evaluating IDSs include comparing the curves of Receiver Operation Characteristics (detection probability vs. false alarm rate), setting a particular goal for probability of detection and false alarm rate, cost based modeling, and false alarm rate. Metrics proposed include the area under the ROC curve, total cost from a simulation, the threshold false alarm rate where response becomes "unmanageable", and "distance" from a goal or threshold. Their decision analysis approach is to combine ROC and cost based methods. This extends to consider the hostility of the environment. This produces an expected cost metric. It is described in the simplest case needed to make their points in the paper, but it can be extended to more complicated situations. They use a decision tree for calculating the expected cost of an operating point on an IDS's ROC curve. Costs include failure to respond when there is an intrusion and cost of responding when there is no intrusion. The detector provides the probability of an alarm given an intrusion. The system needs the probability of no intrusion given an alarm, the probability of an intrusion given an alarm and the probability of an alarm. They are calculated under Bayes' theorem. Techniques from Bayesian statistics, mathematical psychology, decision analysis need to be used to compensate for base rate fallacy; individuals fail to account for the base rate when estimating a probability. They estimated based on the data available and guidelines. The costs also need to be estimated, including damage costs, and monetary and other outage costs. Then determine the optimal operating point that minimizes expected cost. You can find the optimal points for each detector; then get the difference in expected costs. They produced two estimated ROC curves to fit the DARPA data. With costs and probabilities applied, either could be preferred to the original study's goal point. Their conclusions were that meeting a goal on the probabilities of detection and false alarms is inadequate, the area under the ROC curve is not a good metric, connecting discrete ROC points with straight lines is pointless, and the lowest expected cost will always be at one of the points. Expected cost is a superior method for evaluating IDSs and should be used in actual evaluations. The method should be included in IDS research and development projects.
Someone asked if the model dealt with the cost of responding to alarms. Expected cost vs. no detector could be looked at. They don't currently have a lot of detail on that. An attendee pointed out that they make evaluation even more complex. Will we ever get something usable? It is subject to our ability to make estimates. Maybe you can just look at the false alarms because if that's too high, no one will respond. The severity of some attacks could put you out of business. Questioning brought out that and attacker might generate a lot of false alarms to shut down the system or doscredot an IDS. Another attendee pointed out that about 15 years ago in a large circulation IT magazine, someone indicated you could target the cryptographic checksums on messages. Systematically cause a link to randomly fail for a month or two. The people responding would let bogus transfers go through in order to get their normal job done.
The session on Information Flow was chaired by Virgil Gligor (U Maryland).
Confidentiality and Algorithms" by Johan Agat and David Sands (Chalmers
University of Technology) was presented by David.
Mobile code provides a new motivating example for continuing research into whether code keeps its secrets. Programs may help you find a doctor, get medical information, and store your medical records. The current focus is on techniques such as static analysis and proof carrying code. NonInterference (NI) specifies that the observable behavior an attacker sees is not influenced by high secret data. Information can be relayed via timing and delays. Can useful secure programs be written? The "folk law" is that NI is too strong in practice and that "timing channels are impossible". We need to understand more to figure out how to best loosen the strict information flow conditions to achieve practical programs. Timing issues are real and shouldn't be ignored. Are there efficient algorithms which satisfy the conditions of secure information flow? This paper presents a study of algorithms for sorting and searching collections of secret data. The timing behavior includes caching. Recently referenced items reside in the cache, and fetching from cache much cheaper than from main memory. It is easy to construct a covert timing channel from cache behavior. For example, a variable may or may not be cached, depending on the branch previously taken. Structure size cannot be kept secret; indexing out of range can leak the size and running time always depends on structure size. One approach is to make all operations take the same time as the worst case operation. To provide deterministic search structures, O(log n) structures (AVL trees, 2-3 trees) can be modified according to the worst case principle. Fixed hashing won't work. It has a non-constant factor worst case since collisions take asymptotically more time. If keys are stored via pointers there are (additional) cache-leak problems. If the algorithm inserts items in a different order depending on a branch and if they're pointers into main memory, the attacker makes them collide in the cache, so that the system can only cache one or the other. This forces an information leak. Standard heapsort is easily adapted using the worst case principle; the difference in time is quite small. Adapting Batcher's odd-even merging we get O(n log squared n) time, O(n) space and a completely static comparison order independent of the data in the structure. Randomized algorithms can provide probabilistic noninterference. Randomized quicksort chooses the pivot randomly so that all splits are equally likely. The expected complexity is O(n log n) with any input and the distribution of running times is not dependent on the elements. Future work includes other important classes of algorithms, formal verification/program logics, and assuming pre-certified library components.
A questioner was concerned that this requires the cooperation of application programs which may be hard to get without control of the source code. Did they consider the fuzzy time work? This work's perspective is compositional. They want to verify the components then allow program to use these freely. The attacker can arbitrarily magnify the timing differences. A little bit of noise doesn't solve the problem. Another attendee pointed out that the NI method by randomized hashing is theoretically insecure. Is this a limitation of the model? Yes, they assume that the hacker has limited computation power.
paper was "Preserving Information Flow Properties under Refinement" by Heiko
Mantel (German Research Centre for Artificial Intelligence, DFKI).
Stepwise development is crucial for complex systems. This involves composition/decomposition and refinement. You start with an abstract specification of what the system does at some level. It satisfies certain properties that are specified declaratively, and they are proven. From the abstract you refine to a more concrete specification. Look again at the property; does the concrete specification satisfy it? System specifications use an event based model. Traces are sequences of events. An event is an atomic action, like the transmission of a message. Subset refinement removes non-determinism. Information flow properties are properties of single traces. Safety and liveness properties are properties of sets of traces. An alternative to verifying the closure property on a set of traces is to verify the unwinding conditions. The unwinding theorem states that if there is an unwinding relation such that the unwinding conditions are satisfied then the information flow properly holds. The example in the talk involved
three events, two of which are visible. Each can occur at most once, in any order. Previous suggestions include restricting the information flow properties. This is not fully compatible with non-determinism. Systems may be rejected as insecure because they are non-deterministic , even if they are secure. No satisfactory property has been found so far. Another approach is to repair after refinement. A problem with that is that information flow properties need to be reinvestigated after every development step. The specification must be repaired in terms of complete traces, but the repair might not terminate at a single level. This work proposes to restrict refinement such that it preserves the unwinding conditions. What is new is that this system exploits the proof of the information flow properties at the abstract level. It provides two refinement operators that take the abstract specification and event pairs to be disabled. For pairs involving a high level event, that subset can be disabled. For the low subs
et there are two possibilities: disable additional pairs not in the set or disable at most what is in the set. The refinement operators return a concrete specification. Refinement operations for the perfect security property, forward correctability, and so on are not restricted to the two level flow policy. The calculation of refinement terminates. Future work includes other notions of refinement such as data refinement and possibly action refinement, how to ensure the minmality of the unwinding relation and how to preserve it (important for optimality), case studies such as mobile computing, the bigger picture for information flow properties being refined from abstract specifications to concrete code, and work on programming languages.
Questions focused on whether or not bi-simulation (?) was better. Since I [Mez] don't know what it is, I was unable to capture them in a useful form. Answers indicated that the author had not explored that avenue, and that his approach does not provide an easy to understand abstract property.
The session on Access control and trust management was chaired by Paul Karger (IBM).
"Understanding Trust Management Systems" was
presented by its author, Stephen Weeks (InterTrust Technologies).
The paper is based on the research he did over the last few years to help with a decision on which trust management (TM) to use. The mathematical model lets him pigeon hole and talk about them. In his definition of trust management systems, a trust management engine (TME) mediates access. The authorization inputs are (principal, object, access) tuples. The access matrix is not a centralized object. It is carried in lots of different certificates issued to principals in the system. A different access matrix is conveyed by each certificate (cert). The TME combines certs and decides whether to allow the access to the object by the principal specified. Certificates may delegate from one principal's access matrix to another. The TME takes a principal (who authorizes), an authorization (what want to do), and the set of certs. It checks what that principal authorizes. The model needs to cover authorizations and delegation. The access matrices form a lattice. You can take any two and join them in a natural order. The TME combines authorizations with a join. Examples of authorization lattice include PKIX, Mass and Herzberg's 2000 role based paper, the SPKI name component, and Keynote for check writing. The model for certs and delegation starts with principal P authorizing action a. Delegation is that P authorizes a if p' authorizes a (or a'). You can delegate to multiple principles (and). Then a cert is modeled as an issuing principal and function, such that if principal p issues function f and M is an element of what is authorized elsewhere then p authorizes f(M). The function is monotonic. If you present a set of certs that prove you are authorized and it says no, you shouldn't be able to present a smaller set and get yes. All certs are monotone (they cannot be revoked). You can model what goes on in TME as finding a fixpoint. In an initial map, all principals authorize nothing. Compute a new map by looking at all the certs for a particular principal and apply them to the previous map. Join them in the lattice of authorizations and shows what's been granted. You get to a fixpoint when there is no change from one point to another. The paper presents the semantics of SPKI and Keynote. You can improve an existing system by extending the cert language to express more monotone functions. You can compare existing systems within the same framework and study concepts applicable to many TM systems. The model can also be used to design new systems.
An attendee asked if this rules out certs that restrict access. Yes, they must be monotone. An attendee noted that it cannot model, for example, the Brewer Nash Chines Wall model which is used in investment banking, where you can lose access to related data. Someone asked if he could relate the model to what's happening at Intertrust. Stephen said No. Several questions were about the lack of revocation in the model. He doesn't know of any good expressiveness models between this and Turing complete, but that would be nice as a model. Someone asked about a comparison of the Lampson, Abadi, et al work on an authorization logic in the '90s. He found it simpler to think a lattice instead of logic. There are just functions and the trust management computation is simpler; just a fixpoint. Someone asked why not a model theory for those logics? This is a very operational model. It's just saying about the certs, here's what you need to compute the TM decision. There's not much insight into what entities in the system believe.
The next paper was "SD3: a trust management system with
certified evaluation" by Trevor Jim (AT&T Labs Research).
This paper follows well from the one before it, as it addressed the question "What if TME client does not have the required credentials needed for the authorization decision?". The TME could go out and get stuff, providing distributed trust management. The application formulates a question. If the TME can answer the question based on the certs input, it will. Otherwise, it sends a message to another TME that might provide needed credentials. The original TME as well as the called TME can contact other TMEs. We can use this to build interesting distributed systems such as PKIs. This seems to be more complicated than SPKI, and you might worry about it becoming too complicated to be trustworthy. This works shows how to write an interesting PKI with the system (a secure version of DNS) and a technique for the complexity problem (certified evaluation). It makes the trusted computing base very small (comparable to SPKI). DNS is an insecure protocol that maps host names to addresses. For scalability, it also allows delegation of a portion of the mapping/namespace. To find the source for a particular name, the local namespace authority knows the root name server and can ask it for the address. The root knows what subdomains are delegated where and tells local name server the next candidate down the tree to ask. This continues on down the tree to the target name. Assuming there are address and key resource records for the root and other name servers, and delegation records, next you need a resolver. This resolver defines a relation such that it holds if the hostname n has address a. The system fills in the address when queried. You can get multiple facts for multiple addresses. There are a number of cases the resolver needs to address. First case, if there's an address record locally for the name (fairly straightforward). Case 2, talk securely down the name server delegation tree (you have looked up the key and address at the current node, which is directly above the next step in the delegation tree.). Case 3, every address record is signed by a key. Only pay attention to the ones signed by the right party. Case 4 is the recursive case. Look recursively down the tree. Case 5, the name is nowhere below. Go to the root. The author has a prototype that shows a simulation. If you write a security policy, having it small means you understand it and analyze it, and it's an advantage. The loophole is that this is the application source code, but what about the TME? It is complicated. With certified evaluation, the source code is what really runs. The uncertified evaluation version takes the policy, query, and certs. There is a signature check, an optimizer, a cache, and a core evaluator. It sends out the query to the network, gets certs back, evaluation continues, and an answer comes out. That system is augmented with an answer and proof between the core evaluator and checker on the way to the answer. It produces a proof that the answer follows the policy. It is a very detailed proof. It is long and each step very simple. Checking is easier than computing it. It can't produce a wrong answer, as a wrong answer would have a wrong proof. If you have a right answer and a wrong proof, then there's a bug in evaluator. An example of a proof is in the paper. In conclusion, the system relies on trust management + cert retrieval + certified evaluation, policies are understandable (at least they're quite short) in comparison with BIND, and certified evaluation makes the TCB small.
An attendee asked what the large real version of BIND does that this 5 case implementation doesn't do. The author said that his code is approximately what BIND should be doing. BIND supports revocation, update, and a different message format. If he changes his message format he could plug it in. But it doesn't have the revocation that is necessary for scalability. Someone asked if when you scale up, is there a disproportionate scaling in the size of the proofs and checker; does it scale if you're supporting the warts in BIND or it disproportionately expensive. The author believe it scales.
paper, "Formal Treatment of Certificate Revocation Under Communal Access
Control" by Xuhui Ao, Naftaly Minsky, and Victoria Ungureanu (Rutgers
University), was presented by Naftaly.
The talk was mostly about the communal access control (CAC) aspect. The certificate revocation information is in the paper. The conventional approach to distributed AC tends to be server centric. Each server implements its own policy often implicit in its code. This is appropriate for client-server applications whenever the server is autonomous and in charge, but not with an enterprise wide policy on lots of servers. You need a concept of CAC policy that governs certain interactions. They use dealing with patient records as a case study. Consider a large, geographically distributed hospital, whose patient records are handled by several heterogeneous servers (R-servers). The hospital has a single policy governing the entry, update, and retrieval of all patient records. The server policy PR might include that servers must be certified by a CA called admin, doctors must be specified by both a board and the admin CA, a copy of every record is sent to a designated audit trail server, and that a nurse may be a surrogate for a doctor. When a certificate that authenticated a server expires, the server's ability to receive records should be nullified immediately. However, a doctor's expiration should be given a grace period. It is difficult to ensure all server implement the same policy. Some relevant interactions do not involve a server (such as a doctor appointing a nurse). Thus the policy PR is inherently communal. Law Governed Interaction (LGI) is a message exchange mechanism that enables a community of distributed agents to interact under an explicitly and strictly enforced communal policy. They have a prototype of this. The membership of the community C is open, and can be large. Nothing is assumed about the structure and behavior of members of C, which can be quite heterogeneous. It supports a wide range of policies and a spectrum of security concerns from inadvertent to malicious violations. The deployment is incremental and easy. Law enforcement is entirely decentralized for scalability, avoiding congestion and a single point of failure. Replication cannot provide these same benefits for a stateful policy with frequent changes in state. There is a personal controller for each agent. Each has the same Law L and the same Interpreter I. The controller only contains Control States relevant to agent x. Controllers send messages between each other. If a controller is close to the client, the extra messages are short/cheap. Laws are defined locally at each agent and without any loss of generality. Laws deal explicitly only with the local events at individual agents. The ruling of a law for an event e at agent x is a function of e and the local control state. While the law is local, the effect is global as the community is uniformly controlled by L. An agent has to find and adopt a controller. It then can send and receive messages. Messages contain a hash of the law L so controllers know what law the communicating thing operates under. In that case, they are trusted to comply to same law. Certification of controllers is also possible. Enforcement does not compel anybody, though something may be effectively compelled if it needs services only available under that law.
An attendee pointed out that if the law is not static, L's may be out of sync. Yes, if you cannot stop community for a change, you have to do it in a distributed fashion. They are pursuing it. Another question brought out that their definition of scalability is that the policy itself is basically local. You can increase the number of agents by a factor of n, and have no effect at all on any one agent. To the question, does the system apply to a transactional system with workflow ordering, Naftaly answered yes. A questioner asked about whether another protocol was needed to check the internal consistency of a law L. Another pointed out that in the UK, they drive on the left, while in the US we on the right. Under this system, they are two different communities.
The session Intrusion and Anomaly Detection II was chaired by Marc Dacier (IBM Zurich Research Center).
"Information-Theoretic Measures for Anomaly Detection" by Wenke Lee and Dong
Xiang (North Carolina State University) was presented by Wenke.
The current state of the practice in anomaly detection is that there is a relatively high false alarm rate, ad-hoc approaches in development and evaluation, systems are specific to a particular environment, there are no theoretical foundations, and it's difficult to model what's normal. The authors hypothesize that the regularity of normal data can be used for model construction, as it provides predictability of (future) activity. Their approach is to use information-theoretic measures entropy and conditional entropy to model regularity. This is follow on work to Maxion and Tan (2000). They found that to work well in a deployment environment, the IDS training data had to be similar to the environment. Entropy and conditional entropy can characterize the "impurity irregularity" of the data set. The smaller (the more regular), the better, the more likely to see them in the future. The "irregularity" of sequential dependencies indicates the "uncertainty" of a sequence after seeing its prefix. Again, the smaller (the more regular), the better. Relative (conditional) entropy indicates how different is p from q. This can be used to say how different is the regularity of the test data from that of the training data. Again, the smaller, the better. Intrusion Detection (ID) as a classification problem. There are normal events, intrusions, and anomaly (don't know) events coming through the system. Maybe the anomalies should be set aside and looked at by the human experts. The system categorizes data into the proper category. The approach to categorizing doesn't matter. You select features, test values, and partition based on that. The larger the information gain, the purer the subset, the better the classification performance. The smaller the conditional entropy, the better the performance of the classifier. For accuracy, the more information available for analysis, the better. However, for efficiency, that is not better. There is a trade off consideration. Assign cost to information; the processing time. They did a case study of anomaly detection for Unix processes. Using the Forrest et al approach, they took "short sequences" of system calls as a normal profile, using a sliding window of length k over the system calls. You don't see any new sequences under normal conditions. With a a classification approach, given the first k system calls, you try to predict the k + 1st system call. How to determine the "sequence length" k? Looking at the conditional entropy of the training data, as you increase k, the entropy drops ( so it's easier to predict next call). It stabilizes around 6; that's what Forrest used. For classifiers trained and tested on normal data, the lower the error rate, the better. The smaller the conditional entropy, the lower the error rate. They tested their approach on intrusion traces. When the length is short, the error rate is larger. Say the cost is linear to the sequence length. There is an accuracy/cost a trade off. If you don't have a measurement of accuracy before building the model, accuracy = 1 - error rate. Estimate it as 1 - conditional entropy. There is information about the MIT data set in the paper. The "regularity" of data can guide the model construction process. For sequential data, conditional entropy directly influences the detection performance. With cost also considered, this may produce the "optimal" model. Detection performance on test data can be attained only if the regularity is similar to the training data. Future work includes studying how to measure more complex environments, and extending the principle/approach for misuse detection.
One question was whether this approach could detect attacks where the program is running a normal sequence but the results not what are desired. No, you can't catch that in this model. It doesn't model frequencies of sequences, for example. The questioner was interested in catching spam. Wenke said you could detect that at the mail server by looking at frequency of requests. An attendee said that they would like that the complexity of avoiding this technique is sufficiently high to gate realistic intrusions. They are very far from having an attack space definition to come up with the model.. A questioner said that they need to acknowledge that the data they worked with was synthesized. They have not tried it against random data.
"A Fast Automaton-Based Method for Detecting Anomalous Program
Behaviors" by R Sekar (SUNY), M Bendre (SUNY), Pradeep Bolineni (Asera), and D.
Dhurjati (SUNY) was presented by Sekar.
This work also looks at program behavior, following up on Forrest's work. A key problem is, What is a good way to represent/learn information about system call sequences? There are issues around compactness, accuracy, and performance. Forrest et al compared several methods for learning system call sequences. They found memorizing sub-sequences of length N (N-grams) to be most appropriate. Markov models provided a slight increase in accuracy but incurred much higher overheads. This paper develops a new method that significantly improves over the n-gram method. The false positive rate and training time requirements are better. There are several drawbacks of the N-gram method. The number of N-grams grows exponentially and N must be small in practice (N=6 suggested). The implication is that it is difficult to capture long-term correlations. The system remembers the exact set of N-grams seen during training; there is no generalization. This necessitates long training periods or a high rate of false alarms. Their alternative representation is a Finite State Automaton (FSA). Even an infinite number of sequences of unbounded length can be represented. They naturally capture program structure such as loops, if then else, etc. However, learning FSA from sequences is computationally intractable and no automated method for learning FSA has been developed One difficulty is in learning FSA from strings, since strings do not provide any information about the internal states of an FSA. Also, how do you answer questions such as, when a state repeats, is it a new state or a return to an old state? Their approach involves interception of system calls using ptrace (as before) and examining the process stack to obtain program counter information. Dynamic linking poses a problem. The same function may be loaded at different locations during different runs. They use the program counter value corresponding to the code calling the dynamically loaded library. As a side benefit, ignoring library behavior makes the FSA more compact. At detection time, a mismatch may occur in terms of either the system call or program location. They use a leaky bucket algorithm for aggregation of mismatches. The program counter helps resynch even after seeing behavior not seen in training. They did experimental evaluation on ftp, http, and nfs servers. "Real" data from live servers requires sufficient traffic load, so it was only done for httpd. It is also difficult to control, not repeatable, and it is difficult to get code coverage. They used simulated data for the other two programs. A script generates requests, generating all commands accepted by the server with a probability distribution intended to mirror real data, using randomization. The FSA method converges much faster than n-grams. It can do with roughly an order of magnitude less training period than n-grams. The false positive rate also comes down at roughly the same rate. The two issues are closely related. The differences are smaller with the live data (a factor of 5 instead of 10 in the false positive rate). Exploits that involve deviation in program behavior can be easily detected. An attack that involves system call arguments can be missed. The FSA method provides a compact representation, without limiting system call sequence lengths. In the future, they would like to incorporate system call argument values, use source code analysis to augment runtime learning (which is actually covered in the next talk), and learn frequency of execution information. A questioner asked why the training period looks highly non-linear. It is conceivable that the possible variations have been exhausted. Using scripts could also factor in. Another question called out that the curve is not linear in any sense. The graph is logarithmic scale. It is monotonically decreasing.
Questioning also brought out that an example of a situation where you would get a new N-gram for an automata that already recognizes that state is combinations of branch types within a program, before they actually occur. Someone commented that if the only case occurs during an intrusion, the FSA will miss it. There was discussion about what intrusion cases that might not be recognized. Even without the leaky bucket algorithm, with buffer overflow, there is no single anomaly. If you're looking at the stack, stack corruption could be detected. But it's possible to craft a string that fools the system. Maybe if the stack looks corrupted, you don't wait any more. Related to the size of FSAs, they are ignoring a lot of things. On how this could be used with an interpreter, implementation would be at a different point, such as where security decisions are made rather than system calls.
Detection via Static Analysis" by David Wagner (UC Berkeley) and Drew Dean
(Xerox PARC) was presented by David.
Anomaly detection can detect unforeseen attacks. But one of the hard problems is finding a good model. There are many metrics of good. The problem they looked at was finding a model to reduce the false alarm rate. A quote from Internet Week indicates it's a real issue in practice. A sys admin may turn ID off after a few mis-timed false alarms. With a server doing millions of transactions, even a low false alarm rate will give a lot of false alarms. You can deduce a model from the program source code, using static analysis. There is no training data set. An example of an attach it catches is one that inserts malicious code. It constrains the executions of programs to no actions inconsistent with the source code. As a tiny example, the model might be the set S of system calls the program could ever call. The enforcement would be that if the program's syscall trace contains anything not in S, raise an alarm. It's not recommended, but it gives the flavor. The talk the day before that looked at the set of dll system calls uses the same model. Such a static model is complete; it includes all correct behavior and it gives no false alarms. It is very hard to get complete coverage with run time testing; it needs very large training set. You might see the control flow graph as an intermediate form in a compiler. There is a vertex for each statement and edges for control flow. Another model takes a finite automaton M and asks does M accept the program's syscall trace? The model is built from the control-flow graph of the program. It is similar to the regular expression matching problem. The abstract stack approach provides a model that excludes impossible paths. Using a non-deterministic pushdown automaton M, can the program's syscall trace be parsed by M? The digraph model takes a set of all the possible pairs of system calls S, that can happen consecutively (next to each other). All observed syscall pairs should be in S. They built an implementation and measured the running on real applications. Sendmail was the most complex application they looked at. They graphed performance overhead (seconds per transaction, like mail message received) vs. how precise the model is. You would like the model to be at least as large as the set of all possible behaviors, but hopefully not too much larger. They got widely varying results. The digraph model is the cheapest in performance, but its precision not great. The abstract stack is more precise, but the performance overhead is large. The call graph is somewhere in the middle.
Someone asked about the detection probabilities. You give up something in terms of attack detection. You can't detect any attacks that don't change the behavior of the application. It did not detect race conditions and application logic bugs. It can detect buffer overruns, format string bug attacks, and introduction of malicious code into the process space. Someone indicated this work is similar to work published in RAID in 1999, although that work did not have the same rigorous reasoning. Questioning brought out that it won't detect attacks on self-modifying code, but the model seems reasonable for C network daemons with high privilege.
The Cryptographic Protocols, I was chaired by Dan Simon (Microsoft Research).
"Performance of Public
Key-Enabled Kerberos Authentication in Large Networks" by Alan Harbitter (PEC
Solutions) and Daniel A. Menasce (George Mason University) was presented by
The contribution of this work is that modeling itself. They develop a performance modeling technique that is applicable to security protocols and their unique characteristics, and investigate the quantitative characteristics of PK-enabled Kerberos in large, multi-realm networks. You need a ticket to the remote KDC to work between realms. The Public Key (PK) enhancements at the initial client request for a ticket from the local Key Distribution Center (KDC) is called PKINIT. The local KDC and the remote one used to need to share a secret key. The change to a public key for establishing the trust is called PKCROSS. Adding public key encryption between the client and server eliminates the KDC entirely, and produces the much more efficient protocol PKTAPP. In PKINIT, the PK is in only the first message sequence. In PKCROSS, the client (Alice) requests a key to a remote KDC. There is no shared secret key. Real time authentication between the two KDCs establishes the secret key for Alice, and it is put into an encrypted ticket. PKTAPP skips KDCs and goes right to the server. The purpose of the model is to quantitatively compare the performance characteristics of the PKCROSS and PKTAPP protocol variants. It should allow them to easily change input parameters and conduct what-if analysis. Once they have validated the model it can test other situations. They use a queuing network model of authentication. The model is characterized by the queuing discipline at each service center (first come first serve, for example), the service time distributions, and the branching patterns (connectivity). The brute force solution technique is to itemize each state and solve a big matrix. A more tractable approach is to make simplifying assumptions (about homogeneity of clients waiting in the queue, for example) and iterative approximations. They can consider all the clients who want the same type of encryption as in the same class. They have the topology laid out, model the input parameters and service time in each "class", and the branching probability in each class. When a customer leaves a queuing center, it may change classes with a probability. Each class may have differing service times, branching probabilities and class transition probabilities. The formulation is well studied and understood. They developed a closed queuing network model of Kerberos multi-realm authentication and then validated the model. They used a PKCROSS skeleton implementation that contains all of the message parsing, encryption and communications, but excluded error handling and protocol options. The PKTAPP skeleton implementation had the encryption operations performed by each protocol. In the multi-realm test bed they found that the model accurately reflected the measured results. The predicted response time was within 3% of the test bed measurements. In comparing PKCROSS to PKTAPP with multiple application servers, they found that with PKCROSS, the remote KDS is the bottleneck. PKTAPP is significantly better for a single application server. The performance is sensitive to network and server capacities. The model is generally applicable in protocols that combine public and secret key encryption. They demonstrated that PKCROSS outperforms PKTAPP in situations where there are more than one server per realm. They would like to consider CRL management performance and use the modeling methodology to investigate other authentication protocol environments such as mobile computing.
Questions were about how PKTAPP can be a Kerberos variant, since it looks a lot like SSH. The Kerberos message format is retained. It doesn't retain any state on either side. Some people think the KDC is integral to Kerberos.
"A Model for Asynchronous Reactive Systems and its Application to Secure
Message Transmission" by Brigit Pfitzmann (Universitat des Saarlandes) and
Michael Waidner (IBM Research) was presented by Michael.
They are working towards cryptographically correct and formal proofs of cryptographic protocols. Typically you get only one or the other. In cryptography, mathematically precise and realistic definitions are possible. The definitions and proofs are long, difficult to grasp and error prone. They are missing probability spaces, adversary definition, active attacks, error probabilities, and computational restrictions. Formal methods are well defined, provide protocol languages, and some tool support is available. But, for example, the Dolev-Yao abstraction doesn't capture cryptographic realities and no realistic proofs are possible. They just have functions representing encryption and decryption; there is no looking under the covers. The goal of this work is a faithful abstraction. The abstract primitives, protocol and goals are mapped to concrete. The system consists of honest users and an adversary as interacting I/O automata. All the honest users collapsed into a single machine and communicate arbitrarily. There is a notion of time and a clock. All the adversaries are collapsed. They can take over some communication lines, which are buffered asynchronous connections. The main difference in this model is that the adversary is timing almost everything. When you compare real configuration with the ideal configuration, from the honest user's (H) point of view, the two views should be identical. The real is then as secure as the ideal. You define what H sees and can define probability spaces for the machines. View(H) is a random variable and it captures all aspects of security. There are simulatability variants that are equivalent with "guess". The general model includes collections of probabilistic I/O automata and distributed scheduling. It is implemented by Turing machines, for complexity reasons. In the security specific system model, the system is a set of structures. Configurations represent a "standard" real system's intended structure. The channel model indicates what influence the adversary has on communications. The composition theorem approaches the proofs of security in a somewhat modular way. The proof is in the proceedings. They can prove that a real system is at least as secure as the theoretic system. They can replace one system with another and get a particular level of security. The proof idea is to take systems, combine them, replace parts, and recombine. In a real system, any participant can send authentication and secret messages to any other participant. It is all scheduled by the adversary. In the ideal system, there is one single machine, TH. It has an interface to the adversary, so it can observe things. If someone is sending something to someone else it can be observed. It can send messages to itself. The adversary can select to determine which messages are actually delivered to the honest users. The real system doesn't depend on any cryptographic details. They use a cryptographic proof, then put formal methods on top of it. The overall approach (which includes previous work) provides a rigorous model and precise cryptographic semantics, abstract interfaces, deterministic specifications with tolerable imperfections. The new aspects are an asynchronous system with distributed scheduling, a composition theorem, and reactive multi-user encryption.
The What's really different session was chaired by David Wagner (UC Berkeley).
Generation from Voice" by Fabian Monrose, Michael Reiter, Qi Li, and Susanne
Wetzel (Bell Labs, Lucent) was presented by Mike.
They are trying to use voice to generate a key that can be used in applications. The adversary is someone who steals your mobile phone. Voice is a natural UI for many people and many devices. It is known to differentiate between users and there is a rich literature in speaker verification. Unlike static biometrics such as fingerprint, changing the password changes the vocalization of it, so a user can have many keys. The speaker utters a password or phrase which is used to index into a table. From what is selected based on a voice utterance, it can build a key and encrypt a file on the device. They aimed for security (the key should be resilient to offline attacks), reliability (false negative rates should be small), and efficiency (since it would be implemented on resource constrained devices such as third generation phones and PDAs). With encryption with a spoken password the password entropy is low, and even lower for pronounceable passwords. No plaintext anything is stored in the device that's speaker dependent. The work derives from a scheme that takes into account typing features and password to get a hardened password with more entropy than the password itself. They broke the key into shares with a secret sharing scheme. There is an M x 2 table (M is the bit length of the key). They use the utterance to index into the shares of the table. To pull out M features they pull out one of the 2 shares in each row. There is no additional security at first. Over time, they notice patterns or features called distinguishing features. The system can destroy the parts that the user doesn't need. The user should be able to log in, but an imposter shouldn't. Conceptually, this can be used in any context where traditional passwords used, such as file encryption. It can be used to generate private/public key pairs, for VPN access, and for email decryption. The email can only be in plaintext inside the phone. Processing the voice password starts with capturing the signal at 800 samples/second. They remove the silence, and break into windows 30ms long, overlapping by 10ms. They apply a discrete Fourier transform to take it into a frequency vs. amplitude curve. Other transforms include getting the twelve cepstrum coefficients, which do a good job of modeling the vocal track of the user. The segmentation technique they apply is similar to segmental vector quantization where segments are mapped using maximum likelihood. The approaches to generate feature descriptors include the position relative to a plane (in the paper), centroid parity and distance from centroid. If the attacker captures the device, it must reconstruct the key. It can start by cutting through the table lots of times. It's hard to tell the difference between a random piece and a real share in the table. The guessing entropy is how many vertical cuts through the table are needed before finding the right one. The attacker has perfect knowledge of how you speak and what you say. They applied utterances to a table. They analyzed 90 users saying a different set of 1-800 numbers. The guessing entropy grows with the log of the number of users.
There were questions about password content and length (in time) in the real world. They don't honestly know how secure it is. The evidence indicates it's a direction to go. Just like any biometric, if you have a high quality recording of the utterance you have all the key information. But they don't know if you can do that over a distance. They view the 2% false positive rate as too high. Someone asked, what about a dictionary attack with someone who speaks like me? If it's the same region and same sex, then what are the rates? Work needs to be done. Someone pointed out that Gus Simmons demonstrated with breaking RSA based encryption for voice that there was so much entropy one could predict what was being said.
"A Trend Analysis
of Exploitations" by Hilary Browne, William Arbaugh (UMD), John McHugh, and
William Fithen (CERT/CC) was presented by William.
Are over 90% of security incidents due to known problems? It seems anecdotally true, but how do we provide stronger evidence? The authors performed an analysis of past intrusions using the CERT/CC historical database. It took about 3 years to do. They searched the CERT summary records for key words and vulnerability numbers (automated) then reviewed the summary record and electronic mail to ensure that the data point was valid (manual). If the evidence didn't support the fact that an intrusion took place, then the record was not counted. This results in an under count. They tried to identify the number of hosts effected by each incident, but were careful not to over count. If there was no hard evidence that intrusion was from the specific vulnerability under study, they didn't count it. For example, hackers make lists of which hosts have a particular vulnerability, but these lists were not used. Intrusion reports are self-selecting. People can't report what they don't know or understand. The human element includes errors and boredom. Until recently (early) CERT records were not conducive to analysis. Also, you don't always know how you were broken into if it's a new vulnerability. The human doesn't necessarily do a thorough evaluation of the reason for the break. Hackers may choose a new kind of break, just for fun. They did not find what they expected to find. They expected a curve starting from the discovery of a vulnerability, through some intrusions, with a steep increase. Then some time after the patch is released, the patch is installed, and the intrusion rate decreases. Bruce S wrote this up in a Cryptogram newsletter. There was a very large positive skew to the data. The intrusion was discovered and patched, and still there were no intrusions. The first serious intrusions came about 3 months later, when the attack was scripted. Then it takes off. This is proof that script kiddies exist (as if it were needed). For a vulnerability that was patched in March '96, as of November '98, they're still seeing intrusions. The CERT data supports the hypothesis that well over 90% of the security incidents reported could have been prevented. Attackers have automated (scripting) and as a result react faster than the defenders. Analysis of several incident histograms indicated that the intrusions accumulated with a similar shape. They then asked, can we predict the severity? If they can find a model that fits, then they may be able to predict the severity of incidents. They used curve fitting (no statements about any potential relationship). The accumulation function is linear in nature. The initial analysis focused on examining the data on a monthly basis. This demonstrated useful results but introduced a bias (not all months are of equal length). The prediction was not useful after three months. They're looking at a daily analysis now. The regression was done after 30 days of activity, at which point they generated a slope, then plotted the rest of the intrusion set series. The prediction was accurate for about 3 weeks for one vulnerability, about 70 more days for another, and for about 30 more days for another. Then the data fell off. It appears possible to predict the severity of intrusions, but the best prediction must be dynamic (generate a new curve when the error grows too large). This could be used as a prioritization mechanism for patching (though just getting people to patch to begin with would be nice). They're working with a statistician to gain greater insights, by grouping the data better and doing multivariate regression. They are starting new analysis from the scripting date and continuing to collect more data. They are going to focus on methods to tighten the defenders decision loop.
An attendee pointed out that there is a lot of sample bias (only people who report to CERT). The data indicates that very wide spread intrusions tend to have vulnerabilities that persist for a long time. Maybe it's really the other way around. The authors are trying to find statistical significance. Someone asked if there are intrusions occurring before the discovery and announcement dates. The hacker community is fairly boastful. Someone pointed out that the data is consistent with statement: nobody patches much. The authors wanted to do traditional epidemiological study. However, with this data, if you don't know you have an intrusion, you don't figure it out. If you do figure it out, say 6 months later, all evidence of how they broke in is gone and the exact cause can't be determined. They're looking for ideas on how to quantify the data sets. Someone asked if the time of year was important. They had looked at seasonal aspects. It seemed to correspond to academic years. Questioning brought out that the authors want to start looking at probes, but they're not reported to CERT. It was pointed out that there is some anecdotal evidence that exploits fall off because they're old news to the hackers.
5 minute presentations on developing research session was chaired by Stuart Haber (InterTrust).
Joe Weiss presented "Information security needs for EMS applications". All of these systems were designed to be vulnerable; interoperable, user configurable, open, and providing as much information as possible to anyone who needed it. Backfitting technology like encryption produces denial of service problems. They have done vulnerability studies; they're vulnerable to access by people who shouldn't have it. So far, there's no need to be afraid, because hackers want to get at a web site. There are none of those on the back end of industrial processes.
Tim Levin and Cynthia Irvine (who presented) authored "Overview of quality-of-security service". Security is often treated more statically than QoS. They're interested in devising a layered distributed system where security choices get passed down through the layers, which limit the choices.
Evie Spyropoulou presented "Managing costs and variability of security services". Related to previous talk, they offer security levels like high, medium and low, which are mapped down the layers. There is a model for determining the costs for variant security. The calculation of costs can be used for an efficient schedule. There is a range within which the values are accepted that restricts the values.
David Lie (who presented), Dan Boneh, and others authored "Xom: Architectural support for copy- and tamper- resistant software". eXecute Only Memory (XOM) uses hardware or architectural support. Programs are encrypted to have a one to one relationship to a XOM machine. The set of registers has ownership tags, indicating which program owns the values in the registers. It can guard against slicing, splicing and replay attacks. They have a paper this year at ASPLOS. It may be similar to the IBM processor paper presented in '97 at IEEE S&P.
Calvin Ko presented "SHIM: System Health and Intrusion Monitoring". The goal is to detect incorrect system behavior that can evolve into security failures. The approach is to use a hierarchy of constraints that describe the system at different levels of abstraction.
Hao Chen presented "Model-checking software for security violations". Risky code will do a seteuid(0) then later do an fopen without realizing the heightened privilege you're in. They model the risky order by a state machine and have a framework to formally check for risky states. If all privileged operations were lexically scoped this problem would be less severe.
Ludovic Me presented "A language for a comprehensive description of attacks". They have an attack skeleton with descriptions from the attacker's point of view, the defender's point of view, and how to counter the attack. There could be more information to describe an attack in the future.
Dawn Song presented "Timing analysis of keystrokes and timing attacks on SSH". Each keystroke is sent in a separate packet immediately. Using a Gaussian modeling of character pairs, they used a Hidden Markov Model to infer passwords from inter-keystroke timing. Timings leak 1 bit per character pair for random passwords. It can reduce the work factor 50 times for length 8 passwords. The paper will be at Usenix Security Symposium.
Al Valdes presented "Simulation analysis of an intrusion-tolerant system". System sensors are managed by a common interface which is also distributed. They are exploring response tradeoffs as part of a larger dependable intrusion tolerant system study.
Ajay Chander presented "A role-based trust management framework". Their distributed framework takes the clustering from roles and the decentralized distributed environment, local namespaces and local say about policies from trust management. The system includes linked local roles and role delegation.
Jabeom Gu (presenting), Sehyon Park, and Jayong Ahn authored "Secure mobile multi-casting protocol and key management". Mobile nodes are serviced from the service agent in the service area that they reside in. They have simulation results.
Jeff Jianxin Yan presented "An attack on black-box traitor-tracing schemes". Every user has a distinguished private key to decrypt the contents. Users can their keys to pirates (traitors). They use a Boneh-Franklin Scheme to generate an intelligent pirate.
Tao Ye presented "The security challenges in peer-to-peer business integration". They want to extend P2P to business processes. The decentralized system (which may be offline) needs local authentication. They will have compartmentalization of the agent contents into portfolios and access control to portfolio contents. They need host security against malicious agents and agent security against malicious hosts (detection of tampering, prevention of access), and administration of credentials across heterogeneous business domain boundaries.
Michael VanPutte presented what was probably the most popular 5 minute talk, "SimSecurity". They want to produce security people who have technical skills and think outside the box. Traditional training doesn't teach or even encourage out of the box thinking. Training is boring, but games are fun! They're hiring professional gamers to produce something like The Sims Meets Matrix. You can go into god mode and see everything and see how poorly you designed it. There will be a training mode where the security instructor builds scenarios. An interactive laboratory will have a certain amount of money to build a system or attack it. It will be a complex adaptive system with evolutionary algorithms.
Larry Koved (presenting), Aaron Kershenbaum, and Marco Pistoia authored "Permission analysis for the Java 2 platform". They do security analysis via static analysis of very large programs, focusing on the precise invocation graph and data flow. They focused on the security parts and throw away the parts that aren't relevant.
Jesus Molina presented "The co-processor as an independent auditor". He wants to provide a high assurance integrity checking mechanism for COTS OS. They use an embedded coprocessor as an independent auditor for verification of embedded digital signatures. It can be either active or passive.
Fred Cohen gave an impromptu talk. He joked "A copy of my slides is available; it's already on your computer". They have 35 full time people working on interesting security projects this summer. They can emulate a whole class B network that looks good to all the vulnerability scanners. They have IP addresses that change in real time. They want to implant tags into information so know they where it's been.
The final day
started with an invited talk by Pamela Samuelson (School of Information
Management and Systems, UC Berkeley), "Reverse Engineering: A Legal Right or
Wrong?". She drew on two papers she is writing, on the law and economics of
reverse engineering and the constitutionally based interest in reverse
engineering in US to explore why reverse engineering has been legal. The
legality of reverse engineering has been challenged in a number of contexts
recently, in the context of decompilation of software. The DMCA limits the right
to reverse engineer. The US constitution may limit application of the DMCA to
some reverse engineering. Reverse engineering has long been a proper means to
obtain a trade secret. There is no right to reverse engineer a patented
invention. But the patent supposed to teach how to make it. It is OK to take
apart or analyze a purchased machine or compound; patent law doesn't reach this.
Experimental use for science is a defense. The issue didn't arise in copyright
law because the law didn't protect industrial products and the contents of the
work was revealed on the face of the product. There is a long and bitter
controversy about whether decompilation of computer programs for purposes such
as gaining access to the interface information should be lawful. Sega v.
Accolate and other cases since 92 say that this is lawful under copyright. The
copying necessary is not illegal since it doesn't threaten competition. IEEE has
been vigilant and active on the legal rights to reverse engineer software. UCITA
(if enacted in your state) may enforce shrink-wrap license restrictions on
reverse engineering and undo intellectual property (IP) rights to reverse
engineering. It's currently only a law in MD. In Iowa, a contract that says it's
under UCITA is actually under Iowa law. It is more controversial than its
proponents anticipated. (As a humorous aside, Pamela mentioned that she likes
the Italian pronunciation of UCITA; u chee' ta). Decompiling software may
inadvertently infringe a patent, even if you're not trying to get access to a
patented algorithm. There is no fair use doctrine in patent law. It may require
legislative change to establish this right clearly. The economic espionage act
of 1996 has no reverse engineering limit and makes duplication of a trade secret
a violation. It federalizes trade secrecy law. While many states have reverse
engineering clauses in them, federal law doesn't. The DMCA exception for reverse
engineering only permits it for program interoperability.
She then discussed DeCSS and shrinkwraps. Jon Johanssen bought a DVD. He couldn't play it because of country codes. He reverse engineered CSS, wrote DeCSS to bypass it, and posted the program on the net. He also hoped to help development of an open source Linux player for DVDs. DVD-CCA v Bruner held that Jon Johanssen had misappropriated a trade secret (CSS) by breaching the anti-reverse engineering clause in the DVD license. Bruner was also liable because he posted DeCSS in concert with Jon Johanssen. It is on appeal. By violating the shrinkwrap licenses, the theory is they improperly attained the trade secret. Posting it was publicly revealing it. It's hard to argue that CSS is now a protectable trade secret. She also happens to think the judge's ruling is wrong. Jon Johanssen is in Norway, was 15 years old, it's not clear that Norway's law covers this or that he's even legally able to contract. She doesn't think the Bruner case is a strong precedent. DeCSS is also affected by DMCA rules. One clause says it is illegal to bypass a technical measure used by copyright owners to protect access to protected works. Another says it is illegal to make, distribute or offer to the public a technology primarily designed to bypass access controls used by copyright owners. Another is an anti-tool provision for copy-controls. There are very limited exceptions for interoperability, encryption research and security testing. Bypassing copy control isn't illegal; making a tool to do it is. There is an exception for law enforcement. She's personally responsible for that. Maybe you don't have such a good legal right in the first place. The statute complicated. There are seven exceptions to the first clause mentioned above. Either they're meaningless totally, or there's an implied right to make a tool. Only litigation is likely to yield the answer. There is no express privilege in the exceptions to bypassing an access control for making a tool. She would like the courts to interpret that there is an implied right to make a tool.
Jon Johanssen had to bypass CSS to make DeCSS which in the US is now a violation of DMCA. In Universal v. Reimerdes the distribution of DeCSS violates the DMCA because it is a tool primarily designed to bypass CSS (an effective technical measure used by movie studios to protect access to movies). Jon Johanssen is a national hero. She was sitting with her husband in a restaurant in Norway talking about the case. The waitress and everyone around cheerfully joined in. The interoperability privilege didn't apply because it's not the "sole purpose" of DeCSS to help with a Linux player. The judge decided this. The statute only allows if it's the sole purpose. It's not clear what trying to interoperate with DVD's that are data, not programs, means. In Universal v Reimerdes, seven movie studios sued Eric Corley and others for DMCA violations because they posted DeCSS on their websites. The trial judge found that both posting and linking to DeCSS were violations. Even if Jon Johanssen had qualified for a DMCA exception Corley didn't ebcause he wasn't developing a linux play (he's just a journalist). The trial judge found no constitutional infirmities in DMCA. The court of appeals heard arguments on May 1. It's not very likely it's gong to be totally reversed given composition of that court. There are broader implications of DMCA. There is a need for comp security testing because you can't always trust what endorsements say about security. You need to reverse engineer in order to test security. You need to build tools to engage in security testing. You need to be able to share results and tools with colleagues and the research community. DMCA has an exception for security testing but it quite narrow (you have to get permission first and there are limits on sharing tools and results).
Pamela skipped over a retelling of the current status of Felten's run in with SDMI and the DMCA, because most people there were up to date on it (and Felten, Dean and Wallach were there). They didn't violate the access control clause because SDMI technical measures are not access controls. But they might have violated another clause if they made or adapted tools to bypass SDMI watermarks. It doesn't qualify for the encryption or security testing research expectations (they aren't encryption and security is defined narrowly). Publishing or delivering a paper may "provide" a component of a tool to the public and violate that clause. It is not clear that SDMI watermarks were effective technical measure within the meaning of statute. It may not be effective if it can be bypassed. It is not clear that "component" was meant to encompass a paper discussing a copying technology. It is possible that courts might rule Felten is not liable on statutory grounds. But even if congress meant DMCA to reach Felten's paper, the constitution may protect him. The 1st amendment is the first place to look and the easiest to argue in this situation. The 1st amendment generally protects a scientist's right to publish the results of lawful research (the SDMI hack itself was not illegal). It can still be enjoined if there is a national security concern. In Bernstein v US, cryptographers have a 1st amendment right to post source code on the net to communicate scientific ideas on code. Unfortunately the decision has been withdrawn, so it is not as impactful. The fact that someone might do something illegal with the information is generally not enough to stop the speech (how to make bomb). It has to be immanent, clear and present danger.
What about other DMCA violations, such as Microsoft's hack of AOL instant messages, bypassing access controls to view controversial Microsoft interface specifications to avoid contractual restrictions, and bypassing Ebay's anti spider technology? Ebay sued Bidder's Edge about their use of spiders. They were found to be a trespass on Ebays' server. What about bypassing spyware to protect privacy? Is there a need for a broader right to reverse engineer? Suppose DMCA made the SDMI hack illegal or makes other security research illegal. The 1st amendment may not suffice because much of reverse engineering is conduct. Article I, clause 8, section 8, gives congress the power "to promote progress of science and useful arts by securing to authors and inventors exclusive rights for limited times in their respective writings and discoveries". Is there an implied right to reverse engineering to promote science? The supreme court looks at history and the constitution to discern constitutional principles of significance and applies them. As regards the IP clause, it might find a disclosure and dissemination principle, a public access and public domain principle, a competition and innovation principle, and a balance and utilitarian principle. If we stop reverse engineering, it will slow down innovation. Reverse engineering has long been legal because of the beneficial effects on innovation and competition. Innovation whether in art, history, science or technology, builds o n what has gone before; incremental innovation is the norm. Other countries don't have the same constitution so can only address it at the policy level. Us "exporting" DMCA in the new EU directive is worse in some ways than DMCA. There is no encryption research exception, no national security exception; no exceptions. There is a need for the scientific technical community to help refine DMCA-like rules to ensure adequate room to reverse engineer. Congress didn't know what they were doing when they adopted DMCA. They heard piracy, and said "whatever you want". They wrote a very narrow specific exception for encryption.
There were questions involving checking of patents and copyrights and equal protection. It is illegal to circumvent technology to see if there's infringing material inside encryption. It's clear that there are not enough exceptions in the law. She would argue for a general purpose exception. Courts are pretty good at distinguishing for legitimate purposes. Even if law enforcement has to circumvent it to get evidence, it's not clear that any mutuality is implied. Courts could interpret some exceptions more broadly, to get just results. They could also say go tell it to Congress. Another question was about RIAA suggesting that the program committee might be violating DMCA by allowing a paper to be published. In theory DMCA applies to anyone who contributes in any way. If you make or distribute the tool, you're directly liable. In the Reimeras case, merely linking, said the trial court, is a violation of providing, although strictly speaking it just facilitates providing. In an ACLU brief she coauthored, they say that contributory liability is not what Congress intended. The program chair in the Felten case has withdrawn from all program committees because of fear of liability. That's a nice example of the ways in which the statute has a chilling effect on legitimate activity. That needs to be documented. Congress can do something to make it less bad if they are presented with a case. She believes Congress didn't intend for it to have this effect. In contrast with the situation with the program chair, the general chair's employer have been extremely supportive and presented no barriers. The situation has helped to galvanize some civil liberties organization. The ACLU wasn't sure how far they'd go in the Reimerdes case. If you care, make some sort of contribution to the EFF. She's on the board; they've spent tremendous resources on this. The Law school at UC Berkeley is looking for DMCA cases to defend. Get in touch with her if you have some situations.
Someone asked, if linking is a problem, what about searching? There are law suits on search engines right now. None of the safe harbors in DMCA for copyright apply. Being a lawyer sucks sometimes. You can do careful crafting of the safe harbors, and it all can get washed away. If you're a telco, and somebody uses your wires to transport DeCSS, you can't claim an exemption under the safe harbor. You're distributing and providing to the public. Common carrier doesn't apply to this sort of stuff. She thinks that courts can get these things right. In the case of Corley, after getting enjoined, they told everyone to send the URL information. It's hard for a judge not to say you're not violating my injunction. Judges don't like contempt very much (even if it is appropriate in some circumstances). [At about this time, someone I was sitting next to handed me a slip of paper with a quote from a Mae West movie: Judge: "Madam, are you trying to show contempt for this court?" Mae West: "No your honor. I'm doing my very best to conceal it." ] If we're really lucky the court will say it's contributory liability and the statute doesn't cover that. News places linked to DeCSS sites at earlier points (even the New York Times). Maybe there's a real first amendment issue. Someone asked if, in the Jon Johanssen case, did the judge say what a tool is. No. It is not an issue in the trade secrets case. Is CSS a trade secret? The judge said yes. With DeCSS, was there a misappropriation in violation of shrinkwrap? Yes. What a tool is was not an issue, though it is important in the context of Reimerdes. There was a question about what's happening outside the USA. There's no question that some sort of international cooperation in the technical community to carve out a space for legitimate activity is important. The strategy of the content industry is to get past the US Congress the broadest possible law, then use it as a model for an international copyright treaty. They could make the act say that it is illegal for the purposes of circumventing copyright. That would satisfy the treaty, but we didn't do that. US is saying to other countries to adopt our approach minus the exceptions. There are opportunities for countries outside the EU to have some economic advantages. Research labs would go there. Congress would pay attention. The opportunities are for competitive advantage, not about being a piracy haven.
Someone said that the security community went through 2 decades of hard fights against other bad security legislation. Doing it again sucks. How can we get ahead of this sort of thing? It's interesting to Pamela how much mobilization there was of the scientific and technical community around export controls and key escrow. It's difficult to mobilize those populations around digital copyright. Barbara Simons was trying. Some places had other sets of interests. The back story is that big entities with lots of legislative clout decided not to exercise in DMCA. Hollywood was too dug in and it was so clearly unconstitutional on its face. AT&T spent all its chips on the good safe harbor rules for intermediaries. Someone suggested that archives and libraries must remove information that might be used for this purpose. She wouldn't do that right yet. The courts haven't interpreted that far. It's a nice way to illustrate the overbredth. That speech was lawfully uttered years ago. You need to stand up for your rights;
it's better than being chilled. Someone asked her what would she write; where would she draw the line. It's a great problem. It's a very difficult area even though she's critical of the rules. There were rules about cable descrambler boxes. All were specific to certain domains. Part of what's upsetting about DMCA is it's across the board. The people who drafted it didn't think it had anything to do with law enforcement. She told them it could provide the basis for shutting down for the NSA. She was trying to illustrate that it's a bad law that they should try to draft it narrower. A knowledge or intent requirement that covered use for illicit infringing purposes and no other essential use would be good. A group of technologists could sit down and try to draft a law that did the right thing. At the moment, there is not a difference between a technology for fair use and one that infringes. There's not a single instance of piracy; there's no proof of infringing movies. If we get a ruling that it doesn't matter if no infringing copy made, that would be bad.
The session on Cryptographic Protocols, 2 was chaired by Avishi Wool (Lumeta Corp).
"Graph-Based Authentication of Digital Streams" by Sara Miner (UC San Diego) and Jessica Staddon was presented by Sara.
This goal is an efficient source of data authentication of multicast,
continuous feed streams. A simple solution involves a signature scheme. The
sender signs each packet individually then appends the signature to each packet.
This introduces significant overhead. With the message authentication code
scheme, the sender shares a symmetric key K with all receivers. The sender
appends a MAC to each packet. There is no source authentication because all
parties share a key so any of them can generate a valid MAC. Previous work uses
a signature on one packet. They then "link" other packets to the signature
packet using some more efficient mechanism. The authentication information is
faster to compute and verify and the packet size stays small. Linking each
packet to the one before it, the entire stream content must be known in advance,
and the loss of one packet means no later packets can be authenticated. You can
break the stream and send one signature per chunk. Schemes using MACs with
require that the entire stream is known ahead of time and that there is time
synchronization. Enhancements send the key downstream in a later packet, but
still have a loose time synchronization requirement. If it reveals the MAC key
in a later packet, the revealed key cannot be sent too soon after packet that
was MACed with it. Otherwise, an adversary could tamper with data and recompute
a MAC. Most related work uses hashes and signs one packet. They differ in the
loss tolerated. One does multiple MACs per packet; the overhead per packet grows
with the number of receivers. Selecting a scheme involves considering the
overhead per packet, the authentication delay/packet buffering at the sender or
receiver, the type and amount of packet loss tolerated (random (independent)
packet loss or bursty loss), and non repudiation. Graph based representations
are an easy way to describe many existing authentication schemes. They allow the
scheme properties to be easily measured. An edge from packet I to packet J means
if packet J can be authenticated, then packet I can be. For example, the edges
represent hashes. To determine if a particular packet can be authenticated,
remove the nodes which correspond to lost packets. It can be authenticated if
and only if a path exists in the remaining graph from the packet to a signature
node. Their idea is, to combat random packet loss, they introduce randomness
into the construction. They construct a graph such that each possible "leftward"
edge exists in the graph with probability p. The expected in-degree of node I =
(n - I) p. Without any packet loss, the probability that packet I is connected
to packet 1, depends on I, and goes up as I goes up. If packets are lost
independently, then the farther away the packet, better probability that it can
be authenticated. They then want all packets to have some minimum authentication
probability. But increasing the edge probability increases overhead. The packets
closest to a signature have the lowest authentication probability. Their idea is
to find a packet I that has a good enough probability, then increase the
probability for the nodes between I and the signature. At each non signature
node left of packet I, add new rightward edges with probability p. These new
edges may be hashes or MACs with a key revealed later. They also address a
bursty loss model, where for each packet a burst of length b occurs with
probability q. The new property is packet prioritization. The user can specify
priorities on each packet. Their scheme guarantees higher authentication
probability for higher priority packets, by protecting them against longer
bursts and more bursts. For single burst tolerance, the distance between the
outgoing edge endpoints dictates the length of a burst tolerated by a node. For
higher priority nodes the outgoing edge endpoints are farthest apart. The
original scheme can actually tolerate most long bursts, just not at the
beginning. For multiple burst tolerance, they add extra edges and space them the
same way. The user chooses the desired scheme properties and their schemes
accommodate them (within limits).
One question brought out that application profiles were not available to help applications chose a scheme. A questioner asked about losing the first signature. It can be retransmitted multiple times and they can allow for a request to retransmit. A question about error correcting codes spread within the authentication material brought out that there is a bit on this in the paper.
The final paper was "ELK, a New
Efficient Protocol for Large-Group Key Distribution" by Adrian Perrig, Dawn
Song, J. D. Tygar (UC Berkeley). Adrian presented.
Information broadcasts such as stock quotes and radio broadcasts are gaining popularity. Information distributors want to deliver only to paying receivers. Today they use secret key encryption and large scale key distribution. With group key distribution, the key server distributes keys, and receivers join/leave at any time. They require forward and backward secrecy. The broadcast group can be re-keyed after each join/leave. The challenge is scalable re-keying. There are backward secrecy and forward secrecy issues with joiners and leavers. There is a lot of prior work. Except for Keystone, it didn't look at the reliability of the key update messages, which is crucial. The problem is that the re-key broadcast message may get lost. A unicast recovery protocol is not scalable for very large groups. For Reliable broadcast (neighbor recovery), scalability is an area of active research. Keystone used forward-error correction that assumed packet loss in the Internet is independent. It's often correlated; the probability of a subsequent packet loss after a packet loss is higher. Unicast key recovery is slow (>100 ms RTT). A workstation can do 10 to 6 one-way functions in 100 ms. You can trade off computation with communication to get reliability and lower communications cost. A key tree is used as in most previous work. Each member is a leaf node. All members know the group keys above them in the tree. ELK provides a broadcast free rekey on joins. They begin by assuming that a receiver can do 2^S pseudo-random function (PRF) operations quickly while the attacker cannot do 2^2S PRF operations. In the paper they generalize to an attacker of arbitrary strength. On a leave, they update all the keys up the tree from leaver. The function at the parent depends on the two child keys. The parent picks a random key for the closest one that needs to change and propagates the changes up. The parent key takes two contributions from two keys, using the child keys and applying them to the previous parent's key. The contributions are S bits each. The key server broadcasts contributions encrypted with the unchanged keys and does key distribution with a hint. The hint key update is compressed to S bits. They apply a one way function to the key to distribute it and members recompute the key using the hint. For the problem of a lost rekeying message, they add a small key hint to data packets. This trades off computation with communication. The receiver does 2^S extra operations and the rekey broadcast is S bits shorter. The receiver reconstructs the key faster than contacting the key server. You can fine tune the security level by adjusting S. It is robust to precomputation attacks. It has reliability and low communication overhead. Want to analyze the reliability in real world settings.
Questioning indicated that they were going to bring their work to IETF. There was a question about the rate of joins and leaves in reality. For joins, ELK will tremendously help, since they are quite cheap. Leaving is much more difficult. You can aggregate leaves and the key update will become smaller. If it's a highly dynamic group with 1 million members, scalability is still a challenge. A question wondered about its applicability to subscription based services. Other questions centered around its utility in the face of traitors and credit card payments. Is there a great deal of urgency to get a millisecond vs. one hour turnaround. Aggregation is discussed in the paper for just that reason.