Data-Driven Security: Analysis, Visualization and Dashboards
by Jay Jacobs and Bob Rudis

Wiley 2014.
ISBN 978-1-118-79372-5
amazon.com USD 39.89
Table of Contents: http://www.wiley.com/WileyCDA/WileyTitle/productCd-1118793722.html

Reviewed by Richard Austin, May 22, 2014

The recognition that our security infrastructure is creating a vast ocean of data that is largely untapped is, I'm glad to see, starting to gain the attention it deserves. We all know that this data must somehow be sifted, correlated and transformed into information that is presented to decision makers in a timely fashion to support better decision-making. However, just how that sifting, etc., is to be done is a perplexing question that is far too often "answered" by our going on a shopping spree without first understanding what we're shopping for (e.g., buying a "security dashboard" without first defining what information that dashboard should present to which audience and in what form).

But, how are we to define requirements when we're not sure what is even possible? Perhaps we don't even know what kinds of questions have answers lurking in our sea of data.

Jacobs and Rudis assert that we can learn a lot by leveraging freely available tools such as R and Python to explore the data we have, construct visualizations that reveal relationships and prepare dashboards that effectively communicate the results of our analysis. They earn many kudos by opening with the observation that a data analysis adventure should always begin with a question. In other words, analysis is not done for the sake of analysis, and that principle is a powerful antidote against producing yet another collection of pretty pictures and glitzy web pages that look very nice but tell us nothing we want to know.

The presentation is very much "learn by doing" and guides the reader through analyzing security-relevant data such as AlienVault's IP Reputation database, Symantec's data on ZeroAccess infections, and Verizon's VERIS Community database of data breaches (VCDB). Readers can either copy the relevant code from the text (recommended) or download it from the book's website. As each analysis is carried out, the authors provide background, high-level introductory material on the methods behind the code and sage advice on why things are done in particular ways. The writing is lively with a touch of whimsy that keeps the reader engaged (e.g., would you have suspected a significant correlation between UFO sightings and ZeroAccess infections?). The book is printed in color, which really brings the graphics to life and enables the authors to explain how color selection, palettes, etc., contribute to visual impact.

A core strength of the book is its emphasis on communication as the goal of the process. A superb technical analysis is worse than useless unless it is meaningfully communicated to the person who can act on its results. Chapter 10, Designing Effective Security Dashboards, is easily worth the price of the book with its exploration of the challenging world of the ultimate one-page summary.

The book's shortcomings lie in its wide scope. The depth of presentation had to be limited to keep the book to a reasonable length (somewhat over 350 pages). The authors compensate by providing a good set of references for each chapter to support further study and a closing chapter with links to in-depth background. Unless you have strong skills in Python or R, you will need a good book on the language to help you branch out from the canned examples into your own data. For R, I recommend Jared Lander's "R for Everyone: Advanced Analytics and Graphics" (ISBN 978-0-321-88803-7).

This is a technical book that will require a solid investment of time and effort to work through (including the external references that match your interests). However, at the end of that investment, you will have developed an appreciation for how data can be transformed into information using freely available tools and, most importantly, how to use graphics to communicate effectively what you have learned from that information. Definitely a recommended read whether you use these techniques in production or as guidance in evaluating commercial tools.


It has been said "Be careful, for writing books is endless, and much study wears you out" so Richard Austin (http://cse.spsu.edu/raustin2) fearlessly samples the wares of the publishing houses and opines as to which might most profitably occupy your scarce reading time. He welcomes your thoughts and comments via raustin at ieee dot org