Introduction to big data security analytics in the enterprise
Expert Dan Sullivan explains what big data security analytics is and how these tools are applied to security monitoring to enable broader and more in-depth event analysis for better enterprise protection.
A significant portion of information security effort goes into monitoring and analyzing data about events on servers, networks and other devices. Advances in big data analytics are now being applied to security monitoring, enabling both broader and more in-depth analysis. In many ways, big data security analytics is an extension of security information and event management (SIEM) and related technologies. However, the quantitative difference in the volumes and types of data analyzed results in qualitative differences in the types of information that can be extracted from security devices and applications.
Big data security analysis tools usually span two functional categories: SIEM, and performance and availability monitoring (PAM). SIEM tools typically include log management, event management and behavioral analysis, as well as database and application monitoring. PAM tools focus on operations management. However, big data analytics tools are more than just SIEM and PAM tools coupled together; they are designed to collect, integrate and analyze large volumes of data in near real time, which requires several additional capabilities.
Like SIEM, big data analytics tools can accurately discover devices on a network. In some cases, a configuration management database (CMDB) can supplement and improve the quality of automatically collected data. Integration with third-party security tools, as well as with LDAP or Active Directory servers, is another must-have feature of big data analytics. Support for incident response workflows varies among SIEM tools, but such workflows are essential when working with big data volumes of logs and other sources of security event data.
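As a minimal illustration of how a CMDB can supplement automatically collected discovery data, the Python sketch below merges the two sources keyed by MAC address. The record layout, the function name `reconcile` and the rule that freshly discovered values take precedence over CMDB values are all assumptions made for this example, not features of any particular SIEM or CMDB product.

```python
# Hypothetical record shapes, for illustration only:
#   discovered: {mac_address: {"ip": ...}}           (from network discovery)
#   cmdb:       {mac_address: {"owner": ..., ...}}    (from a CMDB export)


def reconcile(discovered: dict, cmdb: dict) -> tuple[dict, list]:
    """Enrich discovered devices with CMDB attributes and list unregistered ones."""
    # Normalize MAC addresses so "AA:BB:..." and "aa:bb:..." match
    cmdb_by_mac = {mac.lower(): rec for mac, rec in cmdb.items()}
    enriched, unknown = {}, []
    for mac, observed in discovered.items():
        key = mac.lower()
        record = dict(observed)  # copy so the caller's data is untouched
        if key in cmdb_by_mac:
            # Assumption: observed values win; the CMDB only fills gaps
            for field, value in cmdb_by_mac[key].items():
                record.setdefault(field, value)
        else:
            unknown.append(key)  # seen on the network but not in the CMDB
        enriched[key] = record
    return enriched, sorted(unknown)


discovered = {
    "aa:bb:cc:00:00:01": {"ip": "10.0.0.5"},
    "aa:bb:cc:00:00:02": {"ip": "10.0.0.9"},
}
cmdb = {"AA:BB:CC:00:00:01": {"owner": "alice", "role": "laptop", "ip": "10.0.0.99"}}
enriched, unknown = reconcile(discovered, cmdb)
print(enriched["aa:bb:cc:00:00:01"])  # {'ip': '10.0.0.5', 'owner': 'alice', 'role': 'laptop'}
print(unknown)                         # ['aa:bb:cc:00:00:02']
```

Devices that appear on the network but not in the CMDB are returned separately, since an unregistered device is often worth an analyst's attention.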
Five key features distinguish big data security analytics from other information security domains.
Key feature 1: Scalability
One of the key distinguishing features of big data analytics is scalability. These platforms must be able to collect data in real or near real time. Network traffic is a continual stream of packets that must be analyzed as fast as they are captured; the analysis tools cannot depend on a lull in network traffic to catch up on a backlog of packets.
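Keeping up with a continuous stream generally means updating statistics incrementally as each event arrives, rather than batching work for later. The sketch below uses a hypothetical `SlidingWindowCounter` class, written only for illustration, to count how many events fell within the last `window` seconds with amortized constant work per event. It assumes timestamps arrive in non-decreasing order, which real capture pipelines do not always guarantee.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical helper: count events seen in the last `window` seconds.

    Each call to add() does amortized O(1) work, so the counter keeps pace
    with the stream instead of accumulating a backlog.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self._times: deque[float] = deque()

    def add(self, ts: float) -> int:
        # Assumption: timestamps arrive in non-decreasing order
        self._times.append(ts)
        # Evict timestamps that are no longer inside (ts - window, ts]
        while self._times and self._times[0] <= ts - self.window:
            self._times.popleft()
        return len(self._times)


counter = SlidingWindowCounter(window=10.0)
print([counter.add(t) for t in [0, 1, 2, 11, 12, 25]])  # [1, 2, 3, 2, 2, 1]
```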
It is important to understand that big data security analytics is not just examining packets in a stateless manner or performing deep packet analysis. Although these are important and necessary, it is the ability to correlate events across time and space that is a key differentiator of big data analytics platforms. This means the stream of events logged by one device, such as a Web server, may be highly significant with respect to events on an end-user device a short time later.
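Correlation across devices can be sketched as a time-window join. The example below assumes a simplified, hypothetical `Event` record and further assumes that web-server and endpoint events can be linked by a shared client IP address. Each web-server event is paired with the endpoint events on the same address that occur within a configurable window at or after it.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified event record."""
    ts: float    # seconds since the epoch
    device: str  # e.g. "web-01" or "laptop-42"
    ip: str      # client IP, used as the linking key (an assumption)
    kind: str    # short event label


def correlate(web_events, endpoint_events, window=300.0):
    """Pair each web-server event with endpoint events on the same IP
    that occur within `window` seconds at or after it."""
    # Group endpoint events by IP, sorted by time, for binary search
    by_ip: dict[str, list[Event]] = {}
    for ev in endpoint_events:
        by_ip.setdefault(ev.ip, []).append(ev)
    for evs in by_ip.values():
        evs.sort(key=lambda e: e.ts)
    times = {ip: [e.ts for e in evs] for ip, evs in by_ip.items()}

    pairs = []
    for w in web_events:
        if w.ip not in by_ip:
            continue
        ts = times[w.ip]
        lo = bisect_left(ts, w.ts)            # first event at or after w
        hi = bisect_right(ts, w.ts + window)  # one past the end of the window
        pairs.extend((w, e) for e in by_ip[w.ip][lo:hi])
    return pairs


web = [Event(100.0, "web-01", "10.0.0.5", "suspicious_download")]
endpoint = [
    Event(90.0, "laptop-42", "10.0.0.5", "login"),         # before: ignored
    Event(150.0, "laptop-42", "10.0.0.5", "new_process"),  # inside the window
    Event(1000.0, "laptop-42", "10.0.0.5", "usb_insert"),  # too late: ignored
    Event(120.0, "laptop-7", "10.0.0.9", "new_process"),   # other IP: ignored
]
for w, e in correlate(web, endpoint):
    print(w.kind, "->", e.kind, "on", e.device)  # suspicious_download -> new_process on laptop-42
```

Sorting each address's events once and using binary search keeps each lookup logarithmic; a real platform would need to apply the same idea to distributed, continuously arriving data.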
Key feature 2: Reporting and visualization
Another essential function of big data analytics is reporting and support for analysis. Security professionals have long had reporting tools to support operations and compliance reporting. They have also had access to dashboards with preconfigured security indicators to provide high-level overviews of key performance measures. Once again, both of these existing tools are necessary but not sufficient to meet the demands of big data.
Visualization tools are also needed to present information derived from big data sources in ways that security analysts can readily and rapidly understand. For example, Sqrrl uses visualization techniques to help analysts understand complex relationships in linked data across a wide range of entities, such as websites, users and HTTP transactions.
Key feature 3: Persistent big data storage
Big data security analytics gets its name because the storage and analysis capabilities of these platforms distinguish them from other security tools. These platforms employ big data storage systems, such as the Hadoop Distributed File System (HDFS), along with higher-latency archival storage. Back-end processing, meanwhile, may be done with MapReduce, a well-established computational model for batch processing. MapReduce is highly fault tolerant, but that resilience comes at the cost of I/O-intensive processing, because intermediate results are written to disk. A popular alternative to MapReduce is Apache Spark, a more general processing model that makes more effective use of memory.
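To make the MapReduce model concrete, the single-process Python sketch below counts failed logins per source IP address using explicit map, shuffle and reduce phases. The log line format and the function names are hypothetical; Hadoop MapReduce or Spark would run the same phases in parallel across a cluster, with the framework handling the shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log format: "<timestamp> <EVENT> key=value ..."
LOG = [
    "2024-01-01T09:00:01 FAILED_LOGIN user=alice src=10.0.0.5",
    "2024-01-01T09:00:02 FAILED_LOGIN user=alice src=10.0.0.5",
    "2024-01-01T09:00:03 LOGIN_OK user=bob src=10.0.0.7",
    "2024-01-01T09:00:04 FAILED_LOGIN user=root src=10.0.0.9",
    "2024-01-01T09:00:05 FAILED_LOGIN user=admin src=10.0.0.5",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (source IP, 1) for every failed-login record."""
    parts = line.split()
    if len(parts) >= 2 and parts[1] == "FAILED_LOGIN":
        for field in parts[2:]:
            if field.startswith("src="):
                yield field[len("src="):], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (the framework does this in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(run_job(LOG))  # {'10.0.0.5': 3, '10.0.0.9': 1}
```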
Big data analysis systems, such as MapReduce and Spark, address the computational requirements of security analytics. Long-term persistent storage, on the other hand, typically relies on relational or NoSQL databases. The Splunk Hunk platform, for instance, supports analysis and visualization on top of Hadoop and NoSQL databases. The platform sits between an organization's nonrelational data stores and the rest of its application environment. Hunk apps integrate directly with data stores and do not require jobs to be moved to a secondary in-memory store. The Hunk platform includes a range of tools for analyzing big data. It supports development of custom dashboards and Hunk apps, which can be built directly on top of an HDFS environment, as well as adaptive search and visualization tools.
Another key feature of big data security analytics platforms is support for intelligence feeds, which draw on established vulnerability databases as well as security blogs and other news sources that are continually updated with potentially useful information. Big data security platforms can ingest data from a variety of such sources, deduplicate threat notices and correlate them with information gathered through their own custom data-collection methods.
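Deduplication and correlation of intelligence notices can be sketched in a few lines. The example assumes each notice is a dictionary with hypothetical `indicator` and `source` keys, treats indicators that differ only in case or surrounding whitespace as duplicates, and then matches the merged notices against indicators observed locally. Real feeds use richer formats and matching rules.

```python
# Hypothetical notice format: {"indicator": <domain, IP or CVE ID>, "source": <feed name>}


def dedupe_notices(notices):
    """Merge notices whose indicators differ only by case or whitespace,
    keeping the set of sources that reported each one."""
    merged: dict[str, set[str]] = {}
    for notice in notices:
        key = notice["indicator"].strip().lower()
        merged.setdefault(key, set()).add(notice["source"])
    return merged


def match_local(merged, observed):
    """Keep only the indicators that also appear in locally collected data."""
    seen = {item.strip().lower() for item in observed}
    return {ind: srcs for ind, srcs in merged.items() if ind in seen}


notices = [
    {"indicator": "Evil.example.com", "source": "feed-a"},
    {"indicator": "evil.example.com ", "source": "security-blog"},
    {"indicator": "198.51.100.7", "source": "feed-a"},
]
merged = dedupe_notices(notices)
print(len(merged))  # 2
hits = match_local(merged, ["EVIL.example.com", "203.0.113.1"])
print(sorted(hits))  # ['evil.example.com']
```

Keeping the set of reporting sources, rather than discarding duplicates outright, lets an analyst see when several independent feeds agree on the same indicator.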
Key feature 4: Information context
Since security events generate so much data, there is a risk of overwhelming analysts and other infosec professionals and limiting their ability to discern key events. Useful big data security analytics tools frame data in the context of users, devices and events.
Data without this kind of context is far less useful and can lead to a higher-than-necessary rate of false positives. Contextual information also improves the quality of behavioral analysis and anomaly detection. Contextual information can include relatively static information, such as the fact that a particular employee works in a specific department. It also includes more dynamic information, such as typical usage patterns that may change over time. For example, a large volume of queries against a data warehouse on Monday mornings may not be unusual, as managers run ad hoc queries to better understand events described in their weekly reports.
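One simple way to encode the Monday-morning context described above is to keep a separate baseline for each weekday and hour, so that heavy Monday-morning query volume is compared only with past Monday mornings. The sketch below is illustrative rather than a recommended detector: the bucket granularity, the threshold of `k` standard deviations and the floor of 1.0 on the standard deviation are all assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def build_baseline(history):
    """history: iterable of (datetime, query_count) samples.
    Returns past counts bucketed by (weekday, hour)."""
    buckets = defaultdict(list)
    for when, count in history:
        buckets[(when.weekday(), when.hour)].append(count)
    return buckets


def is_anomalous(baseline, when, count, k=3.0):
    """Flag counts more than k standard deviations above the norm
    for the same weekday and hour (k is an assumed threshold)."""
    samples = baseline.get((when.weekday(), when.hour), [])
    if len(samples) < 2:
        return False  # too little context to judge
    # Floor the deviation at 1.0 so a perfectly flat history is not hypersensitive
    return count > mean(samples) + k * max(stdev(samples), 1.0)


# Four past Monday 9 a.m. samples (Jan 1-22, 2024) and four Wednesday ones (Jan 3-24)
history = [(datetime(2024, 1, d, 9), c) for d, c in [(1, 500), (8, 520), (15, 480), (22, 510)]]
history += [(datetime(2024, 1, d, 9), c) for d, c in [(3, 50), (10, 55), (17, 45), (24, 52)]]
baseline = build_baseline(history)

print(is_anomalous(baseline, datetime(2024, 1, 29, 9), 500))  # False: normal for a Monday
print(is_anomalous(baseline, datetime(2024, 1, 31, 9), 500))  # True: unusual for a Wednesday
```

The same query volume is judged differently depending on when it occurs, which is the point of framing data in context.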
Key feature 5: Breadth of functions
The final distinguishing characteristic of big data security analytics is the breadth of functional security areas it spans. Of course, big data analytics will collect data from endpoint devices, that is, any device connected to a TCP/IP network. This includes anything from laptops and smartphones to Internet of Things devices. In addition to physical devices and virtual servers, big data security analytics must attend to software-related security. For example, vulnerability assessments are used to identify possible security weak points in a given environment. The network itself is also a rich source of information, and standards such as the Cisco-developed NetFlow protocol can be used to gather information about the traffic on a network.
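Flow records such as those exported via NetFlow summarize traffic by source, destination, port, protocol and volume. The sketch below uses a simplified, hypothetical `Flow` record rather than the actual NetFlow export format, and shows two common summaries: the top talkers by bytes sent, and sources that contact many distinct ports on a single destination, a pattern that may indicate port scanning. The default port threshold is an arbitrary assumption.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record (not the NetFlow wire format)."""
    src: str
    dst: str
    dst_port: int
    proto: str
    num_bytes: int


def top_talkers(flows, n=3):
    """Return the n sources that sent the most bytes."""
    totals = Counter()
    for f in flows:
        totals[f.src] += f.num_bytes
    return totals.most_common(n)


def possible_scanners(flows, port_threshold=100):
    """Sources that hit at least `port_threshold` distinct ports on one host."""
    ports = defaultdict(set)
    for f in flows:
        ports[(f.src, f.dst)].add(f.dst_port)
    return sorted({src for (src, _dst), seen in ports.items() if len(seen) >= port_threshold})


flows = [Flow("10.0.0.5", "10.0.0.1", port, "tcp", 60) for port in range(1, 151)]
flows += [Flow("10.0.0.9", "192.0.2.53", 53, "udp", 5000) for _ in range(2)]

print(top_talkers(flows, n=2))   # [('10.0.0.9', 10000), ('10.0.0.5', 9000)]
print(possible_scanners(flows))  # ['10.0.0.5']
```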
Big data analytics platforms can also use intrusion detection products that analyze system or environment behavior in order to spot possible malicious activity.
What sets big data security analytics apart
Big data security analytics is qualitatively different from other forms of security analytics. The need for scalability, tools for integrating and visualizing diverse types of data, the increasing importance of contextual information and the breadth of security functions that must be supported are leading vendors to apply advanced data analysis and storage tools to information security.