Big data frameworks: Making their use in enterprises more secure
Many enterprises apply big data techniques to their security systems. But are these methods secure? Expert John Burke explains some of the efforts to secure big data analysis.
In a recent report, Nemertes Research found that 23% of organizations surveyed were deploying big data frameworks in their security operations. Why? Because they are drowning in security data already and are not even looking at everything they could. A decade ago, the increasing volume and complexity of firewall, router and other security log streams led to the invention of security information and event management (SIEM) systems. But before most organizations even got a SIEM in place and tuned to their needs, the number and volume of security log streams began to outstrip the capacities of the tools for assessing them. Consequently, the promise of SIEM systems was undercut by a narrowing focus: IT was able to push more and more data through them, but that growing stream of data represented less and less of the total possible universe of security data IT could put through them.
When data got too big
This course of events led in turn to applying then-emerging big data techniques to security. At first, most security systems that used big data frameworks were homegrown and based on Hadoop. The goal, as with SIEM, was to run analyses that would, by looking across all the available streams of data, allow security staff to improve their overall security posture.
But who or what protected the data and systems used for this analysis? In most cases, no one and nothing. Data was amassed on generic server images used as part of a Hadoop farm with no special controls, as though the data they contained was of low risk and little importance.
However, the data amassed should be expected to be sensitive at least -- and in some cases confidential and critical. At a minimum, it will include logging data showing who is using what, from where, when and perhaps even what they are doing while logged in. Depending on logging settings and on how widely the net is being cast, it could easily come to include snippets of ordinarily protected data: Social Security or credit card numbers, employee or customer addresses and so on. As sensor nets spread and are connected, IT could even wind up collecting information about who is located where in a building (e.g., from card key systems). And, of course, data lakes that can show you how people are breaking into your organization could just as easily show bad guys how to do so.
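To make the risk concrete, here is a minimal Python sketch of how a security team might check whether log lines headed for a data lake carry snippets of protected data. It is an illustration, not a production detector: the function names (`find_sensitive`, `luhn_ok`), the two regular expressions and the sample log lines are all assumptions made for this example, and real log formats would need broader and stricter patterns.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; real detectors need stricter rules.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")       # e.g. 123-45-6789
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")     # 13-16 digits, optional separators


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(line: str) -> List[str]:
    """Return labels for the kinds of sensitive data that appear in a log line."""
    findings = []
    if SSN_RE.search(line):
        findings.append("ssn")
    for match in CARD_RE.finditer(line):
        digits = re.sub(r"\D", "", match.group())
        # The Luhn check filters out most random digit runs (IDs, counters).
        if 13 <= len(digits) <= 16 and luhn_ok(digits):
            findings.append("card")
            break
    return findings


if __name__ == "__main__":
    samples = [
        "user=alice card=4111 1111 1111 1111 status=ok",  # well-known test card number
        "login ok ssn=123-45-6789",
        "GET /index.html 200",
    ]
    for s in samples:
        print(find_sensitive(s), s)
```

A scan like this could run before logs are loaded into the analytics platform, so that lines flagged as carrying card or Social Security numbers are masked or routed to more tightly controlled storage.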
Security comes to big data's rescue
So, in a sense, security had to come to security. At a minimum, security staffs moved the data lakes and analysis into protected subnets firewalled off from prying eyes. They locked down server images more tightly. All of this was necessary and, for a while, about as much as one could do. Early versions of Hadoop, for example, had essentially no security at all. Anyone could submit a job, look at whatever data was stored on a node or register a process as a system service. Gradually, security features arrived -- auditing, access control, encryption of data streams, and segregation of data and jobs (so users couldn't modify the parameters of other users' jobs) -- all anchored by Kerberos.
Support and services providers for big data frameworks, as well as security vendors, found and continue to find flaws to patch and gaps to fill. Encryption of data at rest, for example, didn't arrive in the Hadoop standard until late in 2014; before that, vendors such as Hortonworks and Cloudera supplied that function as a value-added feature. Vendors including HP and IBM created security add-ons for their big data frameworks to try to make enterprise use of big data more secure in general. Specifically, they aimed at increasing security not just in the context of Hadoop and not just for managing security information, but for both data in transit and data at rest (e.g., by providing Payment Card Industry (PCI)-compliant controls).
Securing big data frameworks, including in security, is an ongoing journey. Although big data security has advanced enormously in the last eight years, the platforms, data feeds and use cases continue to evolve rapidly enough that new security challenges will continue to arise for many years to come. IT security staffs need to pay close attention to securing their analytics platform just as they do the rest of their security tools.