Why data fidelity is crucial for enterprise cybersecurity
Cybersecurity teams can't be effective if they don't trust their data. Expert Char Sample explains the importance of data fidelity and the threat of cognitive hacking.
Oxford Dictionaries' word of the year for 2016 was post-truth. The post-truth environment forces each of us to question the data we receive and process. This environment is the output of cognitive hacking -- a term that has also been referred to as brain hacking.
The previous article on this topic discussed how cognitive hacking could be used in various environments, including the virtual environment that defines cybersecurity.
Cognitive hacking has historically been linked to political or financial environments, but the larger problem of data fidelity exists in all environments. There are several reasons for this, and they merit a brief discussion because they can predict the success or failure of potential solutions. These factors shape the problem of ensuring data fidelity and, ultimately, how that problem must be addressed.
First, we must consider the role of data in the environment. Data is the glue that joins the technical and human components. While much research has been conducted in the human and technical domains, along with human-machine interface research, the actual value of data remains mostly unexamined.
The value of data is both a technical and a human issue. On the technical front, integrity and privacy controls address data in transit and even data at rest; however, simply capturing the data on input and assigning it a value is not sufficient. Data exists in -- and reports on -- an environment; therefore, to know the value of data, the data must be understood in context. That context is established when the data is created.
Data fidelity and cybersecurity
When discussing cybersecurity data, the environment where the data is created varies depending on the location and type of data. For example, consider an intrusion detection system (IDS) alert. An IDS alert can occur on the network (on the wire) or on the actual host. In both cases, the alert is trusted by the software that processes it.
However, the actual event may not match the reported event. If, in addition to the event information, other environmental variables such as memory usage and CPU cycles are captured, the contextual information around the event can be examined and can provide the data needed to help assure the fidelity of the event data.
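As a rough illustration of this idea, the sketch below couples an alert with a snapshot of environmental variables captured at the moment the alert fires. The record structure, the field names and the simulated capture function are illustrative assumptions, not part of any particular IDS product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass
class ContextualAlert:
    """An IDS alert coupled with environmental readings taken at alert time."""
    signature: str                 # what the IDS reported (e.g., rule or alert name)
    source: str                    # "network" or "host"
    timestamp: str
    environment: Dict[str, float]  # e.g., memory_pct, cpu_pct, open_connections


def capture_environment() -> Dict[str, float]:
    # Placeholder: in practice these readings would come from the sensor host
    # (e.g., /proc, performance counters or flow statistics).
    return {"memory_pct": 42.0, "cpu_pct": 17.5, "open_connections": 118.0}


def enrich_alert(signature: str, source: str) -> ContextualAlert:
    """Attach an environmental snapshot so the alert can later be judged in context."""
    return ContextualAlert(
        signature=signature,
        source=source,
        timestamp=datetime.now(timezone.utc).isoformat(),
        environment=capture_environment(),
    )


if __name__ == "__main__":
    print(enrich_alert("suspicious inbound connection", "network"))
```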
Using the example of IDS data, if the network IDS reports that all connections are normal, but the traffic sessions appear larger than normal, or the number of connections is significantly greater than normal, then we can infer that something is not right.
Presently, this does not differ much from basic anomaly detection. However, when the data is compared against the normal variance of baselined data, or when the data is compared against existing security information and event management (SIEM) data and a difference is found, we can question the fidelity of our SIEM data.
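One hedged way to express that comparison in code: reduce per-metric history to a mean and standard deviation, then flag any current observation that deviates by more than a chosen number of standard deviations while the IDS itself reports nothing unusual. The metric names, history values and three-sigma threshold below are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def baseline_stats(history: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Reduce per-metric history to (mean, standard deviation) pairs used as the baseline."""
    return {metric: (mean(values), stdev(values)) for metric, values in history.items()}


def fidelity_check(observed: Dict[str, float],
                   baseline: Dict[str, Tuple[float, float]],
                   sigmas: float = 3.0) -> Dict[str, float]:
    """Return metrics whose current value deviates from baseline by more than
    `sigmas` standard deviations -- a disagreement worth investigating if the
    IDS or SIEM is simultaneously reporting that everything is normal."""
    suspicious = {}
    for metric, value in observed.items():
        mu, sd = baseline.get(metric, (value, 0.0))
        if sd > 0 and abs(value - mu) / sd > sigmas:
            suspicious[metric] = value
    return suspicious


if __name__ == "__main__":
    # Illustrative history: connection counts and mean session size over recent intervals.
    history = {"connections_per_min": [95, 102, 99, 101, 97, 103],
               "avg_session_bytes": [48_000, 51_000, 50_200, 49_500, 50_800, 49_900]}
    observed = {"connections_per_min": 240, "avg_session_bytes": 182_000}
    print(fidelity_check(observed, baseline_stats(history)))
```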
Next, consider a case where the host is involved. Perhaps the file system shows no unacceptable changes, but normal processes are taking longer than usual and using additional CPU cycles. In this case, as in the previous example, a perturbation has occurred in the environment.
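A comparable host-side check, again purely as a sketch: compare each monitored process's run time and CPU usage against baselined values and surface those that have drifted even though the file system looks fine. The process names, baseline figures and tolerance factor are assumptions for illustration.

```python
from typing import Dict, Tuple

# Baseline per process: (expected runtime in seconds, expected CPU percent). Illustrative values.
HOST_BASELINE: Dict[str, Tuple[float, float]] = {
    "logrotate": (2.0, 1.0),
    "backup_job": (300.0, 12.0),
}


def host_perturbations(observed: Dict[str, Tuple[float, float]],
                       tolerance: float = 1.5) -> Dict[str, str]:
    """Flag processes running longer or hotter than `tolerance` times their baseline."""
    findings = {}
    for proc, (runtime, cpu) in observed.items():
        base_runtime, base_cpu = HOST_BASELINE.get(proc, (runtime, cpu))
        if runtime > base_runtime * tolerance or cpu > base_cpu * tolerance:
            findings[proc] = (f"runtime {runtime:.0f}s vs baseline {base_runtime:.0f}s, "
                              f"cpu {cpu:.1f}% vs baseline {base_cpu:.1f}%")
    return findings


if __name__ == "__main__":
    print(host_perturbations({"logrotate": (2.1, 0.9), "backup_job": (900.0, 45.0)}))
```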
Data fidelity in IoT environments
In other cases, the environment may appear normal, but the object or alert may have changed, suggesting a potential false alarm; this is a common problem with anomaly detection systems. In this case, by having the object and environmental variables coupled together, we can potentially reduce the number of false positives. Minimally, we can gain better insights into the false positive problem without introducing false negatives.
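One way to picture that coupling, as a minimal sketch rather than a prescribed workflow: treat an alert as a candidate false positive when the environmental variables captured with it show no corresponding deviation, escalate it when both the alert and its environment disagree with the baseline, and note the reverse case so suppression does not introduce false negatives. The triage labels below are assumptions.

```python
from typing import Dict


def triage(alert_fired: bool, env_deviations: Dict[str, float]) -> str:
    """Combine the alert (the object) with its environmental context to triage it.

    - alert with corroborating environmental deviation -> escalate
    - alert with a quiet environment                   -> candidate false positive
    - no alert but a disturbed environment             -> possible missed detection
    """
    if alert_fired and env_deviations:
        return "escalate: alert corroborated by environment"
    if alert_fired:
        return "review: candidate false positive (environment looks normal)"
    if env_deviations:
        return "investigate: environment deviates without an alert"
    return "normal"


if __name__ == "__main__":
    print(triage(alert_fired=True, env_deviations={}))
    print(triage(alert_fired=True, env_deviations={"connections_per_min": 240.0}))
```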
Gathering the necessary environmental variables requires a deep understanding of the various environments. When dealing with the internet of things (IoT), this problem becomes considerably more complex. While the actual embedded device environment is simple, the variety of host environments where the chip resides introduces many new variables that require observation.
In spite of this, there are common areas that provide the basis of observables. For example, messaging rate baselines can be observed and established for various states, recognizing that certain stressed states are suboptimal for automated actions.
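As an illustrative sketch of that idea, the structure below keeps a separate message-rate baseline per device state and declines to drive automated actions from readings taken in a stressed state. The states, rate ranges and decision rule are assumptions, not drawn from any specific IoT platform.

```python
from typing import Dict, Optional, Tuple

# Per-state baseline message rates (messages per minute): (low, high). Illustrative values.
MESSAGE_RATE_BASELINE: Dict[str, Tuple[float, float]] = {
    "idle": (0.5, 2.0),
    "normal": (10.0, 30.0),
    "stressed": (30.0, 120.0),
}


def evaluate_message_rate(state: str, rate: float) -> Optional[str]:
    """Assess the observed rate against the baseline for the device's current state,
    or return None for states (e.g., stressed) considered unsuitable for automated action."""
    if state == "stressed":
        return None  # suboptimal for automated decisions; defer to an operator
    low, high = MESSAGE_RATE_BASELINE.get(state, (0.0, float("inf")))
    if low <= rate <= high:
        return "within baseline"
    return f"outside baseline ({low}-{high} msgs/min) for state '{state}'"


if __name__ == "__main__":
    print(evaluate_message_rate("normal", 85.0))
    print(evaluate_message_rate("stressed", 85.0))
```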
Table 1 applies to each of the three environments -- network, host and IoT -- as well as to each of the meta-states these inhabit at various processing times. The meta-processing states for each environment are startup, idle, normal processing, stressed processing, degraded processing and shutdown. Each of these meta-states has associated variables that can be accessed and examined to determine whether an event is controlled, anomalous or a potential problem.
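Since the table is organized around those meta-states, one way to model them in code is an enumeration that maps each meta-state to the environmental variables worth sampling while in it. The variable names below are hypothetical placeholders, not the contents of Table 1; in practice the mapping would be refined per environment (network, host and IoT).

```python
from enum import Enum, auto
from typing import Dict, List


class MetaState(Enum):
    STARTUP = auto()
    IDLE = auto()
    NORMAL_PROCESSING = auto()
    STRESSED_PROCESSING = auto()
    DEGRADED_PROCESSING = auto()
    SHUTDOWN = auto()


# Hypothetical mapping of meta-state to observable environmental variables.
OBSERVABLES: Dict[MetaState, List[str]] = {
    MetaState.STARTUP: ["boot_time", "services_started", "initial_connections"],
    MetaState.IDLE: ["background_cpu_pct", "heartbeat_rate"],
    MetaState.NORMAL_PROCESSING: ["cpu_pct", "memory_pct", "connections_per_min"],
    MetaState.STRESSED_PROCESSING: ["queue_depth", "retransmissions", "cpu_pct"],
    MetaState.DEGRADED_PROCESSING: ["error_rate", "dropped_messages"],
    MetaState.SHUTDOWN: ["open_handles", "pending_writes"],
}
```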
Of course, variations occur in each state of each environment, and this is why baselines are important. Baselining is sometimes thought of as the custodial work of cybersecurity, which is why many have tried to use machine learning for the task. This has given machine learning algorithms a flawed reputation because of their tendency to incorrectly classify abnormal data as normal -- an ever-present problem that can be overcome through proper baselining.
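A minimal sketch of what proper baselining can mean in practice, assuming a simple sliding window rather than any particular machine learning method: maintain per-metric statistics over recent history and update them only with observations already judged normal, so abnormal data is not quietly absorbed into the baseline. The window size and threshold are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Sliding-window baseline that only learns from observations accepted as normal."""

    def __init__(self, window: int = 100, sigmas: float = 3.0):
        self.values = deque(maxlen=window)
        self.sigmas = sigmas

    def is_normal(self, value: float) -> bool:
        if len(self.values) < 10:          # too little history to judge; accept and learn
            self.values.append(value)
            return True
        mu, sd = mean(self.values), pstdev(self.values)
        normal = sd == 0 or abs(value - mu) <= self.sigmas * sd
        if normal:
            self.values.append(value)      # abnormal points are kept out of the baseline
        return normal


if __name__ == "__main__":
    baseline = RollingBaseline()
    for v in [100, 98, 101, 99, 102, 97, 100, 103, 99, 101, 250]:
        print(v, baseline.is_normal(v))
```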
Conclusion
The data fidelity problem will likely continue to plague security software in part because of the human-machine trust relationship, as well as because of the fundamental assumptions made by security experts and product developers in the early days of internet security. These assumptions, much like the signature-based detection model, will continue to be invalidated until data resilience, a significant component of data fidelity, moves to the forefront of the security discussion.
In today's fake news environment, we see evidence of this forward movement in the inclusion of geolocation data (an environmental variable) as context when a proposed news story is examined by machine learning technology. We will likely see a slow introduction of contextual evaluation in support of data fidelity in the IoT area first, in part because of the nature of the environmental variables and the need to better understand their interaction.