Causal deep learning teaches AI to ask why
Most AI runs on pattern recognition, but as any high school student will tell you, correlation is not causation. Researchers are now looking at ways to help AI fathom this deeper level.
Deep learning techniques do a good job of building models by correlating data points. But many AI researchers believe more work needs to be done to understand causation, not just correlation. The field of causal deep learning -- useful in determining why something happened -- is still in its infancy, and it is much more difficult to automate than neural networks.
Much of AI is about finding hidden patterns in large amounts of data. Soumendra Mohanty, executive vice president and chief data analytics officer at global IT services company L&T Infotech, said, "Obviously, this aspect drives us to the 'what,' but rarely do we go down the path of understanding the 'why.'"
The implications of this distinction could be significant. Ultimately, creating machines that mimic human intelligence will require training AI to ask why one observation affects another. This is why many researchers are now turning their attention to the question.
The excitement in the field has been kindled by Judea Pearl, a professor at UCLA, who did some of the formative work on implementing Bayesian networks for statistical analysis. More recently, he has been developing a framework for diagramming causation and teasing apart, in computable form, the factors that contribute to observed events.
One of the biggest challenges in analyzing causality is changing the paradigm to one in which experts assert a subjective opinion about the cause of observations and then tease it apart with various analytic techniques. This stands in stark contrast to the more objective approach pursued in statistical machine learning. In the long run, causation research could lead to better models for understanding the world. In the short run, causal analysis will make it easier to explain why machine learning models deliver a particular result.
Overcoming magical thinking
Jake Freivald, vice president of marketing at Information Builders, said, "Savvy business leaders usually don't trust black boxes, but there has been an unusual amount of magical thinking about AI." Business leaders are starting to realize that handing their business processes over to an AI algorithm could be like letting their toddler take the wheel of their car, he said.
The problem is that analytics and AI are primarily used to find correlations in data sets. Since correlations only hint at causation, they can't help you understand why something happened -- and if a model can't do that, it can only tell you the probability of what will happen next.
"The more we can tease out causation in our models, the more we can be reality-based in our assessment of why things happened and what will happen next," Freivald said. "Until then, putting our businesses in the hands of an AI model could work extremely well, until it doesn't, and then the results could be disastrous."
Beyond curve fitting
Curve fitting has done well in answering important questions like "What's the next best offer?" "Is it fraud?" or "Is it a cat?"
"But, in the real world there are a whole range of questions that just can't be answered through curve fitting," Mohanty said. If there are several factors that can predict a preference for a product, which ones should the business try to influence and in what order of importance? Simply ranking the strength of different variables on their ability to predict the target is not the same as selecting those that are independently predictive and evaluating their relative contribution to the outcome.
"We can observe correlation but that does not prove or even imply causation," Mohanty said. Questions answered by causation are "what lever should I pull to effect change?" or "what would happen if I changed some underlying assumptions of the model?"
The techniques underlying causal modeling, such as structural equation modeling (SEM), have existed for many years. "However, the techniques are more or less confined to academia and research and we have not seen these techniques translating to commercial or business use cases," Mohanty said.
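In its simplest form, a structural equation model writes each variable as a function of its direct causes plus noise, and an intervention is simulated by overriding one equation while leaving the rest of the model intact. The sketch below is purely synthetic; the variables and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# A tiny structural equation model: marketing influences price, and both
# marketing and price influence demand.
marketing = rng.normal(0, 1, n)                      # exogenous cause
price = 5 - 0.5 * marketing + rng.normal(0, 1, n)    # price responds to marketing
demand = 10 - 1.5 * price + 2.0 * marketing + rng.normal(0, 1, n)

# An intervention ("what if we set price to 3?") overrides the price equation
# while keeping the other equations -- the causal mechanism -- unchanged.
price_do = np.full(n, 3.0)
demand_do = 10 - 1.5 * price_do + 2.0 * marketing + rng.normal(0, 1, n)
print(demand.mean(), demand_do.mean())
```

This is exactly the "what lever should I pull?" question: the model answers it by simulating the intervention rather than merely extrapolating from observed correlations.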
Monte Carlo simulation, Markov chain analysis, naive Bayes and stochastic modeling are a few techniques used today, but they barely scratch the surface of causality. There are also a few open source packages, such as DAGitty, a browser-based environment for creating, editing and analyzing causal models, and Microsoft's DoWhy library for causal inference. But these, too, are still maturing.
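To give a flavor of the tooling, here is a minimal sketch of DoWhy's documented identify-then-estimate workflow, assuming the library accepts a DOT-format graph string as in its examples. The column names, the causal graph and the synthetic data are made up for illustration.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Synthetic observational data: income confounds both discount and purchase.
rng = np.random.default_rng(0)
income = rng.normal(size=5000)
discount = (income + rng.normal(size=5000) > 0).astype(int)
purchase = 2.0 * discount + 1.5 * income + rng.normal(size=5000)
df = pd.DataFrame({"discount": discount, "purchase": purchase, "income": income})

# The causal graph is the analyst's assumption about the world,
# not something learned from the data.
model = CausalModel(
    data=df,
    treatment="discount",
    outcome="purchase",
    graph="digraph { income -> discount; income -> purchase; discount -> purchase; }",
)
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # should land near the true effect of 2.0
```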
Bottling rules of thumb with AI
At a high level, AI applications execute a series of actions based on observed patterns, said Richard Schwartz, CEO and president of Pensa Systems, the maker of an autonomous inventory management system. Deep learning uses statistical techniques to discover patterns. A different approach to embedding causal understanding in AI involves developing rules-based systems. This approach forms conclusions from other types of objective facts, such as "turning right three times is the same as turning left."
Rules can be causal or cognitive and help model the outcome from the input, but they have their drawbacks. "Causal rules are hard to come by, and even when you do define them, they tend to be more brittle," Schwartz said.
A potential solution lies in a combination of the two approaches -- for example, creating explainability for neural networks. This type of causal deep learning involves building, in a more painstaking manner, cognitive models of how conclusions are reached.
Another causal AI technique gaining prominence is a form of reinforcement learning called learning from demonstration. This approach effectively shows a computer examples of how something can be done and lets the computer attempt to adapt that technique to its own problem-solving.
Pensa uses both flavors of AI in its inventory management tool to solve problems related to restocking inventory on shelves in stores. The company's main product uses neural networks to interpret computer vision input from cameras, recognizing the items on the shelf (e.g., Heinz ketchup) and how the shelf is organized (e.g., Heinz is normally next to Hunt's).
It also uses causal models to generate automated prompts, such as 'Heinz is running low' or 'Heinz has run completely out.' To reach that conclusion, the system needs to know not only the product but also the rules governing what needs to be on the shelf and what it means to restock.
People are quite good at forming cognitive conclusions, such as developing rules of thumb. "Pensa bottles that with AI," Schwartz said.
Model-free causality
Scott Niekum, an assistant professor of AI at the University of Texas at Austin, said reinforcement learning is inherently causal, in the sense that the agent experiments with different actions and learns about how they affect performance through trial and error. This type of learning is called "model-free" and is popular because it can learn positive or effective behaviors without having to learn an explicit model of how the world works.
In other words, it is only learning about the causal relationship between actions and performance, rather than how actions affect the world directly. For example, this might involve learning that flipping over a full water bucket above a fire puts it out, without understanding the relationship between water and fire.
Model-free learning is a double-edged sword. Without a model, the agent may have to learn from scratch about how to achieve its goals if the problem changes at all.
In the earlier example, if the agent was given a hose instead of a bucket of water, it would not know what to do with it without learning from scratch, since it did not learn the causal relationship between water and fire, but only the relationship between the "flip bucket" action and the goal of putting out the fire.
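What "model-free" means can be sketched with generic tabular Q-learning (a textbook formulation, not any particular researcher's system): the agent updates a value for each state-action pair straight from reward feedback and never represents why an action worked.

```python
import random
from collections import defaultdict

# Tabular Q-learning: action values Q(state, action) are learned directly
# from rewards, with no model of the environment's dynamics.
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = defaultdict(float)

def choose_action(state, actions):
    # Epsilon-greedy: mostly exploit learned values, occasionally explore.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # The update ties the action directly to observed reward; nothing is
    # learned about *why* it worked (e.g., that water extinguishes fire).
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

Swap the bucket for a hose and the learned Q-values for "flip bucket" are useless; the agent has to start exploring again because it never learned the underlying cause-and-effect relationship.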
Niekum said, "For these reasons, there is growing interest in model-based reinforcement learning, though it has its own challenges. For example, how do you measure confidence in your model, what do you do when the model is wrong, and how do you manage uncertainty when attempting to over longtime horizons?"
Explainability for ML models
At the heart of explainability is the notion that explanations must identify and quantify all the factors that are causally responsible for a deep learning model's behavior. In this regard, causality refers to the model function itself, not the task the model is addressing, said Ankur Taly, head of data science at Fiddler Labs, which offers an explainable AI engine.
Faithfully explaining deep learning models is challenging due to their complexity, which makes it hard to analytically reason about the importance of each feature in the model function. Earlier approaches worked around this challenge by observing the model's predictions on a data set and fitting a simpler, interpretable model to them to obtain explanations.
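A minimal sketch of that surrogate-model idea, with an off-the-shelf gradient-boosting classifier standing in for the black box and synthetic data standing in for a real workload:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # illustrative features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # illustrative labels

# "Black box" model standing in for a deep network.
black_box = GradientBoostingClassifier().fit(X, y)

# Global surrogate: fit an interpretable model to the black box's predictions.
surrogate = LogisticRegression().fit(X, black_box.predict(X))
print(surrogate.coef_)  # read as feature "importance" -- but the coefficients
                        # only reflect correlations in this particular data set
```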
"Unfortunately, such methods are susceptible to the well-known pitfall of inferring causal relationships from observational data," Taly said. One cannot tease apart features that are truly causal to the model's prediction from those that are correlated with it.
Recently, a different set of methods based on Shapley values from cooperative game theory has emerged. These methods probe models with counterfactual inputs. However, Fiddler's research has found that most of these methods can produce biased explanations if the data set is skewed. Taly said Fiddler is working on approaches to decouple model explanations from any specific data set.
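The core idea can be sketched with a generic permutation-sampling estimator of Shapley values; production toolkits such as SHAP add many refinements, and the function name and baseline choice below are purely illustrative.

```python
import numpy as np

def shapley_attributions(f, x, baseline, n_samples=200, seed=0):
    """Estimate per-feature Shapley values for one prediction by sampling
    feature permutations and swapping features from `baseline` to `x`.
    `f` takes a 1-D feature vector and returns a scalar score."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        current = baseline.copy()
        prev_out = f(current)
        for i in perm:
            current[i] = x[i]            # counterfactual: turn one feature "on"
            new_out = f(current)
            phi[i] += new_out - prev_out  # marginal contribution of feature i
            prev_out = new_out
    return phi / n_samples

# Hypothetical usage with a scikit-learn-style classifier:
# phi = shapley_attributions(lambda v: clf.predict_proba(v[None, :])[0, 1],
#                            x_row, x_baseline)
```

Note that the attributions depend heavily on the baseline and background data used to build the counterfactuals, which is exactly where the data-set skew Taly describes can bias the explanation.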
This kind of research could help to identify spurious correlations that a model has learned to rely on. For example, hackers recently demonstrated the ability to fake out the Cylance antimalware engine by adding in certain kinds of data. A good step in mitigating this risk is to ascertain the causal features that significantly affect the model's prediction.
"One can then study these features to check whether they are also causal to the task, or if they can be exploited by an adversary, like in the case Cylance," Taly said.
Teaching AI superstition
Today, humans can do a better job than AI of guiding the deep learning process toward modeling cause-and-effect relationships, Information Builders' Freivald said. This can involve limiting the data sets, taking out fields that might cause bias, and generally shaping the learning process. Humans focus on causality, while algorithms do the learning. There's a feedback loop, but the human aspect is essential.
If causality can be determined by an AI tool, then the AI can shape the learning process instead of a human doing it. In theory, the AI could start working with arbitrary data sets, determine causality and apply learning in ways that would have been completely overlooked by humans.
There are a lot of questions about this at the moment. Humans apply generalized intelligence to tasks, which is something machines are not yet capable of doing. Recent attempts to do so have created complications. "The more general we want an AI to be, the more data it's going to need, and the greater the possibility there is for false positives -- machine superstition," Freivald said.