4 types of simulation models used in data analytics

Combining different types of simulation models with predictive analytics enables organizations to forecast events and improve the accuracy of data-driven decisions.

Simulation models are finding new uses as organizations delve into predictive analytics and data-driven decision-making.

Many data analytics techniques trace their origins to games of chance. For example, you might want to determine the likelihood of rolling a total of 14 with three six-sided dice -- the basis for binomial or normal distributions -- or know your odds in roulette or poker. Such games are essentially simulations, and the goal of data analysts is likewise to create a simplified model that captures the behavior of a complex system.
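The dice question above is small enough to answer exactly by listing every outcome. The following is a minimal Python sketch; the function name `probability_of_total` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def probability_of_total(target, dice=3, sides=6):
    """Exact probability that `dice` fair dice with `sides` faces sum to `target`."""
    # Enumerate every equally likely outcome, e.g. (1, 1, 1) ... (6, 6, 6).
    outcomes = list(product(range(1, sides + 1), repeat=dice))
    hits = sum(1 for roll in outcomes if sum(roll) == target)
    return Fraction(hits, len(outcomes))


p = probability_of_total(14)
print(p, float(p))
```

Of the 216 possible rolls, 15 add up to 14, so the chance is a little under 7%. Enumeration stops being practical as the number of outcomes grows, which is where random sampling takes over.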

Such simulations are often the only practical way to solve complex real-world problems in biology, physics, economics and other domains with many interacting components. Data analytics professionals should know these four types of simulation models:

  • Monte Carlo method.
  • Agent-based modeling.
  • Discrete event simulation.
  • System dynamic modeling.

These four types of simulation models underlie a great number of games, visual and audio synthesis techniques, machine learning algorithms, processing kernels and controller systems. Simulations can test systems virtually before an organization commits to a decision or design.

Monte Carlo method

In many simulations, it is difficult to determine whether the selected variables and the distributions of data from those variables represent the model in question. The Monte Carlo method tests this by drawing many random samples and comparing the results with what the model predicts. The name Monte Carlo comes from roulette, a game made famous at Monte Carlo resorts. The roulette wheel has 37 slots numbered 0 to 36, with 18 red slots, 18 black slots and one green slot. Players have a 48.65% chance of landing on red, the same chance of landing on black and a 2.7% chance of landing on green (the 0). Together, the three probabilities form one distribution.

Any individual spin results in a random value. Repeat the same process 1,000 times or more and the distribution of results should follow those percentages. If it doesn't, other variables could be at work, such as a pedal that an unscrupulous dealer uses to slow down the wheel.
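As a rough illustration of that check, the sketch below spins a simulated fair wheel many times and reports how often each color comes up. It assumes a fair wheel drawn with Python's standard pseudorandom generator; `WHEEL` and `spin_frequencies` are hypothetical names, not part of any library.

```python
import random
from collections import Counter

# The wheel described above: 18 red slots, 18 black slots, one green slot (the 0).
WHEEL = ["red"] * 18 + ["black"] * 18 + ["green"]


def spin_frequencies(spins, seed=None):
    """Spin the wheel `spins` times and return the observed share of each color."""
    rng = random.Random(seed)
    counts = Counter(rng.choice(WHEEL) for _ in range(spins))
    return {color: counts[color] / spins for color in ("red", "black", "green")}


for color, share in spin_frequencies(100_000, seed=42).items():
    print(f"{color:>5}: {share:.2%}")
```

With enough spins, the observed shares should settle near 48.65%, 48.65% and 2.7%. A real wheel whose shares stay far from those values would be a reason to look for hidden variables.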

One of the oldest known examples of the Monte Carlo method is estimating the value of pi. An accurate estimate can take millions of data points, which points to a limitation of Monte Carlo simulations: They are usually not that efficient.
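One common way to set up the pi estimate, shown here as a sketch rather than the only approach, is to scatter random points over a unit square and count how many fall inside the quarter circle of radius 1. That fraction approaches pi/4. The function name `estimate_pi` is hypothetical.

```python
import math
import random


def estimate_pi(samples, seed=None):
    """Estimate pi from the share of random points that land inside a quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / samples


for n in (1_000, 100_000, 1_000_000):
    estimate = estimate_pi(n, seed=0)
    print(f"{n:>9} samples: {estimate:.5f} (error {abs(estimate - math.pi):.5f})")
```

The typical error shrinks only in proportion to one over the square root of the sample count, so each extra digit of accuracy costs roughly 100 times more samples.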

This kind of simulation is often used with Bayesian analysis, which relies upon prior findings to determine the likelihood of an event occurring. Political analysts often use this technique, where polls generate a set of variables that can then be aggregated to create a model, with Monte Carlo methods used to test the model. Ensemble modeling for weather events also uses Monte Carlo, for example, to determine the likely path of a hurricane.

Agent-based modeling

Anyone who has watched a flock of birds take off has seen seemingly random initial behavior give way to a synchronized activity, with birds flying in a distinct formation even if no one bird controls their activity. Birds in flight have developed simple rules that tell them what to do based on what they see around them. Each bird avoids obstacles as it flies, and adjusts its position, in real time, based on the location of birds around it.

In simulation terms, these birds are agents, and the flock-level patterns that arise from their individual moves are emergent behaviors. These behaviors take place in reaction to a discrete set of rules based on what other agents do. The process of identifying those rules and simulating the agents that follow them is called agent-based modeling.

Agent systems were studied in the 1960s as one of the earliest examples of cybernetics and are still significant. For instance, the traffic on a typical busy highway can be difficult to model via computers. Instead, many modelers simulate each car as an agent that generally follows a set of rules, but with periodic hiccups to see how cars act in the aggregate.
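The highway example can be sketched with the Nagel-Schreckenberg traffic model, a well-known rule set of this kind. It is used here as an illustrative stand-in, not as the method any particular modeler uses. Each car accelerates, brakes to avoid the car ahead and occasionally slows at random (the "hiccup"). The road length, speed limit and slowdown probability are assumed values.

```python
import random


def step(positions, speeds, road_length, v_max=5, p_slow=0.3, rng=random):
    """Advance every car one time step on a circular single-lane road.

    `positions` lists cars in driving order; car i + 1 is directly ahead of car i.
    """
    n = len(positions)
    new_speeds = []
    for i in range(n):
        ahead = positions[(i + 1) % n]
        gap = (ahead - positions[i] - 1) % road_length  # empty cells to the next car
        v = min(speeds[i] + 1, v_max)  # rule 1: accelerate toward the speed limit
        v = min(v, gap)                # rule 2: brake to avoid a collision
        if v > 0 and rng.random() < p_slow:
            v -= 1                     # rule 3: random hiccup
        new_speeds.append(v)
    # Rule 4: all cars move at the same time using their new speeds.
    new_positions = [(x + v) % road_length for x, v in zip(positions, new_speeds)]
    return new_positions, new_speeds


def simulate(n_cars=30, road_length=100, steps=200, seed=0):
    """Run the model and return final positions plus the average speed per step."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(road_length), n_cars))
    speeds = [0] * n_cars
    avg_speeds = []
    for _ in range(steps):
        positions, speeds = step(positions, speeds, road_length, rng=rng)
        avg_speeds.append(sum(speeds) / n_cars)
    return positions, avg_speeds


_, avg = simulate()
print(f"average speed over the last 50 steps: {sum(avg[-50:]) / 50:.2f}")
```

No rule mentions traffic jams, yet at higher densities stop-and-go waves tend to appear in the aggregate. That aggregate pattern is the emergent behavior the modeler is after.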

Agent systems are also used with IoT devices and drones. These devices do not depend on coordinating activities through a central processor, which creates latency and bottlenecks through complex processing. Instead, they react to their nearest neighbors. They check in with the central controller only when they get ambiguous information, or put themselves into a safe mode if they cannot interact either with neighbors or with the central controller.

This reliance on local interaction is the downside of agent systems. An outage or similar disruption among a small number of agents can propagate quickly. This phenomenon has caused major power outages that are difficult to recover from, because the event (everything going offline) results from emergent behavior in autonomous power stations rather than from a single identifiable fault. In the process of rebooting, the problem that led to the outage may get resolved without any indication of its cause.

Agent systems can be simulated, with software objects replacing hardware ones. Cellular biology, for instance, lends itself well to agent-based modeling, as cell behavior tends to influence nearby cells of varying types.

Discrete event simulation

Related to agent systems is the notion of cellular automata, made famous by John Conway in his Game of Life in the 1970s and later by Stephen Wolfram of Mathematica fame. Both technologies underpin transformational filters and kernels used in both image processing and machine learning.

Such systems are examples of discrete event simulations. In these simulations, time is broken up into distinct steps or chunks rather than being continuous, with the model's state at each step being a function of its state at the previous steps.

In these simulations, stable or quasi-stable components emerge without explicit programming.
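Conway's Game of Life makes both points concrete. In the sketch below, each generation is computed purely from the previous one, and a "glider" keeps its shape while drifting across the grid even though no rule mentions gliders. Storing live cells as a set of coordinates is an implementation choice made here for brevity; `life_step` is a hypothetical name.

```python
from collections import Counter


def life_step(live):
    """Return the next generation, given a set of live (row, col) cells."""
    # Count, for every cell on the board, how many live neighbors it has.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next step with exactly 3 neighbors,
    # or with 2 neighbors if it is already alive.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = life_step(state)

# After four steps the glider has the same shape, shifted one cell diagonally.
print(state == {(r + 1, c + 1) for r, c in glider})
```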

Data analysts use discrete event simulations in areas where the state of each cell in a grid or space depends on the cells near it. For instance, most weather modeling systems take advantage of voxels -- three-dimensional cells -- to determine the inputs and outputs to each cell based on previous states. In theory, the finer the mesh used to describe the map, the more accurate the results. Corrections need to be made to the model to account for the shape (or topology) of the mesh. Triangular or hexagonal meshes are often more accurate than rectangular ones.

System dynamic modeling

In an ideal mathematical world, it should be possible to describe the world with independent functions, meaning that they can be treated as if they were linear. In reality, most variables that describe systems are coupled with one another -- changing the value of one variable may change another variable due to their interaction. Such coupled systems are nonlinear and are typically described by differential equations.

With computing, we can solve such equations numerically using difference equations. Difference equations use discrete mathematics to find specific solutions that can then be generalized through building up ensembles of solutions.

A good example of such a system is predator-prey simulations. In the simplest case, there's prey, and the number of prey animals increases until their food runs out. At that point, the prey population drops to a level where its food supply can recover. Add a predator to the mix, however, and things get more complex. The prey is now coupled to two variables: its food supply and the number of predators that will kill prey animals. The population of all three species becomes nonlinear and somewhat unpredictable, even chaotic. The classic predator-prey equations are known as Lotka-Volterra equations, and similar coupled nonlinear equations appear in many economic models and in fluid and airflow dynamics.
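The idea of replacing differential equations with difference equations can be sketched with the classic two-species predator-prey model, usually called the Lotka-Volterra equations. This is a simplification of the three-species scenario above: the prey's food supply is folded into a constant growth rate. The parameter values, step size and function name are illustrative assumptions, and the simple Euler update used here drifts over long runs.

```python
def lotka_volterra(prey0, pred0, steps, dt=0.001,
                   alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Step the predator-prey difference equations forward and return the history.

    alpha: prey growth rate        beta: rate at which predators kill prey
    delta: predator gain per prey  gamma: predator death rate
    """
    prey, pred = prey0, pred0
    history = [(prey, pred)]
    for _ in range(steps):
        # Each population's change depends on the other: the coupling.
        d_prey = (alpha * prey - beta * prey * pred) * dt
        d_pred = (delta * prey * pred - gamma * pred) * dt
        prey, pred = prey + d_prey, pred + d_pred
        history.append((prey, pred))
    return history


history = lotka_volterra(prey0=10.0, pred0=5.0, steps=20_000)
prey_values = [prey for prey, _ in history]
print(f"prey ranges from {min(prey_values):.1f} to {max(prey_values):.1f}")
```

Neither population settles to a fixed value. Instead, the two chase each other in repeating cycles, and running the model from many different starting points builds up the ensemble of solutions described above.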

System dynamic modeling (SDM) studies chaotic systems. It relies on discrete event simulation and numeric methods to determine the behavior of components within that system. Beyond predator-prey models, SDM is also used in high-density particle simulations -- for instance, modeling the behavior of a galaxy based on the forces acting on idealized versions of stars. Chaotic systems give rise to fractals, which are structures with fractional dimensions, often associated with iterative, recursive structures and emergent behaviors.
