Best practices for threat modeling service mesh, microservices
In microservices and service mesh environments, communications don't follow static paths. As such, security teams must update their application threat modeling methods.
Security professionals have probably noticed that container engines, such as Docker and rkt, as well as container orchestration tools -- for example, Kubernetes -- are gaining traction in a big way. This is because, as developers have discovered the power of microservices, they are moving away from monolithic or tightly coupled designs and toward more decoupled models -- i.e., models where REST is used as a layer of abstraction.
This offers several advantages from a developer point of view. First, it abstracts away the implementation details of how components communicate data from point A to point B. That's a huge win already, but it also lets developers create versions of an interface, enabling them to add new functionality to an API in support of future development without the need to recode, recompile or alter the existing code base. From a development point of view, this is one of the most powerful aspects of a microservices architecture -- a software architecture strategy where functionality is modularized into small, discrete units, all accessible to each other through RESTful APIs.
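As a rough illustration of interface versioning, the Python sketch below registers two versions of the same endpoint side by side. The route table, the `/orders` path and the handler names are hypothetical and stand in for whatever web framework a team actually uses; the point is only that v2 adds a field while the v1 handler is left untouched.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]

# Hypothetical route table: (version, path) -> handler function.
ROUTES: Dict[Tuple[str, str], Handler] = {}


def route(version: str, path: str) -> Callable[[Handler], Handler]:
    """Register a handler under a versioned path such as /v1/orders."""
    def register(handler: Handler) -> Handler:
        ROUTES[(version, path)] = handler
        return handler
    return register


@route("v1", "/orders")
def get_order_v1(request: dict) -> dict:
    # Original interface; existing callers keep using it unchanged.
    return {"order_id": request["order_id"], "status": "shipped"}


@route("v2", "/orders")
def get_order_v2(request: dict) -> dict:
    # New functionality is added in v2 without editing the v1 handler.
    response = get_order_v1(request)
    response["carrier"] = "example-carrier"
    return response


def dispatch(url: str, request: dict) -> dict:
    """Resolve a URL like '/v1/orders' to its registered handler."""
    _, version, *rest = url.split("/")
    return ROUTES[(version, "/" + "/".join(rest))](request)
```

Calling `dispatch("/v1/orders", {"order_id": 7})` and `dispatch("/v2/orders", {"order_id": 7})` returns the old and new response shapes respectively, so callers can migrate at their own pace.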
As with anything, scale makes this complex. Imagine there are 500 RESTful APIs. Each API has a unique URL within the environment and is, more or less, agnostic about the underlying implementation details of the other services that make up the application. Say, though, that either a broad change needs to be made to services en masse -- for example, to enable mutual TLS authentication everywhere -- or a targeted but disruptive change needs to be made to one service that affects the behavior of every other service -- for example, changing the URL of an often-used service. Does every other service need to be changed? Do configuration changes need to be made to the containers those services are hosted on? This kind of ripple effect is exactly what microservices are designed to help prevent in the first place.
One technology that helps address the challenges of scale is service mesh. Service mesh introduces a method to enable changes to service operation without having to change underlying code. This is usually done using a sidecar proxy -- i.e., a container that runs alongside the service and is responsible for directing traffic to and from that service. In this model, because services talk through a proxy rather than communicating directly with each other, the proxy provides a hook where designers can change how that communication happens. Want to move services around? Sure, that can be done. Want to selectively turn mutual authentication, rate limiting or HTTP headers on or off? Those can be done, too. And they can be completed without changing application code or the configuration of the container where the service lives.
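The sidecar idea can be mimicked in a few lines of Python. This is a toy model, not how any real mesh product is implemented: `MeshPolicy`, `Sidecar` and `inventory_service` are made-up names, the rate limit is a simple counter with no time window, and the "certificate" is just a dictionary key. What it shows is that policy changes at the proxy alter behavior while the service function stays exactly the same.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MeshPolicy:
    """Hypothetical per-service policy that a mesh would push to a sidecar."""
    require_mtls: bool = False
    max_requests: int = 100  # simplistic cap, no time window
    add_headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Sidecar:
    """Toy proxy that sits in front of a service and enforces the policy."""
    service: Callable[[dict], dict]
    policy: MeshPolicy
    seen: int = 0

    def handle(self, request: dict) -> dict:
        # Reject callers without a client certificate when mTLS is required.
        if self.policy.require_mtls and not request.get("client_cert"):
            return {"status": 403, "body": "client certificate required"}
        # Count accepted requests and enforce the cap.
        self.seen += 1
        if self.seen > self.policy.max_requests:
            return {"status": 429, "body": "rate limit exceeded"}
        # Inject policy-defined headers, then forward to the service.
        forwarded = dict(request)
        forwarded["headers"] = {**request.get("headers", {}), **self.policy.add_headers}
        return self.service(forwarded)


def inventory_service(request: dict) -> dict:
    """Application code: knows nothing about mTLS, rate limits or headers."""
    return {"status": 200, "body": "ok", "headers_seen": request["headers"]}


sidecar = Sidecar(inventory_service, MeshPolicy())
before = sidecar.handle({"path": "/stock"})

# Operations tightens the policy; inventory_service is not modified.
sidecar.policy = MeshPolicy(require_mtls=True, add_headers={"x-request-source": "mesh"})
after_no_cert = sidecar.handle({"path": "/stock"})
after_with_cert = sidecar.handle({"path": "/stock", "client_cert": "example-cert"})
```

The same request that succeeded in `before` is refused in `after_no_cert`, which is exactly the kind of shift in behavior a threat model has to keep up with.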
New challenge: Threat modeling service mesh
These benefits turn the operations side of a service mesh deployment from a nightmare into something more manageable. However, they have the byproduct of making application security analysts' lives more complex: Threat modeling becomes harder when a static pathway through the application can no longer be presupposed.
For those unfamiliar with application threat modeling, the idea is to decompose an application into its component parts, look from an attacker's point of view at the interaction points between those components and then systematically map out how each interaction can be misused to gain leverage. To do this, security pros often start by making a data flow diagram -- a document that maps out how data is sent through an application and the components it interacts with along the way. From there, security pros look at specific types of misuse cases that can affect each interaction. For example, each element could be systematically worked through a mnemonic threat model, such as STRIDE -- spoofing, tampering, repudiation, information disclosure, denial of service, elevation of privilege -- to determine how an attacker might target each component.
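To make the process concrete, a data flow diagram can be reduced to a list of interactions, with each one paired against every STRIDE category to produce a review worklist. The sketch below assumes a hypothetical three-hop login flow; the component names are invented for illustration, and the output is a list of questions for an analyst to answer rather than a list of findings.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The six STRIDE categories, in mnemonic order.
STRIDE = (
    "spoofing",
    "tampering",
    "repudiation",
    "information disclosure",
    "denial of service",
    "elevation of privilege",
)


@dataclass(frozen=True)
class Flow:
    """One arrow in a data flow diagram."""
    source: str
    target: str
    data: str


# Hypothetical data flow diagram for a login path.
DATA_FLOWS = [
    Flow("browser", "web-frontend", "login form"),
    Flow("web-frontend", "auth-service", "credentials"),
    Flow("auth-service", "user-db", "credential lookup"),
]


def enumerate_threats(flows: List[Flow]) -> List[Tuple[Flow, str]]:
    """Pair every interaction with every STRIDE category for review."""
    return [(flow, category) for flow in flows for category in STRIDE]


worklist = enumerate_threats(DATA_FLOWS)
for flow, category in worklist[:3]:
    print(f"{flow.source} -> {flow.target} ({flow.data}): consider {category}")
```

Three flows times six categories gives 18 items to consider, which hints at why a hand-maintained diagram becomes hard to keep current once the flows themselves start to change.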
The challenge with this approach, however, is that under a microservices model generally -- and service mesh specifically -- the pathway through the application won't stay static over time. An application may work a particular way today, but it might work differently two weeks from now -- for example, having different security controls or talking to different APIs.
How to analyze service mesh
So, what can be done to perform application threat modeling in a microservices and service mesh environment?
One approach is to modularize the threat modeling process itself. Just as security teams would look in detail at component interactions from an attacker's point of view in a monolithic application or a traditional application with tightly bound components, they would do the same for each service in a mesh. The difference is that, instead of focusing solely on one pathway through the application and the components used to support it, security pros must also look at each service independently. This is for two reasons:
- Because each service is nonstatic, meaning it can move or change context rapidly. For example, a service might not be accessible from the internet today -- i.e., it is used only by other pieces of the application and not directly by end users -- and be exposed tomorrow.
- Because doing so enables security teams to keep up as developers release new or updated services on a continual basis.
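One way to act on both reasons is to record a fingerprint of each service's security-relevant context when it is reviewed, then flag only the services whose context has since changed or that have never been reviewed. The following is a minimal sketch under stated assumptions: the descriptor fields (`internet_facing`, `mtls`, `calls`) and the service names are hypothetical, and a real pipeline would pull this data from deployment or mesh configuration.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(descriptor: dict) -> str:
    """Stable hash of a service's security-relevant context."""
    canonical = json.dumps(descriptor, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def services_needing_review(reviewed: Dict[str, str],
                            current: Dict[str, dict]) -> List[str]:
    """Return services that are new or whose context changed since review."""
    return sorted(
        name for name, descriptor in current.items()
        if reviewed.get(name) != fingerprint(descriptor)
    )


# Hypothetical state at the time of the last threat modeling pass.
last_week = {
    "billing": {"internet_facing": False, "mtls": True, "calls": ["ledger"]},
    "ledger": {"internet_facing": False, "mtls": True, "calls": []},
}
reviewed = {name: fingerprint(desc) for name, desc in last_week.items()}

# Today: billing is now exposed to the internet and a new service appeared.
today = {
    "billing": {"internet_facing": True, "mtls": True, "calls": ["ledger"]},
    "ledger": {"internet_facing": False, "mtls": True, "calls": []},
    "reports": {"internet_facing": False, "mtls": True, "calls": ["ledger"]},
}

stale = services_needing_review(reviewed, today)
```

Here `stale` contains `billing` and `reports` but not `ledger`, so the existing analysis of unchanged services does not need to be redone.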
Looked at through this lens, threat modeling becomes even more useful in a microservices context than it already was. Decomposing the application is fairly easy to do -- arguably easier than it would be otherwise -- because services are already loosely coupled. Likewise, for shops that are pushing DevOps or continuous integration/continuous delivery, analysis is no longer a bottleneck since they don't have to redo existing work when or if they release a new service.
To start threat modeling service mesh, begin by looking at each API in isolation, examining its inputs and outputs and how they can be abused. Using STRIDE or another threat modeling methodology, examine and harden the service across each dimension. At the end of the process, there will be a small, hardened nugget of functionality, along with a clear picture of how an attacker might abuse the API, and resiliency will be increased.
It's important to bear in mind that there are advantages to evaluating the overall application, as well as looking at individual services in isolation. In other words, per-service analysis should be done in addition to full application decomposition rather than as a replacement for it. This might sound strange at first since interactions between services are expected to change rapidly over time. They will, but looking at the overall flow has advantages, too. For example, it can help catch business logic errors that affect security, such as when input to one service has ripple effects down the chain.
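The ripple effect is easier to reason about when the call relationships are written down as a graph. The sketch below uses a hypothetical call map and a breadth-first traversal to list every service that input arriving at one entry point can eventually reach; the service names are invented, and in practice the map would have to be refreshed as the mesh changes.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical call map: service -> services it calls.
CALLS: Dict[str, List[str]] = {
    "api-gateway": ["orders"],
    "orders": ["pricing", "inventory"],
    "pricing": ["ledger"],
    "inventory": [],
    "ledger": [],
    "reports": ["ledger"],
}


def downstream_of(entry: str, calls: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search for every service reachable from `entry`."""
    reached: Set[str] = set()
    queue = deque([entry])
    while queue:
        service = queue.popleft()
        for callee in calls.get(service, []):
            if callee not in reached:
                reached.add(callee)
                queue.append(callee)
    return reached


# Services that untrusted input at the gateway could influence.
affected = downstream_of("api-gateway", CALLS)
```

In this example, `affected` includes `ledger` even though the gateway never calls it directly, which is the kind of indirect exposure that only shows up when the whole flow is examined.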
Threat modeling methods can and should be adapted for use in a microservices and service mesh architecture, even though service mesh changes the way that applications fit together. By looking at individual services in isolation, in addition to broader decomposition of the application overall, security pros can get better at accounting for changes in context, as well as streamline workloads.