Sensu architecture smoothes monitoring data workflow for NCR

Sensu Go's architecture and monitoring as code prompted NCR to switch away from Zabbix and pass on SolarWinds, as it sought to collect data from 80,000 restaurant customers.

An enterprise IT shop with over 80,000 locations to manage found the level of monitoring and control it needed in the overhauled Sensu architecture.

Sensu, an open source monitoring tool that specializes in event data collection, had been on the radar for a few years at Atlanta-based NCR Corp., which makes point-of-sale (POS) systems, self-service kiosks, ATMs and other retail data processing systems. Sensu's monitoring-as-code approach, which allows developers to directly automate and customize the configuration and delivery of monitoring agents, appealed to NCR's DevOps teams, but the tool was too unwieldy to set up and operate. That changed when the Sensu Go architecture rolled out in December 2018.

"The challenge was that Sensu was so Ruby-focused and took a lot of infrastructure expertise in Redis to set up the right way," said Michael Hedgpeth, software director at NCR. The company also used Chef, an infrastructure-as-code tool written in Ruby in its early versions, but found Sensu much more difficult to learn.

When Sensu Go launched, it represented a complete rewrite of the Sensu app in the Go programming language, with a focus on simplicity and accessibility. An architectural revamp also eliminated the requirement to set up separate instances of Redis and RabbitMQ for in-memory data management and application messaging queues. The new Sensu architecture includes an embedded instance of etcd that handles both, the company said.

Sensu Go simplifies monitoring as code

The Sensu architecture update arrived as NCR sought a monitoring tool suitable for some 80,000 restaurant POS systems that belonged to a customer. NCR needed a lightweight agent that could be deployed and reconfigured by developers automatically, as well as an easy means to extract POS event data for processing with Google Cloud Platform data analytics services. It previously used Zabbix, an open source monitoring tool, and considered tools from SolarWinds. But with the updated Sensu architecture in place, the monitoring-as-code approach won out.

Monitoring as code gives developers more control over the observability of their apps and integrates into NCR's CI/CD process. NCR stores Sensu agent configuration files in JSON format inside a Git repository. Developers determine which monitoring checks run in each instance of an application, and the configuration files and their settings flow to restaurant endpoints according to policy, without manual setup.
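
As a rough illustration of that workflow, the Python sketch below shows how a check definition kept in Git might be validated in a CI step before it is distributed. It is not NCR's pipeline. The JSON layout is modeled on Sensu Go's resource format, while the check name, command, subscription and the `validate_check` helper are hypothetical.

```python
import json
from typing import List

# Hypothetical check definition, as it might be stored in a Git repository.
# The layout is modeled on Sensu Go's resource format; the check name,
# command and subscription are invented for illustration.
CHECK_JSON = """
{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {"name": "pos-disk-usage", "namespace": "default"},
  "spec": {
    "command": "check-disk-usage --warn 80 --crit 90",
    "interval": 60,
    "subscriptions": ["restaurant-pos"]
  }
}
"""

REQUIRED_SPEC_FIELDS = ("command", "interval", "subscriptions")


def validate_check(definition: dict) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if definition.get("type") != "CheckConfig":
        problems.append("type must be CheckConfig")
    if not definition.get("metadata", {}).get("name"):
        problems.append("metadata.name is required")
    spec = definition.get("spec", {})
    for field in REQUIRED_SPEC_FIELDS:
        if field not in spec:
            problems.append(f"spec.{field} is required")
    interval = spec.get("interval")
    if "interval" in spec and (not isinstance(interval, int) or interval <= 0):
        problems.append("spec.interval must be a positive integer")
    return problems


check = json.loads(CHECK_JSON)
problems = validate_check(check)
print(problems or "check definition looks valid")
```

A CI job could fail the build whenever the returned list is non-empty, so a malformed definition would be caught before it reaches any endpoint.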

"[With other monitoring software,] you don't have as much control over what the agents are doing, so that's a part of it. And another part of it is cost. We need an open source [product] to get to the scale that we're talking about," Hedgpeth said. "But maybe the biggest issue is the complexity of the problem -- we have roughly 80,000 data centers."

For most of those endpoints, generic monitoring is good, Hedgpeth added. But restaurant customers with their own IT departments sometimes need a standardized way to interact with the monitoring system, and the monitoring-as-code approach means developers who work for the restaurant company can extend NCR's version of the Sensu agents.

"I'm not convinced anything UI-driven would be able to handle that level of complexity and scale," he said.

In the past, the support model for restaurant customers had been log data shipments at regular intervals, but using Sensu as a routing engine will allow for an automated flow of data into a Google data lake for analytics that can optimize the performance of systems NCR doesn't fully control. Sensu's agents push event data to back-end systems by default, rather than pulling it in through communications between endpoints and a central data collection system. This reduces strain on WAN bandwidth and doesn't require firewall ports to be opened at customer sites for data collection access.
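
The push model can be sketched in a few lines of Python. This is a generic illustration of an endpoint initiating the transfer, not Sensu's actual agent protocol. The event fields, the entity and check names, and the `build_event` and `push_event` helpers are all assumptions made for the example.

```python
import json
import time
from typing import Callable, Dict


def build_event(entity: str, check: str, status: int, output: str) -> Dict:
    """Assemble a minimal event record; the field names are illustrative."""
    return {
        "entity": entity,
        "check": check,
        "status": status,  # 0 = OK, non-zero = problem (exit-code convention)
        "output": output,
        "timestamp": int(time.time()),
    }


def push_event(event: Dict, send: Callable[[bytes], None]) -> None:
    """Serialize the event and hand it to an outbound transport.

    The endpoint initiates the transfer, so the site needs no inbound
    firewall rule; `send` stands in for whatever outbound channel is used.
    """
    send(json.dumps(event).encode("utf-8"))


# Stand-in transport that records what would have been sent upstream.
sent = []
push_event(build_event("store-0001-pos", "pos-disk-usage", 0, "disk OK"), sent.append)
print(sent[0].decode("utf-8"))
```

Because the transport is passed in as a function, the same endpoint logic could be pointed at an HTTPS client or a message queue producer without changing how events are built.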

[Figure: The recently overhauled Sensu Go architecture]

Sensu architecture has room to improve

NCR hasn't finished its Sensu rollout, and not every DevOps team has a monitoring-as-code process established yet, Hedgpeth said, though NCR plans to use Sensu everywhere long term. Sensu was also a hard sell for Hedgpeth internally, because it requires a separate graphing and visualization system to display the data it collects.

"It's the Ferrari of monitoring, but you really have to start with the pieces and build your own car," Hedgpeth said. "If they want their community to grow, they need to be able to communicate why the product is good to business managers in a conference room with pretty graphs, at least enough to do a demo."

As an open source monitoring tool, Sensu has been around since 2011. But its commercial backer, Sensu Inc., which emerged a year later, only began to focus fully on Sensu development as an enterprise commercial product in 2016. The company remains in the early stages of signing paying enterprise monitoring customers, of which it has about 100 so far.

Since the December 2018 release of Sensu Go, an incremental release in February 2019 added enterprise features, such as integration with ServiceNow help desk ticketing software, LDAP identity management and Atlassian's Jira team collaboration system. Graphing support, granular role-based access control and namespace-based multi-tenancy support for companies such as NCR that use Sensu to manage customer data remain on the Sensu Go roadmap for 2019, said Caleb Hailey, Sensu Inc.'s co-founder and CEO.
