Managing storage for IoT data at the enterprise edge

Edge computing, the cloud and internet of things data are changing the way businesses store, analyze and manage all the information generated at the perimeter of the enterprise.

Storage at the edge, or the outer limit of the enterprise, has historically garnered little attention from IT or vendors.

Centralization has always been the dominant paradigm, and what storage existed at the enterprise edge, perhaps in a small server in a remote office or in a corporate laptop, only mattered because it contributed to the main corporate data trove. Perhaps the biggest focus on the edge has been the additional security risks it carries.

Today, as the cloud has upended long-standing architectural ideals, and as the edge itself has grown in importance, particularly in connection with IoT and the industrial internet of things (IIoT), what was once terra incognita has started to matter. IoT, and especially IIoT, are becoming sources of vast, growing quantities of data, and some -- but not all -- of it could be quite valuable. Storage for IoT and IIoT data is becoming a point of attention.

For example, thousands of video cameras generating vast amounts of data keep many cities safe. In other areas, prodigious volumes of data come from heavily instrumented locomotives, aircraft and even agricultural operations where soil sensors are helping farmers tune fertilizer and pesticide applications and irrigate with precision. All that data can easily overwhelm traditional centralized storage models, and that has led to experiments with preprocessing and analysis at the source and centralizing only a small part of the data.

A lot of the data at the enterprise edge isn't worth much, however. It proves nothing. It's the outlier data that matters most and must be extracted.

"Normally, we think of anything that isn't the data center as the edge, whether it's remote office, branch office or IoT," said Greg Schulz, principal at StorageIO, a technology consulting firm. "What if we think of the classic data center as the edge and the cloud as the core? If you have multiple distributed edge sites, you can do a peer-type backup where one remote protects itself to another, or they could all protect back to the main data center or the cloud."

Schulz's point: Edge computing, the cloud and IoT invite a rethinking of storage.

Rethinking data storage for IoT

"It's true that with the increased adoption and use cases for IoT, edge computing will gain prominence," said Naveen Chhabra, a Forrester Research analyst.

However, most of the time, the data these IoT devices generate is transient. After analyzing it and deriving insights, only the insights will need to be stored. Fortunately, retaining the insights and not the raw data won't be too storage-intensive, Chhabra said.
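The idea of keeping insights rather than raw data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method described by Chhabra or any vendor: the function name `summarize_at_edge`, the soil-moisture values and the 3.5 cutoff are all hypothetical, and the outlier test used here (median absolute deviation) is just one common choice.

```python
from statistics import mean, median


def summarize_at_edge(readings, threshold=3.5):
    """Reduce a batch of raw sensor readings to a compact insight record.

    Outliers are flagged with the median absolute deviation (MAD) test.
    The 3.5 cutoff is a conventional default, not a value from the article.
    """
    med = median(readings)
    mad = median([abs(r - med) for r in readings])
    outliers = []
    if mad > 0:
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        outliers = [
            r for r in readings
            if abs(0.6745 * (r - med) / mad) > threshold
        ]
    return {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
        "outliers": outliers,
    }


# Hypothetical soil-moisture batch containing one anomalous reading.
batch = [21.0, 21.2, 20.9, 21.1, 21.0, 35.4, 21.3, 20.8]
insight = summarize_at_edge(batch)
print(insight)
```

Only the small `insight` record, including the single flagged reading, would need to travel to central storage; the raw batch could be discarded or aged out locally.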

"Firms haven't been backing up much data generated in off-site locations, such as SOHO [small office home office] or plant operations," he said.

Arguably, such data isn't business-critical, and the same status quo holds for disaster recovery plans, Chhabra said.

"With all of these buzzwords -- IoT, IIoT and IoD [internet of things devices], you will definitely need to see more storage at the endpoint," Schulz said. This is true even if the data only lives there for a short time.

Schulz draws a distinction between storage -- particularly storage for IoT applications, at the enterprise edge, connected directly to sensors or onboard a train -- and storage supporting the edge, which could be some distance away, but still far closer than, say, a data center.

"Storage at the edge continues to grow in a lot of ways, both for video surveillance, telemetry and actual applications that are being pushed to the edge, such as clickstream analytics happening in near real time," he said.

Nor is it a matter of one or the other. Companies will determine where the data is stored on a case-by-case basis, according to Schulz, but data will probably spend some time both at or near the enterprise edge and at the core.

"You may do a quick look at data at the edge or ship it back for more traditional repository analysis, and, in many cases, you will probably do both," Schulz said.

In the case of video surveillance data, it might get pre-tagged, and, as time permits, it can flow inward for deeper, frame-by-frame analytics.

The problem of moving all that data

For the most data-rich IoT or IIoT applications, data movement is still a physical process, Schulz said.

"Think about the amount of data generated for offshore drilling; those companies are certainly collecting terabytes of data at the edge, if not petabytes, and some of that can be processed on site, but some has to be shipped physically to a data center," he said. "The old approach was tape or putting it on disk and giving it to FedEx."

But now, there are alternatives, such as Snowball from AWS, a storage appliance that the cloud provider makes available for physically shipping data.

SD cards at the edge

Storage that exists at the enterprise edge or within internet of things devices varies widely. Micron Technology Inc., a semiconductor company, has come up with an approach that works for everyone -- an industrial-strength SD card, which was described in a recent case study.

Repon, a Taiwanese ball bearing slide manufacturer, decided to take an edge approach to deploying a security and surveillance system at a manufacturing facility. To be useful for security, the system needed a degree of storage redundancy to ensure 100% coverage, even if the networks or central storage failed.

Systems integrator Apogear adopted Micron's microSD cards, adding them to individual cameras to complete Repon's system design. The thinking was that if anything happened to primary network storage, even briefly, the video could still be preserved on the microSD card. When a situation is resolved, the recorded video can be synchronized to network video recorders or video management systems, according to the companies.
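The store-and-forward pattern described here can be sketched as follows. This is a minimal illustration, not Apogear's or Micron's implementation: the class names, the `upload` callback and the in-memory queue standing in for the microSD card are all assumptions made for the example.

```python
from collections import deque


class StoreAndForwardCamera:
    """Buffers video segments locally while the network recorder is unreachable."""

    def __init__(self, upload, local_capacity=1000):
        self.upload = upload  # callable(segment) -> True if the recorder accepted it
        # Stands in for the microSD card; when full, the oldest segment is dropped.
        self.local = deque(maxlen=local_capacity)

    def record(self, segment):
        # Write locally first so nothing is lost if the network fails mid-transfer.
        self.local.append(segment)
        self.sync()

    def sync(self):
        # Drain oldest-first; stop at the first failure and retry on the next call.
        while self.local:
            if not self.upload(self.local[0]):
                break
            self.local.popleft()


class FakeRecorder:
    """Hypothetical network video recorder that can be toggled offline."""

    def __init__(self):
        self.online = False
        self.received = []

    def upload(self, segment):
        if self.online:
            self.received.append(segment)
        return self.online


recorder = FakeRecorder()
camera = StoreAndForwardCamera(recorder.upload)
camera.record("seg-1")
camera.record("seg-2")   # outage: both segments stay on local storage
recorder.online = True
camera.record("seg-3")   # link restored: the backlog drains in order
print(recorder.received)
```

Writing to local storage before attempting the upload is the design choice that gives the redundancy: the central recorder only ever sees segments that already exist on the card.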

Micron says its microSD cards can support three years of continuous video recording with a 2 million-hour mean time to failure.
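To put the mean-time-to-failure figure in perspective, a rough back-of-the-envelope calculation helps. The sketch below assumes a constant failure rate (an exponential lifetime model), which is a common simplification and not something Micron states; the fleet size of 1,000 cameras is hypothetical. It also treats the three-year recording figure purely as a time span, since endurance and MTTF are typically separate specifications.

```python
import math

MTTF_HOURS = 2_000_000                  # figure quoted by Micron
HOURS_PER_YEAR = 24 * 365               # 8,760, ignoring leap days
recording_hours = 3 * HOURS_PER_YEAR    # three years of continuous recording

# Assumption: constant failure rate, so survival follows exp(-t / MTTF).
p_fail_3y = 1 - math.exp(-recording_hours / MTTF_HOURS)

cameras = 1000                          # hypothetical fleet size
expected_failures = cameras * p_fail_3y

print(f"{recording_hours} hours of recording")
print(f"{p_fail_3y:.2%} chance a given card fails within three years")
print(f"{expected_failures:.1f} expected failures across {cameras} cameras")
```

Under these assumptions, roughly 1.3% of cards would be expected to fail within three years, which is why a large deployment would still plan for occasional replacements.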

In addition, some IoT implementations include devices with large internal, direct-access storage, particularly in edge servers or clusters of servers. There are also IoT devices with a small amount of storage built in, and there are even companies harnessing SD cards to provide extra storage on the device or at the enterprise edge.

Beyond the need for data movement and data storage for IoT applications, there's also a need to recognize that it won't all be garden-variety NFS protocol data. This is particularly true of IIoT, which tends to generate far more diverse storage needs. Some might look for iSCSI or even Portable Operating System Interface (POSIX) file systems, Schulz said.

But don't obsess over the potential quantity of data, said Vernon Turner, principal and chief strategist at Causeway Connections, a market research company.

"Everyone thinks that IoT data equates to storage when, in actuality, it relates to analytics," he said. Although IoT-generated data has value and storage for IoT data is important, "creating data lakes isn't what IoT lines of business want," he said. "They certainly don't want isolated data lakes."

What organizations want is an open data platform that ingests variable data sources and formats it to help them make contextual decisions.

Does this sound like a golden opportunity? Turner thinks so, but the players likely won't be traditional storage companies.

Delivering the needed analytics

Connecting devices to the network is generally regarded as a good thing, and now that data is available on objects and assets that couldn't be obtained previously, it opens up an array of service opportunities for everyone, Turner said. Carriers, service providers, network providers, cloud providers, data management software companies and, of course, storage hardware vendors could each claim a share of the pie. But whether they will is another matter.

Data carriers accustomed to moving large amounts of data, for instance, might find it tough to get out of the pipe business, Turner said.

"They think that their data source from the likes of 5G-enabled apps is their savior," he said.

Fog: A fresh view of computing at the enterprise edge

Fog computing is intended to bridge the cloud-to-things continuum. It brings compute, control, networking and storage closer to where data is generated at the edge of operations.

"We think of fog as the necessary architecture for IIoT or IoT," said Lynne F. Canavan, vice president of marketing at the OpenFog Consortium.

Her reasoning is that there's simply too much data trying to get to the cloud via too few pathways.

"As a result, there's latency and high networking costs. And data that requires a few more milliseconds to arrive can lose its value," she said.

"Fog is cloud computing that is, so to speak, low to the ground. It's a horizontal architecture that expands the continuum between end devices and the cloud to make both more operationally effective."


Likewise, traditional storage hardware vendors may not be ready to deliver the needed intermediate analytic capabilities because they're focused on selling boxes.

However, Phil Goodwin, research director for cloud data management and protection at IDC, said the challenges of IoT, storage for IoT data and computing at the enterprise edge will ultimately be solved by moving toward software-defined environments, where no one player has to create much that's .

"Rather than having specific arrays or devices or applications, there will be much more that's software-defined with the physical infrastructure below it," he said.

Setting a direction

As the next scenario unfolds, there will be lots of work to do.

"In many cases, the IoT devices or their purchase is driven by business unit requirements, and those business leaders are rarely IT savvy," Goodwin said. Thus, data protection and data management considerations can become an afterthought that IT operations must address.

Fortunately, from a technology standpoint, there isn't much innovation required to solve most of those challenges, Chhabra noted. Existing backup and disaster recovery technology can cover remote activities, including IoT.

"The only requirement that I see is the increased scale as more and more applications generate data," he said. Efforts to address that scale are coming from vendors in the hyper-converged infrastructure space, along with the backup tool vendors that "are stretching themselves into this space."

For Schulz, it's a familiar pattern. "There are big waves we go through periodically from distributed to centralized and back and forth. This is just one more of them."
