The unique requirements for the edge data layer
The data layer is an often invisible but essential foundation for any new infrastructure technology, yet it’s frequently an afterthought early in that technology’s development. For example, when containers were first rolled out, the lack of persistent storage severely restricted their use until vendors provided workarounds to solve the problem. We can’t afford to make that mistake again.
There’s an urgent need to store, manage and protect the enormous amount of IoT data being generated at the edge and the exponentially higher volumes expected soon.
A look at the explosive growth in data underscores the urgency. In 2013, IoT produced 0.1 zettabyte (ZB) of data, the equivalent of 100,000 petabytes (PB), and that figure is projected to hit 4.4 ZB by 2020. By 2025, it will skyrocket to an estimated 79 ZB, according to IDC, and that’s just for IoT. Enterprises will also be using the edge for infrastructure services, and service providers will be using it to deliver over-the-top video, among other things. There is virtually no limit to the types of data the world will generate, all of it contributing to this dramatic growth.
Using the edge to overcome latency
Certainly, much of that data can reside permanently in the limitless, robust repository of the cloud, but only if performance isn’t a priority. Cloud data centers are built in low-cost areas far from urban centers. That’s economical, but the distance introduces significant latency, a no-go for any app that demands an instant response. And with the number of endpoints expanding at an almost limitless pace, the latency challenge only gets worse. Consider healthcare applications involving life-or-death situations, IoT devices that must communicate in real time, self-driving cars making critical decisions from one second to the next and financial deals transacted in milliseconds.
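To put the distance problem in concrete terms, here is a minimal sketch that assumes signals travel through fiber at roughly 200 km per millisecond (about two-thirds the speed of light) and ignores routing, queuing and processing overhead entirely; the distances are illustrative assumptions, not measurements of any particular provider:

```python
# Rough estimate of round-trip propagation delay over fiber, ignoring routing,
# queuing and processing. Assumes ~200 km per millisecond in glass.
FIBER_KM_PER_MS = 200.0

def round_trip_ms(distance_km: float) -> float:
    """Propagation delay for one request/response round trip."""
    return 2 * distance_km / FIBER_KM_PER_MS

for label, km in [("edge facility (~50 km)", 50),
                  ("regional PoP (~190 km / 120 miles)", 190),
                  ("distant cloud region (~1,500 km)", 1500)]:
    print(f"{label}: ~{round_trip_ms(km):.1f} ms (propagation only)")
```

Even under these idealized assumptions, a distant cloud region costs well over 10 ms per round trip before any real processing happens, which is why proximity matters so much for latency-sensitive apps.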
The answer is to ensure compute and data are kept close to the end user; in other words, the answer lies at the edge. But when it comes to edge data, there are some serious challenges. Edge data centers must be built near large metro areas, where real estate prices are sky-high, and because of the cost, facilities are going to be small — too small to fit enough traditional storage boxes to accommodate all that data.
Distributed storage is another issue. Applications at the edge won’t interact with only one facility; autonomous vehicles are a prime example. As a self-driving car travels in and out of the range of different facilities, it will need to communicate seamlessly with all of them, and its data must follow along. That’s a tall order for just one vehicle. What happens when millions of autonomous vehicles hit the roads?
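As a rough illustration only, the sketch below shows the kind of logic an edge data layer would need for that scenario: pick the facility nearest the vehicle and pre-stage its hot data there before handing the session off. All names here are hypothetical, not an actual edge platform API.

```python
# Hypothetical sketch of data "following" a vehicle between edge facilities:
# choose the nearest site and pre-stage the vehicle's hot data before handoff.
from dataclasses import dataclass, field
from math import hypot

@dataclass
class EdgeSite:
    name: str
    x_km: float
    y_km: float
    cached_vehicles: set = field(default_factory=set)

def nearest_site(sites: list[EdgeSite], x_km: float, y_km: float) -> EdgeSite:
    return min(sites, key=lambda s: hypot(s.x_km - x_km, s.y_km - y_km))

def hand_off(vehicle_id: str, position: tuple[float, float],
             current: EdgeSite, sites: list[EdgeSite]) -> EdgeSite:
    """Return the site that should serve the vehicle next, pre-staging data."""
    target = nearest_site(sites, *position)
    if target is not current and vehicle_id not in target.cached_vehicles:
        # A real data layer would copy the vehicle's working set asynchronously
        # from the current site (or the backing cloud) to the target site here.
        target.cached_vehicles.add(vehicle_id)
    return target
```

Multiply that handoff by millions of vehicles and it becomes clear why data placement has to be automated rather than managed facility by facility.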
The solution is to pair the cloud with the edge and to provide a way to intelligently move data between them, minimizing latency and maximizing performance. In this way, the edge data layer acts as an on-demand service, and not as a massive rack of big storage arrays.
One way to limit the need for space and to avoid wasting resources is to cache the most active data at an edge data center. Better known as “hot” data, this tier typically accounts for only 10% of the data set. If data is “warm,” it can be stored at a point of presence far enough away so that real estate isn’t cost prohibitive, but close enough to provide no more than a couple of milliseconds of latency. Generally, anything under 120 miles away is close enough. The rest of the data set can be stored in the cloud if it’s cold and accessed infrequently.
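One simple way to express that tiering in code is by recency of access. The time windows in the sketch below are illustrative assumptions, not figures from any standard; the point is that only the hot tier needs to occupy scarce edge real estate:

```python
# Illustrative hot/warm/cold classification based on how recently data was
# accessed. The time windows are assumptions for the sake of example.
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

class Tier(Enum):
    HOT = "edge facility (local cache)"
    WARM = "nearby point of presence (within ~120 miles)"
    COLD = "backing cloud"

HOT_WINDOW = timedelta(hours=24)   # read within the last day -> keep at the edge
WARM_WINDOW = timedelta(days=30)   # read within the last month -> nearby PoP

def classify(last_access: datetime, now: Optional[datetime] = None) -> Tier:
    age = (now or datetime.utcnow()) - last_access
    if age <= HOT_WINDOW:
        return Tier.HOT
    if age <= WARM_WINDOW:
        return Tier.WARM
    return Tier.COLD
```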
A service model for the edge data layer
With this approach, 100 TB of local storage in an edge facility can represent 1 PB of usable storage, with the full data set, including cold data, ultimately stored in a backing cloud. Keeping hot data in the facility itself delivers sub-millisecond latency, while warm data at a nearby point of presence stays within a couple of milliseconds. Keeping data close to the decision point also keeps processing close, so analytics can be performed with minimal delay.
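The arithmetic behind that ratio is straightforward: if the hot tier is roughly 10% of the working set, the local cache needs only one-tenth of the usable capacity, with the rest backed by the cloud. A quick back-of-the-envelope check, using the same figures as above:

```python
# Back-of-the-envelope check of how much usable capacity a local cache can
# front, assuming the hot tier is roughly 10% of the full data set.
HOT_FRACTION = 0.10        # share of the data set that is actively accessed
local_cache_tb = 100       # flash installed in the edge facility

usable_tb = local_cache_tb / HOT_FRACTION
print(f"{local_cache_tb} TB of local cache can front ~{usable_tb:,.0f} TB "
      f"(~{usable_tb / 1000:.0f} PB) of usable, cloud-backed storage")
```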
This model can also help address the power constraints many edge facilities face. Because just a fraction of the full data set is stored onsite, not only is there far less gear consuming electricity, but it also becomes financially feasible to use the latest solid-state storage, which offers better performance and energy efficiency than spinning disks.
Having data stored where it needs to be enhances both distribution and connectivity. Dedicated private lines, which have become more affordable thanks to declining bandwidth prices, can improve both performance and security. At the edge, uploads happen at the same speed as downloads, a major consideration for apps that produce massive amounts of data to be uploaded and processed.
It’s still early in the edge buildout, but the explosion of IoT requires the industry to implement a data layer now in order to ensure the edge can handle the rapidly growing mountains of data. And it will require a new approach that differs from both traditional on-premises and cloud storage. In the end, only a service model can successfully address the space and energy constraints of the edge, while also providing sufficient capacity and data distribution capabilities to power the future of IoT.