5 steps to select the best IoT database
Organizations should begin their search for the right IoT database by understanding their data, functional requirements and how the database will fit into their business strategy.
To select the right IoT database, IT admins must first assess data types and data flow, and define their functional, performance and other business requirements.
The best IoT database must be able to address IoT-specific requirements. IT administrators have many considerations when selecting an IoT database, including scalability, fault tolerance, high availability and flexibility. They must also think about the location of the database -- on premises or in the cloud -- and whether it should be managed or unmanaged.
To help select a database, IoT technologists should take a step-by-step approach to ensure an IoT database meets their organization's needs.
How to select the right IoT database
By working through these five steps, IoT technologists should expect to narrow down the number of databases that need to be integrated with each other as well as with legacy systems.
1. Assess the data types that the database will store and manage. IoT data types are as varied as the use cases themselves, but they can fall into several categories, including:
- Device metadata. This might include the device ID, a unique identifier for the physical device; the device class or type; the device date of manufacture; hardware serial number; and current configuration or version. This data is relatively static.
- Device state information. This includes the various relevant states for that device, such as on or off, active or passive, or recording. This data can be dynamic.
- Telemetry data. This is the data that the device collects, assuming it is a sensor or a device whose primary function is to collect data. It typically arrives as streaming data that changes with every unit of time and may be organized into channels.
- Command data. This data controls the actuator or device to take action, such as rotate left or speed up.
- Operational data. Data about the operation of the device itself, including CPU usage, memory usage or heat.
Many IoT neophytes make the mistake of focusing on the command and telemetry data, which inform the business process. This focus comes at the expense of the management data, which includes device metadata, state data and operational data. However, the management data is critical when applied to digital twins, which are digital mirrors of the physical IoT environment, or when recreating unexpected failure modes or conducting forensics.
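As a rough illustration of this split, the five categories can be tagged in code so that management data is never silently dropped in favor of telemetry and command data. This is a minimal sketch: the `Record` class, the field names and the sample payloads are hypothetical and not tied to any particular IoT platform.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataCategory(Enum):
    DEVICE_METADATA = "device_metadata"  # relatively static: ID, class, serial number
    DEVICE_STATE = "device_state"        # dynamic: on/off, active/passive, recording
    TELEMETRY = "telemetry"              # streaming data collected by the device
    COMMAND = "command"                  # instructions such as rotate left or speed up
    OPERATIONAL = "operational"          # CPU usage, memory usage, heat


# The categories grouped above as "management data".
MANAGEMENT_CATEGORIES = {
    DataCategory.DEVICE_METADATA,
    DataCategory.DEVICE_STATE,
    DataCategory.OPERATIONAL,
}


@dataclass
class Record:
    """Hypothetical envelope for one piece of IoT data."""
    device_id: str
    category: DataCategory
    payload: dict = field(default_factory=dict)


def is_management_data(record: Record) -> bool:
    return record.category in MANAGEMENT_CATEGORIES


def split_records(records):
    """Return (management, business_process) lists, preserving order."""
    management = [r for r in records if is_management_data(r)]
    business_process = [r for r in records if not is_management_data(r)]
    return management, business_process


if __name__ == "__main__":
    sample = [
        Record("dev-001", DataCategory.TELEMETRY, {"temp_c": 21.5}),
        Record("dev-001", DataCategory.OPERATIONAL, {"cpu_pct": 40}),
        Record("dev-001", DataCategory.COMMAND, {"action": "rotate_left"}),
    ]
    management, business_process = split_records(sample)
    print(len(management), "management record(s),", len(business_process), "business-process record(s)")
```

Tagging every record with its category up front makes it easier to route management data to a store that a digital twin or a forensic investigation can query later.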
2. Map out the data flow. IoT leaders must identify where different types of data are collected, aggregated, analyzed and transformed, as well as how the data integrates into other systems. Will the data need to be enriched and at what points will it need to be captured and logged? Make sure to identify areas of data storage and replication. Will there be a canonical data store? Plan where, when and under what circumstances data is archived.
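One way to make the data flow map concrete is to write it down as a small data structure and check it for the questions raised above: where data is stored, whether there is exactly one canonical store and when data is archived. The sketch below is an assumption-laden example; the stage names, operations and archive periods are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One hop in a hypothetical IoT data flow."""
    name: str
    operations: tuple                 # what happens to the data at this stage
    persists: bool = False            # is data stored or logged here?
    canonical: bool = False           # is this the canonical data store?
    archive_after_days: Optional[int] = None  # None means no archive rule yet


# Hypothetical flow: device -> edge gateway -> cloud core -> data lake.
FLOW = [
    Stage("device", ("collect",)),
    Stage("edge_gateway", ("aggregate", "filter", "enrich"), persists=True, archive_after_days=7),
    Stage("cloud_core", ("transform", "analyze"), persists=True, canonical=True, archive_after_days=365),
    Stage("data_lake", ("integrate",), persists=True),
]


def storage_points(flow):
    """Names of the stages where data is stored or logged."""
    return [s.name for s in flow if s.persists]


def canonical_store(flow):
    """Return the name of the single canonical store, or raise if the map is ambiguous."""
    names = [s.name for s in flow if s.canonical]
    if len(names) != 1:
        raise ValueError(f"expected exactly one canonical store, found {names}")
    return names[0]


def missing_archive_rules(flow):
    """Storage points that do not yet say when data is archived."""
    return [s.name for s in flow if s.persists and s.archive_after_days is None]


if __name__ == "__main__":
    print("Storage points:", storage_points(FLOW))
    print("Canonical store:", canonical_store(FLOW))
    print("No archive rule yet:", missing_archive_rules(FLOW))
```

Keeping the map in a checkable form means gaps, such as a storage point with no archive plan, surface before a database is chosen rather than after.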
3. Map database needs to functional requirements. After an IoT technologist has defined the types of data and data flow, the next step is to map database needs to functional requirements, including:
- Data ingestion and aggregation. Once data is collected and aggregated from the devices, it often must be processed at high speed, particularly if the telemetry and command data arrive in high-speed streams. This type of data requires high-performance reading for telemetry and writing for command data, along with high reliability and availability.
- Edge analytics. Many data flow architectures include edge analytics relatively close to the devices themselves. Data requirements include data translation, filtering, enriching and any additional aggregation. Edge analytics databases need high-speed read and write functionality and very low latency, plus the ability to support analytics tools and solutions.
- Core analytics. As data is aggregated further, possibly in a cloud-based core, it may need to undergo additional transformation, enriching and analytics. A core analytics database platform requires high availability. It may also need to be distributed and support streaming analytics.
- Management console. The management console needs to capture and display device data, including metadata, operational data and state data. It should include visualization and dashboarding capabilities and requires submillisecond latency.
- Business analytics. Data from IoT networks often needs to integrate into larger data lakes where data scientists can run analytics and AI. The IoT database needs to integrate with the enterprise's existing business analytics or data warehousing and analysis systems.
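The functional areas above can be turned into a simple checklist that is compared against what a candidate database offers. The following is a minimal sketch: the capability labels are shorthand invented for this example, drawn from the requirements listed above, and the "may also need" items for core analytics are treated as required here purely for simplicity.

```python
# Capability labels are hypothetical shorthand for the requirements listed above.
REQUIREMENTS = {
    "ingestion_and_aggregation": {"high_speed_read_write", "high_reliability", "high_availability"},
    "edge_analytics": {"high_speed_read_write", "very_low_latency", "analytics_tool_support"},
    # Distribution and streaming analytics are "may need" items in the text;
    # this sketch treats them as required to keep the check simple.
    "core_analytics": {"high_availability", "distributed", "streaming_analytics"},
    "management_console": {"visualization", "submillisecond_latency"},
    "business_analytics": {"data_lake_integration"},
}


def missing_capabilities(area, offered):
    """Return the sorted capabilities a candidate lacks for one functional area."""
    return sorted(REQUIREMENTS[area] - set(offered))


def areas_covered(offered):
    """Return the functional areas a candidate fully satisfies."""
    return [area for area in REQUIREMENTS if not missing_capabilities(area, offered)]


if __name__ == "__main__":
    # Hypothetical candidate tuned for fast reads and writes near the devices.
    candidate = {"high_speed_read_write", "very_low_latency", "high_reliability", "high_availability"}
    print("Covers:", areas_covered(candidate))
    print("Missing for edge analytics:", missing_capabilities("edge_analytics", candidate))
```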
4. Determine database performance requirements from functional requirements. In a nutshell, databases generally feature a tradeoff between performance (the read and write response time) and longevity (the length of time the data must reside and be kept current). Another way to look at this is speed vs. scale. Ingestion and edge analytics need very low latency and high-performance speed but don't typically need to keep large volumes of data for an extended period of time. Business analytics databases, in contrast, need to keep large volumes of data for months, years or decades, but don't need submillisecond response times. This functional disparity drives the need for multiple integrated IoT databases, rather than a single database type.
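A back-of-the-envelope calculation helps show why speed and scale pull apart. The sketch below estimates raw data volume for a given retention period and flags workloads that ask for both very low latency and long retention. All figures are hypothetical: the device count, reading rate, record size and the latency and retention cutoffs are assumptions chosen for illustration, not benchmarks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def raw_volume_bytes(devices, readings_per_second, bytes_per_reading, retention_days):
    """Uncompressed volume = devices x rate x record size x retention time."""
    return devices * readings_per_second * bytes_per_reading * retention_days * SECONDS_PER_DAY


def suggest_tier(max_latency_ms, retention_days, latency_cutoff_ms=10.0, retention_cutoff_days=30):
    """Classify a workload as speed-oriented, scale-oriented or both.

    The cutoffs are arbitrary assumptions; tune them to the actual requirements.
    """
    needs_speed = max_latency_ms <= latency_cutoff_ms
    needs_scale = retention_days > retention_cutoff_days
    if needs_speed and needs_scale:
        return "split"   # likely better served by two integrated databases
    if needs_speed:
        return "speed"
    if needs_scale:
        return "scale"
    return "either"


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 devices, one 100-byte reading per second each.
    one_day = raw_volume_bytes(1_000, 1, 100, retention_days=1)
    one_year = raw_volume_bytes(1_000, 1, 100, retention_days=365)
    print(f"1 day at the edge:   {one_day / 1e9:.2f} GB")
    print(f"1 year in analytics: {one_year / 1e12:.2f} TB")
    print("Edge analytics:", suggest_tier(max_latency_ms=1, retention_days=1))
    print("Business analytics:", suggest_tier(max_latency_ms=5_000, retention_days=3_650))
```

With these assumed numbers, a day of data at the edge is roughly 8.64 GB while a year of the same stream is roughly 3.15 TB, which illustrates why the low-latency tier and the long-retention tier often end up on different databases.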
5. Apply additional business requirements. Performance isn't the only requirement. Other factors include how providers are pricing their services with licensing fees, the location of the database, the organization's stance on using open source tools and resources, and the legacy environment that will integrate with the IoT database.
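One common way to combine performance with these business factors is a weighted scoring matrix. The sketch below is only an example of the technique: the criteria names follow the factors listed above, but the weights, the 1-to-5 scores and the candidate names are entirely hypothetical and would need to come from the organization's own evaluation.

```python
# Hypothetical weights: a higher number means the criterion matters more.
WEIGHTS = {
    "performance": 3,
    "pricing": 2,
    "deployment_location": 2,
    "open_source_fit": 1,
    "legacy_integration": 2,
}

# Hypothetical 1-to-5 scores for two placeholder candidates.
CANDIDATES = {
    "candidate_a": {"performance": 5, "pricing": 2, "deployment_location": 3,
                    "open_source_fit": 4, "legacy_integration": 3},
    "candidate_b": {"performance": 3, "pricing": 4, "deployment_location": 4,
                    "open_source_fit": 2, "legacy_integration": 5},
}


def weighted_score(scores, weights):
    """Weighted average of the scores, using the same keys as the weights."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(candidates, weights):
    """Candidate names ordered from highest to lowest weighted score."""
    return sorted(candidates, key=lambda name: weighted_score(candidates[name], weights), reverse=True)


if __name__ == "__main__":
    for name in rank(CANDIDATES, WEIGHTS):
        print(f"{name}: {weighted_score(CANDIDATES[name], WEIGHTS):.2f}")
```

In this made-up example the candidate with the weaker raw performance ranks first because it scores better on pricing and legacy integration, which is the point of the step: performance isn't the only requirement.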