Artificial intelligence data storage planning best practices
AI storage planning is similar to the storage planning you're used to: Consider capacity, IOPS and reliability requirements for source data and the application's database.
Advances in computing power, the sheer volume of data that is now available online and improved artificial intelligence algorithms have finally made AI practical. But how should you implement artificial intelligence data storage?
There is no one-size-fits-all answer for artificial intelligence data storage. Every AI application is different, and so is the data associated with it. As such, there are several questions you must consider when planning AI data storage.
What is the nature of the source data?
AI applications are dependent on source data; you must know where the source data resides and how the application uses it.
Suppose that a particular AI application is designed to make decisions based on input received from a collection of industrial internet of things sensors. You must know whether the application treats the sensor data as transient. Can the application analyze the sensor data in near-real time as it arrives from the sensors, or does the application need to store the data and then analyze it?
If the application analyzes sensor data in real time, then you don't need to store that data (except in a temporary data cache). But if the application stores the data and analyzes it later, then there are additional questions that you must answer before you design artificial intelligence data storage. For example, can the application purge the source data after it has been analyzed, or should you retain a copy so the software can occasionally reanalyze it? Either answer has implications for the volume of data you must retain. You must also ensure that the storage back end can keep pace with the stream of new data flowing into the application.
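If you do retain the source data, a rough sizing exercise shows both the sustained ingest rate the storage back end must keep pace with and the capacity the retention window demands. The sketch below is purely illustrative; the sensor count, reading rate, payload size and retention period are assumed values, not recommendations.

```python
# Illustrative back-of-the-envelope sizing for retained IIoT sensor data.
# All figures below are assumptions for the sake of the example --
# substitute your own sensor counts, rates and payload sizes.

SENSORS = 10_000              # assumed number of sensors
READINGS_PER_SECOND = 1       # assumed readings per sensor per second
BYTES_PER_READING = 200       # assumed payload size, including metadata
RETENTION_DAYS = 90           # assumed retention window for reanalysis

ingest_bytes_per_second = SENSORS * READINGS_PER_SECOND * BYTES_PER_READING
retained_bytes = ingest_bytes_per_second * 60 * 60 * 24 * RETENTION_DAYS

print(f"Sustained ingest: {ingest_bytes_per_second / 1e6:.1f} MB/s")
print(f"Raw capacity for {RETENTION_DAYS} days: {retained_bytes / 1e12:.1f} TB")
```

With these assumed numbers, the workload is a modest 2 MB/s of ingest but roughly 15.6 TB of raw capacity over the 90-day window, before replication or snapshots; the point is that retention, not ingest, is often what drives the capacity requirement.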
How much data will the AI application generate?
An equally important consideration for artificial intelligence data storage is the volume of data that the application will produce. AI applications produce data of their own; they generally analyze the source data and then write the results of the analysis to a back-end database that the application's decision tree can use. It would not be practical for an AI application to parse multiple terabytes or even petabytes of data every time the software must make a decision. It is far more practical for the application to query a database of information that has already been parsed.
One of the defining characteristics of AI is that applications can make better decisions as they are exposed to more data. The application's database will grow over time, so you must monitor how quickly it grows and perform capacity planning accordingly.
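A simple way to turn that monitoring into a capacity plan is to project the database's growth rate forward and estimate when the provisioned capacity will be exhausted. The sketch below is a minimal example of that arithmetic; the measurements and the provisioned capacity are invented values, and in practice you would pull periodic size readings from your own monitoring system.

```python
# Illustrative capacity projection for the AI application's results database.
# The sample measurements and capacity figure are hypothetical.

# (day, database size in GB) -- assumed weekly measurements
measurements = [(0, 500), (7, 540), (14, 585), (21, 640), (28, 700)]

capacity_gb = 2_000           # assumed provisioned capacity

# Average daily growth over the observation window
days = measurements[-1][0] - measurements[0][0]
growth_gb = measurements[-1][1] - measurements[0][1]
daily_growth_gb = growth_gb / days

headroom_gb = capacity_gb - measurements[-1][1]
days_until_full = headroom_gb / daily_growth_gb

print(f"Average growth: {daily_growth_gb:.1f} GB/day")
print(f"Estimated days until capacity is exhausted: {days_until_full:.0f}")
```

A linear projection like this is only a starting point; because AI databases often grow faster as usage increases, revisit the projection regularly rather than treating the first estimate as fixed.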
How will you use the AI application?
You must consider how many people will use the application at a given moment and how quickly the application will need to deliver information to users.
Consider Cortana, Microsoft's AI-based personal digital assistant for Windows. Vast numbers of people could use Cortana simultaneously. Cortana accepts verbal input and responds verbally to questions, which means it requires an extremely high-performance storage back end. On the other hand, a lightweight AI-based business application that half a dozen people use might not require more than a single SSD. You must build a back-end storage system that meets the application's expected I/O requirements.
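One way to put a number on those I/O requirements is to estimate peak IOPS from expected concurrency. The sketch below shows the arithmetic only; the user counts, request rates and I/O-per-request figures are assumptions, not measurements of any real application.

```python
# Illustrative I/O sizing from expected concurrency.
# Every figure here is an assumed example value.

concurrent_users = 50_000        # assumed peak simultaneous users
requests_per_user_per_sec = 0.2  # assumed: one query every five seconds
ios_per_request = 10             # assumed database reads/writes per query

required_iops = concurrent_users * requests_per_user_per_sec * ios_per_request
print(f"Estimated peak back-end demand: {required_iops:,.0f} IOPS")
```

Under these assumptions the back end must sustain about 100,000 IOPS at peak, which pushes the design toward all-flash or NVMe-class storage; scale the same calculation down for a small departmental application and a single SSD may indeed suffice.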