michelangelus - Fotolia
Operational data store vs. data warehouse: How do they differ?
Operational data stores and data warehouses both store operational data, but the similarities between them end there -- and they both have a role to play in analytics architectures.
You could be forgiven for thinking that operational data stores and data warehouses are synonymous. After all, a data warehouse is a place where operational data is stored for analysis and reporting. Case closed -- two sides of the same coin, right?
Well, no, not so fast. There's more to the question of operational data store vs. data warehouse than that. Both do store operational data, but in different forms and for different purposes. And in many cases, organizations incorporate both into their analytics architectures.
The operational data store (ODS) is a bit harder to pin down because there are diverging views on exactly what it is and for what it's used. But, at heart, an ODS pulls together data from multiple transaction processing systems on a short-term basis, with frequent updates as new data is generated by the source systems. Operational data stores often serve as interim staging areas for data that's ultimately headed to a data warehouse or a big data platform for long-term storage.
Uses and benefits of an ODS
An ODS generally holds detailed transaction data that has yet to be consolidated, aggregated and transformed into consistent data sets for loading into a data warehouse. From a data integration standpoint, then, an ODS might only involve the first and third elements of the extract, transform and load (ETL) process typically used to pull data from operational systems and to harmonize it for analysis.
In that sense, an operational data store can be thought of as a funnel that takes in raw data from various source systems and helps facilitate the process of feeding business intelligence and analytics systems with more refined versions of that data. The full ETL process is handled downstream, which streamlines data transformation workloads and minimizes the processing pipelines needed between the ODS and the source systems to which it's connected.
However, some people also view the operational data store as a BI and analytics platform in its own right. Under that scenario, an ODS can be used to do near-real-time data analysis aimed at uncovering tactical insights that organizations can quickly apply to ongoing business operations -- for example, to increase retail inventories of popular products based on fresh sales data. By comparison, data warehouses typically support historical analysis of data accumulated over a longer period of time.
Depending on the specific application, an ODS that's used for data analysis might be updated multiple times daily, if not hourly or even more frequently. Real-time data integration tools, such as change data capture software, can be tapped to help enable such updates. In addition, some level of data cleansing and consistency checks might be applied in the ODS to help ensure that the analytics results are accurate.
ODS and data warehouse design
In weighing operational data store vs. data warehouse deployments, an ODS can potentially be built on a lighter data platform, especially if it's primarily being used as a temporary way station for data.
For example, an operational data store architecture might be based on the MySQL open source database or the cloud-based Amazon Simple Storage Service as an alternative to traditional data warehouse platforms such as Oracle, Microsoft SQL Server, IBM DB2 and Teradata. In big data environments, Hadoop clusters can provide an ODS staging area for feeding data to either a data warehouse or another cluster built on top of the open source distributed processing framework.
While data usually passes through an ODS relatively quickly to make room for new data coming up behind it, things are different in a data warehouse. The purpose there is to create an archive of data that can be analyzed to track business performance and identify operational trends in order to guide strategic decision-making by corporate and business executives.
A data warehouse might be updated frequently -- nightly, in some cases, weekly or monthly in others. But it's a more static environment than an ODS: Data is typically added, but not deleted, especially in the case of an enterprise data warehouse (EDW), which is designed to provide a single source of consolidated and cleansed data from all of a company's operations. EDWs tend to be large and complex platforms as a result -- a combination that can make deploying them a challenge.
ODS vs. dart mart
Another facet of the operational data store vs. data warehouse discussion is how an ODS compares to a data mart. Data marts are purpose-built data warehouse offshoots -- essentially, smaller warehouses that store data related to individual business units or specific subject areas. A data mart and an ODS might be in the same league on storage capacity, but otherwise, they differ in the same way that EDWs and operational data stores do. Like their bigger brethren, data marts are a repository for historical data that has been fully scrubbed and aggregated for analysis.
Two other things to keep in mind about operational data stores: First, they aren't the same thing as an operational database. The latter is the database built into a transaction system -- it's the location from which the data flowing into an ODS comes. Put another way, transaction data is initially processed in operational databases and then moved to an ODS to begin its analytics journey.
Second, operational data stores are sometimes equated with master data management (MDM) systems. MDM processes enable companies to create common sets of master data on customers, products and suppliers. The master data can then be fed back to transaction systems via an MDM hub, where the data is managed and stored. Early on, some organizations built MDM capabilities into ODS platforms, but that approach seems to have lessened in recent years -- perhaps partly due to the MDM market not growing like proponents hoped it would, itself a result of MDM's inherent complexities.