Knowing the right data enrichment techniques is crucial

In 'The Enrichment Game,' the latest book by 'Data Guy' Doug Needham, you'll learn why and how to cull data to gain valuable insights and a business advantage.

In his new book The Enrichment Game: A Story About Making Data Powerful (Sept. 2021, Technics), author Doug Needham, "The Data Guy," teaches the rules of the game that turns raw data into an important business asset. Here, read the introduction to this new book on how the right data enrichment techniques in the right hands can give you a business advantage.

Editor's note: The following is condensed from the original to better fit this format.

From the Uber app that manages your transportation, to the social media application that allows you to stay in touch with your distant friends and family, it's clear that software applications run the world. Each of these applications is continuously generating data.

To win the enrichment game, organizations must take that data and do something useful with it. Organizations must understand how all of this complex data can work together to answer questions, do research, provide insight, forecast the future and recommend new ways of doing business. To do this successfully, data from one application needs to be enriched with data from other applications. Those who can do this successfully can show data in ways that are unexpected and surprising.

The enrichment game has two goals. The first is to demonstrate the value of the data our organizations spend so much money to gather, protect and, in some cases, share. The second is to enrich the data available to a business by showing more context about the meaning of the data. Data strategists, data architects, data scientists, business analysts and business intelligence designers all play the game. Chief data officers oversee it, and the winner or loser of the game is the business itself.

First steps to understanding data enrichment

To understand the enrichment game, we need to know how data specialists differ from software developers in their approach to data.

Software developers create applications to meet the narrow and focused needs of those who need a product developed. A business or enterprise has a portfolio of multiple applications, each meeting the needs of its customers in some unique way. Each application has a constrained set of functions, just like the individual pieces in chess. But unlike in chess, if a new set of functions is needed, a new application can be created, or new components added to the existing application. When a software developer works with data, he or she needs the data to be available to a specific application for a specific purpose.

On the other hand, data specialists want to use all applications' data holistically to further the organization's broader goals. This requires data to be available to anyone who needs it within an organization, not just within the confines of a single application. We must take the data we have and make it better and easier to use. Data architects will design new platforms where the data can go. Data engineers will move the data from the applications to the enrichment platform. Knowledge workers like business analysts and data scientists will create new data products. We want to use data to solve problems not identified in the original specification of the software application. The data of an organization represents both the customer and the future. Using it poorly, not using it at all or creating data models that prevent known future use cases hampers the progress that can be made with the data.

For this to happen, each application must share some subset of the data it manages. But rarely is application system data stored in a manner that is conducive to reporting and analytics. And when data specialists seek to gather that data to report on any application system, it tends to interfere with the performance of that application.

Thus, the goals of software development and data analytics may seem to be at odds, which is why it is a best practice to separate operational systems from reporting systems. An operational system like an application is perfect for meeting the needs of our customers. A reporting system takes data from various applications, along with some data not created within our applications, and enriches the data to create further insights about what needs our customers have.

These two goals -- of compartmentalization and isolation for the application developer and openness and distribution for the data architect -- require a balance. Data moving from one application to another requires a governance strategy that allows for accurate data usage in both applications. When the same data is used by more than one application and for reporting, analysis and enrichment, the complexity increases.

The main strategy for this game, like many others, is that to win you must anticipate future needs. For example, there will be questions that an application's data cannot answer in isolation. The specific demographics of the various users of an application may not be something the application itself captures. Third-party data sources may be needed to answer the more interesting questions that come up about your users. Additional contextual information about how people use an application is always interesting to other departments like finance and marketing. So, alongside building a new application, a separate platform is set up to gather the data the application collects. This separate platform is the enrichment platform.

Data analysts enrich the data available to a business by showing more context about the meaning of the data.

For instance, suppose there is a sudden drop in purchases in one of your stores. What context would help you understand the causes of that drop? Was there a flood? A snowstorm? Road construction that prevents people from driving by your store? Looking at the sales data in isolation can't answer those questions.
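As a minimal sketch of that idea, the Python below attaches external events to daily sales rows so that a drop can be read alongside its possible cause. The store ID, dates, totals, the `local_events` feed and the `enrich_sales` helper are all hypothetical and exist only for illustration.

```python
from datetime import date

# Hypothetical daily sales totals for one store (illustrative values only).
daily_sales = [
    {"store_id": "S1", "day": date(2021, 2, 1), "total": 1200.0},
    {"store_id": "S1", "day": date(2021, 2, 2), "total": 310.0},
    {"store_id": "S1", "day": date(2021, 2, 3), "total": 1150.0},
]

# Hypothetical external context, e.g. from a weather or roadworks feed.
local_events = [
    {"store_id": "S1", "day": date(2021, 2, 2), "event": "snowstorm"},
]


def enrich_sales(sales, events):
    """Return a copy of each sales row with the events known for that store and day."""
    by_key = {}
    for e in events:
        by_key.setdefault((e["store_id"], e["day"]), []).append(e["event"])
    return [
        {**row, "events": by_key.get((row["store_id"], row["day"]), [])}
        for row in sales
    ]


enriched = enrich_sales(daily_sales, local_events)
for row in enriched:
    print(row["day"], row["total"], row["events"])
```

The sales rows themselves are unchanged; the added `events` field is the enrichment, and days with no known event simply carry an empty list.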

Likewise, do you have separate application systems or microservices that perform a dedicated function for your customers? Looking at data from just a single system can only answer a limited number of questions. Enrich that data with data from other systems, and you have a completely new perspective on both your data and your customer.

Playing the enrichment game

The enrichment game seems complex, but there are some simple things to keep in mind. Knowing the pieces, the rules, the players and the board will give you the tools to form a strategy that will win the game for you.

The game's goal is to make our data as valuable as possible to enhance the lives of our consumers, customers and the rest of the organization.

Those of us who play this game passionately are a pleasure to work with. This is a simple game, where data from one application is enriched by data from another application or even enriched by external sources. The goal is to make the data we interact with more useful than it was before we were involved.

This is the essence of the enrichment game.

What is meant by the phrase enriching data? According to Lexico.com, the word enrichment means "the action of improving or enhancing the quality or value of something."

Like other assets, data can be enriched to add value to the organization as a whole. This book is about the methods, techniques and people involved while enriching data for an organization to use.

A software application written to add value to a consumer's life does not and cannot capture all of the data that will prove useful later. An application's performance will suffer if it stores all interaction data from the user for all time, so some weeding out of data must occur. The app does capture most of the necessary data, but questions will arise during the application's lifetime that the application itself cannot answer.

Some questions are simple: Is this a new user? How many interactions has the application had with this user? Is this a frequent user? These types of questions are relatively easy to answer as long as all of the right data is captured, such as timestamps for interactions.
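The simple questions can be sketched directly from interaction timestamps. The example below is only an illustration: the log contents, the `summarize_users` helper, and the thresholds for "new" (first seen within two days) and "frequent" (three or more interactions) are assumptions, not definitions from the book.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical interaction log: (user_id, timestamp) pairs.
interactions = [
    ("u1", datetime(2021, 9, 1, 9, 0)),
    ("u1", datetime(2021, 9, 3, 14, 30)),
    ("u1", datetime(2021, 9, 5, 8, 15)),
    ("u2", datetime(2021, 9, 5, 10, 0)),
]


def summarize_users(log, now, new_within=timedelta(days=2), frequent_min=3):
    """Answer per user: how many interactions, is the user new, is the user frequent."""
    counts = Counter(user for user, _ in log)
    first_seen = {}
    for user, ts in log:
        if user not in first_seen or ts < first_seen[user]:
            first_seen[user] = ts
    return {
        user: {
            "interactions": counts[user],
            "is_new": now - first_seen[user] <= new_within,  # assumed threshold
            "is_frequent": counts[user] >= frequent_min,     # assumed threshold
        }
        for user in counts
    }


summary = summarize_users(interactions, now=datetime(2021, 9, 6))
for user, facts in summary.items():
    print(user, facts)
```

Without the timestamps, none of the three answers could be computed, which is the point about capturing the right data up front.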

Some questions are more difficult: What browser is the consumer using? What device is the user connecting from? In what ways is this user similar to other users? The answers to these questions must be found by enriching the data.

The process of enriching data makes simple data more thorough. This thorough data, by its nature, is both more interesting and more informative.

The sources for enriching a single application's data are limited only by the imagination. Some examples of other data sources that could be used to enrich data from a single application are:

  • application logs
  • other applications built by the company
  • third-party applications like Salesforce or other customer relationship management (CRM) software
  • statistical population data for the user's zip code
  • social media data that the user may interact with
  • third-party data sources like credit rating agencies
  • other data brokers

Combining this data makes each interaction the user has with a company part of a universe of data that knowledge workers can explore to look for patterns. This universe of data is called enriched data.
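A rough sketch of such a combination, using two of the sources listed above, might look like the following. The user records, zip code statistics, CRM fields and the `enrich_user` helper are all made up for illustration; a real platform would draw them from actual systems and data providers.

```python
# Hypothetical records from a single application.
app_users = [
    {"user_id": "u1", "zip": "30301"},
    {"user_id": "u2", "zip": "99999"},
]

# Hypothetical population statistics keyed by zip code (made-up values).
zip_stats = {"30301": {"median_age": 34, "population": 25000}}

# Hypothetical CRM export keyed by user ID.
crm_accounts = {"u1": {"segment": "loyalty"}}


def enrich_user(user, zip_stats, crm_accounts):
    """Return a copy of the user record with demographic and CRM context attached."""
    enriched = dict(user)
    # Missing sources are recorded as None rather than guessed.
    enriched["demographics"] = zip_stats.get(user["zip"])
    enriched["crm"] = crm_accounts.get(user["user_id"])
    return enriched


enriched_users = [enrich_user(u, zip_stats, crm_accounts) for u in app_users]
for record in enriched_users:
    print(record)
```

Leaving unmatched fields as `None` keeps the gaps visible, so knowledge workers can tell missing context apart from context that was looked up and found.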

At a base level, knowledge workers can produce reports showing the various important metrics the company uses. Other knowledge workers, like data scientists, use this enriched data to identify new patterns, new use cases, and new opportunities.

The value of data enrichment

Once you change the way humans and machines learn from the data, you change how the data can be used.

Enriching data provides additional value by showing more contextual information around a particular event or transaction. However, the enriched data should be more useful than it was without the enrichment. Does knowing which phase the moon was in while someone bought a flashlight at their local supermarket have any predictive value? It might, if the reason for purchasing the flashlight was related to a power outage that recently occurred, and the person who bought the flashlight worked for a search and rescue operation. A full moon provides much more light for a search and rescue operation than a new moon. While you may not anticipate search and rescue needs or even power outages, if your store is the main supplier for the needs of a community, knowing the phase of the moon may be useful for having some items readily available and easy to find in your store.

One piece of additional contextual information in isolation may not be useful, but enriching data from multiple sources to get a detailed picture that indicates why someone made a purchase or used your software could be quite valuable in anticipating the needs of consumers.

Enriched contextual information about your data provides additional insight into the use of that data by your users.

Possible persona problems and how to avoid them

Many companies have a detailed idea of an ideal customer or customer persona for different situations. These customer personas were identified through survey data and optional questionnaires on the websites. Your company markets to certain personas. What are all the attributes you have identified for your ideal customer persona(s)? Is your ideal customer male or female? Are they a college student or an empty-nester? Do they live in a city, suburb or rural area? How do they use your products? Do they purchase items regularly, or do they only purchase items to prepare for a trip or an adventure? Does your application capture all of these attributes? How can you enrich the data you have to match data to your customer persona?

The difficulty I have seen with using these personas is that since a persona is an archetype of what a customer would look and act like, no actual purchases could be tied back to a customer persona. For example, at one company where I worked, they had an ideal persona for whom they created marketing material. She was a 30-something married professional mom of two children. Our application did not collect information on how many children our customers had. Also, we did not collect information on marital status or age. We could derive some of this information based on the purchase patterns, but the data in each application we were using only contained a portion of the persona information.

Relating purchase patterns, delivery addresses, items purchased, survey data, demographic data for the delivery location and other things got us closer to being able to say "persona 1 made these types of purchases" and "persona 2 made these other types of purchases."
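One simple way to picture this matching is to count how many persona attributes an enriched record shares with each persona definition. The sketch below is not the method used at that company; the persona attributes, the `best_persona` helper and the two-attribute minimum are all assumptions chosen for illustration.

```python
# Hypothetical persona definitions: attribute -> expected value.
PERSONAS = {
    "persona 1": {"age_band": "30-39", "household": "family", "area": "suburb"},
    "persona 2": {"age_band": "18-24", "household": "single", "area": "city"},
}


def best_persona(record, personas=PERSONAS, min_matches=2):
    """Return the persona sharing the most attributes with the record, or None."""
    best_name, best_score = None, 0
    for name, attrs in personas.items():
        score = sum(1 for key, value in attrs.items() if record.get(key) == value)
        if score > best_score:
            best_name, best_score = name, score
    # Too few shared attributes means no persona can be claimed for this record.
    return best_name if best_score >= min_matches else None


# Hypothetical enriched order: attributes derived from several sources.
order = {"order_id": 1, "age_band": "30-39", "household": "family", "area": "city"}
print(best_persona(order))
```

A record from a single application would typically carry only one or two of these attributes, which is why the match only becomes possible after enrichment.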

Only by enriching the raw data from each application with data from our other supporting applications could we verify our persona assumptions and even tweak the persona definition based on usage patterns. No data from any individual application gave us enough visibility into the customers' needs to relate purchase patterns to our personas. Only the fully enriched set of data could begin to give us insights into our personas.

The enrichment platform creates a dedicated place for internal analysis and the opportunity to create new and additional data products derived from an application or group of applications that your business uses to interact with consumers.
