Tips for creating curated data sets for self-service BI users

Data curation initiatives can help streamline BI processes by reducing the amount of time users spend locating and preparing data. Get four tips for preparing data sets.

A key part of any self-service BI strategy lies in presenting data to users in a digestible format. In many cases, that takes the form of curated data sets created for them by BI and analytics teams.

Business users often aren't equipped to find, integrate and prepare data for analysis on their own; they just want to access and analyze it as part of business operations. Curating data for them can simplify and streamline the BI process, and it may be the only realistic approach in large organizations with thousands of end users who rely on BI data to help drive business decisions. But data curation initiatives can easily go awry if they aren't planned and managed properly.

A two-pronged approach should be taken when creating curated data sets for self-service BI users, said Rob Perry, vice president of product marketing at ASG Technologies, an IT service management provider. Organizations must begin by setting the groundwork -- defining goals, matching data and audiences, and instituting a curation workflow that makes sense for their data and audience. They must be careful to choose their data curation approaches wisely, as well as to provide varied types of data to ensure a holistic view. Expiration dates should also be set to ensure that information is current and timely.
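As a rough illustration of the expiration dates mentioned above, a curation team could record an expiry date with each data set and flag the ones that are past due. This is a minimal sketch, not a method described by Perry; the `CuratedDataSet` class, its `expires_on` field, the `expired` helper and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CuratedDataSet:
    name: str
    audience: str
    expires_on: date  # hypothetical field: date after which the set needs review


def expired(data_sets, today):
    """Return the names of data sets whose expiration date has passed."""
    return [ds.name for ds in data_sets if ds.expires_on < today]


# Illustrative entries only; names and dates are made up.
catalog = [
    CuratedDataSet("quarterly_sales", "finance", date(2021, 3, 31)),
    CuratedDataSet("support_tickets", "customer service", date(2021, 12, 31)),
]

stale = expired(catalog, today=date(2021, 6, 1))
print(stale)  # ['quarterly_sales']
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check easy to test and to rerun for a past or future date.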

Once the groundwork is set, the focus can shift to using the data sets in ways that add value. That means enabling the people who create the data sets to think critically about the information they access and to be selective about the quality of what they incorporate. "While not everything needs to be perfect, it's important that nothing in the data set is a mystery," Perry said.

According to experts, organizations creating curated data sets must focus on integrating data from existing business systems, curating views of the data, shifting curation to users and taking a holistic view of the data lifecycle.

Start with data in existing systems

Most organizations have many business systems. When creating curated data sets for self-service BI users, it's important to figure out which bridge to build first. Once that bridge is in place, an organization can move on to the next one, and so on, until its business systems are fully interconnected.

Matthew Meigs, director of marketing communications at content management platform Nuxeo, said some good questions to guide this exploration include:

  • Which business systems are most frequently used?
  • Which system stores information that could be of benefit to more users than can currently access it?
  • Which system costs the most to manage and maintain?
  • Which existing system could benefit from access to mobile, cloud or other new technologies?

Curate views, too

Most self-service BI users are not data experts, particularly when the data comes from other departments. One good strategy is to find tools for creating and organizing views of data sets that users can pull up on demand. This involves capturing all the data that is feasible, then extracting and loading it into data stores. The curation team can then focus on creating elegant transformations of the data into views that are organized into subject-specific data marts.
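The load-first, transform-into-views pattern described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: the `raw_orders` table, the `sales_by_region` view and the sample rows are hypothetical, and a real deployment would use the organization's own data store and BI tooling.

```python
import sqlite3

# In-memory database standing in for a data store that raw data is loaded into.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "EMEA", 120.0, "shipped"),
        (2, "EMEA", 80.0, "cancelled"),
        (3, "APAC", 200.0, "shipped"),
    ],
)

# Curated view: the transformation lives in the view, and the raw table is untouched.
conn.execute(
    """
    CREATE VIEW sales_by_region AS
    SELECT region,
           SUM(amount) AS shipped_revenue,
           COUNT(*)    AS shipped_orders
    FROM raw_orders
    WHERE status = 'shipped'
    GROUP BY region
    """
)

rows = conn.execute(
    "SELECT region, shipped_revenue, shipped_orders FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 200.0, 1), ('EMEA', 120.0, 1)]
```

Because the view is defined over the raw table, users who need a different cut of the data can request another view without the underlying data having to be reloaded.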

"Views become the holy grail of insight incubation," said Dave McCandless, vice president of IT at Navis, a shipping supply chain software provider. Don't let the data curators be the ones to decide what's useful and stifle requests that reflect out-of-the-box thinking. Data curators need to be ultrafocused on maximizing the wealth of data available to the organization, not on filtering which of the requested data is valuable, he said.

Shift curation to users

If feasible, one best practice is to have BI users curate their own data, said Sean Kandel, co-founder and CTO at Trifacta, a data wrangling tools provider. This shifts the burden of data quality to the users and can be more efficient for some organizations, as a small data curation task force is replaced with a much larger team of business users who have a better understanding of the data and their requirements.

The IT department may still step in to curate the best work for broader consumption. But business users can take greater responsibility for deciding what's acceptable, what needs refining and when to move on to analysis within the context of their department.

"Because the line-of-business people know their data best, they should help contribute to the creation of the curated data sets and the overall data governance," said Isabelle Nuage, director of product marketing for big data at Talend.

Manage the data lifecycle

Data stewardship is the process of managing the lifecycle of data, from curation to retirement. This involves defining and maintaining data models, documenting the data, cleansing the data and defining the rules and policies. It enables the implementation of well-defined data governance processes covering several activities, including monitoring, reconciliation, refining, deduplication, cleansing and aggregation, to help deliver quality data to applications and end users.
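Three of the stewardship activities listed above (cleansing, deduplication and aggregation) can be sketched in a few lines of plain Python. This is a simplified illustration rather than a description of any vendor's tooling; the record layout, the sample values and the helper functions `cleanse`, `deduplicate` and `aggregate_by_region` are all hypothetical.

```python
from collections import defaultdict

# Made-up records with inconsistent spacing and casing.
raw_records = [
    {"customer": " Acme Corp ", "region": "emea", "revenue": 100.0},
    {"customer": "Acme Corp", "region": "EMEA", "revenue": 100.0},  # duplicate once cleansed
    {"customer": "Globex", "region": "APAC", "revenue": 250.0},
    {"customer": "Initech", "region": "apac", "revenue": 50.0},
]


def cleanse(record):
    """Normalize whitespace and casing so equivalent values compare equal."""
    return {
        "customer": record["customer"].strip(),
        "region": record["region"].strip().upper(),
        "revenue": record["revenue"],
    }


def deduplicate(records):
    """Keep the first occurrence of each (customer, region, revenue) combination."""
    seen = set()
    unique = []
    for record in records:
        key = (record["customer"], record["region"], record["revenue"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def aggregate_by_region(records):
    """Sum revenue per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["revenue"]
    return dict(totals)


cleaned = deduplicate([cleanse(r) for r in raw_records])
totals = aggregate_by_region(cleaned)
print(totals)  # {'EMEA': 100.0, 'APAC': 300.0}
```

Cleansing runs before deduplication here because the two Acme Corp rows only become identical once spacing and casing are normalized.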

In the long run, IT managers have an opportunity to create a place where everybody can go and find their data, all while enabling them to collaborate by sharing comments, warnings, annotations and more. A well-organized data catalog can create the same experience in the enterprise for self-service BI users of curated data sets that Wikipedia has achieved for knowledge sharing.
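To make the idea of a shared catalog with comments, warnings and annotations more concrete, here is a minimal sketch of what one catalog entry might look like. The `CatalogEntry` class, its fields and the sample notes are hypothetical; commercial data catalogs expose much richer models.

```python
from dataclasses import dataclass, field

NOTE_KINDS = {"comment", "warning", "annotation"}


@dataclass
class CatalogEntry:
    name: str
    description: str
    notes: list = field(default_factory=list)  # (kind, author, text) tuples

    def annotate(self, kind, author, text):
        """Attach a note from a user; reject unknown note kinds."""
        if kind not in NOTE_KINDS:
            raise ValueError(f"unknown note kind: {kind}")
        self.notes.append((kind, author, text))

    def warnings(self):
        """Return only the warning texts, e.g. to show alongside the data set."""
        return [text for kind, _author, text in self.notes if kind == "warning"]


# Made-up entry and notes for illustration.
entry = CatalogEntry("quarterly_sales", "Shipped revenue by region, per quarter")
entry.annotate("comment", "finance_analyst", "Matches the figures in the board report.")
entry.annotate("warning", "data_steward", "Q1 figures exclude one region.")

print(entry.warnings())  # ['Q1 figures exclude one region.']
```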
