How data staging helped Walgreens transform its supply chain

Walgreens built a centralized data warehouse to give supply chain partners a better view into its data -- but analytics were slow. That's where a data staging tier came in.

Drugstore giant Walgreens made a considerable investment in building a data warehouse to consolidate data from different systems. But its supply chain team still struggled to generate accurate and timely reports. The data warehouse also didn't provide much visibility to analysts at the company's supply chain partners, which made it difficult for them to plan their own production to meet the needs of Walgreens.

At the 2019 Strata Data Conference in San Francisco, Walgreens executives discussed how they developed a data staging tier using Kyvos from Kyvos Insights at a fraction of the cost of online supply chain analytics platforms. This made it possible to provide customized analytics across the organization and to partners. It also helped ensure that everyone was operating on the same data sets.

Big data = big shortcuts

Walgreens' teams of analysts and data scientists were struggling with generating accurate reports for forecasting, tracking out-of-stock items and doing sales and operations planning for its 9,500 stores across the country. "We wanted to manage inventory in partnership with suppliers," said Anne Cruz, IT manager of supply chain at Walgreens. "But this is only possible if we can provide the data they need to provide correct and accurate inventory at the store."

Keeping the shelves stocked was important to keep customers and store workers happy. Accurate data was also important for sales and operations planning and for making adjustments in marketing or strategy when sales shifted.

If you don't have agility, it is frustrating for users.
Anne CruzIT manager of supply chain, Walgreens

The company's IT operations team receives about 70 billion transaction records per day. This created a tremendous data engineering challenge, even with a Hadoop cluster for aggregating the data. But the analysts using the data spent long periods loading the appropriate data sets into tools like Tableau, SAS and Excel. "If you don't have agility, it is frustrating for users," Cruz said.

As a result, every analyst would find various shortcuts for generating reports that sort of worked. Many of them manually curated their own data sets, which could result in mistakes or didn't reflect updates in the system of record. Other times, a simple analytics process would take hours to execute.

First, a data warehouse; then, data staging

Part of the problem was that the data would be aggregated into different buckets for a given product, said Neerav Jain, a technical architect at Walgreens. Each Walgreens store carries about 30,000 products, and different data about the products was spread across more than 50 tables and sources.

It didn't help that the data was also spread across mainframes, AS/400 servers, relational database management systems, in-store servers and the odd Excel spreadsheet. Walgreens wanted to create a centralized view of all the supply chain data and decided to start with five scorecards representing different views.

It was also important to provide each vendor with a secure view of the data relevant to its own products. Jain said: "We have thousands of vendors, and one vendor cannot see data for another, or we can be sued."

About two years ago, Jain's team started collecting data from the various sources into a centralized data warehouse in Hadoop with a plan to build an integrated data hub. Although the data was in one place, the analytics were still slow.

So, the Walgreens team put out a request for proposal to build a data staging tier and shortlisted solutions from five different providers. Ultimately, the team selected Kyvos to help build a data staging and integration tier on top of the Hadoop cluster. Cruz said it ended up being about one-fifth the cost of competing in-memory solutions. The data staging tier helped reduce the processing time and enabled the analytics teams to create highly responsive dashboards.

The key enhancement is that Kyvos automatically generates new data cubes and associated indexes on a daily and weekly basis in the background. It takes about 30 minutes to generate the daily updates and about 2.5 hours for the weekly updates. Once these are complete, they are copied to the staging tier in a way that doesn't interfere with ongoing analytics jobs.

Addressing data gaps

The data engineering team then works with business analysts to identify the kind of analytics they need and current business problems. Gaining this understanding helps the data engineers guide the best ways to structure the dimensions required for generating the best data cube in the consumption tier.

The engineers also had to find ways to address gaps in the data that might otherwise affect supply chain analytics. For example, if a vendor suffered a manufacturing or distribution challenge, the data warehouse wouldn't capture any data about its products during this outage. However, analysts might prefer to ignore this period since it could skew their analysis or predictions. At the same time, Walgreens doesn't just want to fill in data in the system of record, which could skew financial reports.

So, Jain's team worked with the analysts and data scientists to develop several schemas to automatically format the data for different analytics processes.

Before building anything, organizations should confirm with business users that they have the right data, Cruz said. When new data comes in, her team also revalidates whether they should add it to their data models.

Dig Deeper on Data warehousing