Kesu - Fotolia
Dimension tables vs. fact tables: What's the difference?
Fact tables and dimension tables are used together in star schemas to support data analytics applications. But they play different roles and hold different types of data.
Entity relationship models used in business systems are typically organized to support efficient execution of transactions or operational events. Because the focus is on ensuring rapid response times, the associated data models don't easily lend themselves to the aggregation and slicing and dicing of data that drives business intelligence (BI), reporting and analytics applications.
As a result, transaction processing and operational data must be transformed into a framework that's better suited for analytical uses. The need to do so often leads to the implementation of fact and dimension tables in a data warehouse. Let's look at dimension tables vs. fact tables, and how they relate to and differ from one another.
Dimensional modeling is a common method of structuring BI and analytics data in the form of a star schema. A dimensional model enables data in a data warehouse to be represented in a way that's designed to provide fast performance on analytical queries.
The star schema structure enables speedy scanning of selection criteria among the available records in different data dimensions; that not only optimizes the generation of result sets, it also provides the ability to efficiently create aggregations, such as counts, sums, averages, and maximum and minimum values.
Fact tables get specific on data
At the heart of a star schema is a fact table, which holds data entries that comprise a set of facts related to a company's business operations. Each row in a fact table represents an individual transaction or event; the columns document the different data elements that come into play in processing the ones captured in the table.
For example, a fact table could contain the data associated with processing a sales transaction in a retail store:
Time ID |
Customer ID |
Quantity |
Item ID |
Promotion ID |
Unit price |
Total |
Location ID |
Clerk ID |
This table documents entity data, such as the product purchased, the customer making the purchase and the store location. It also includes quantifiable data, such as the number of units purchased and the total price paid for that product. Together, the fields in one of the fact table's rows record specific information about a particular product that was sold to a particular customer at a particular time in a particular store.
That's useful information, of course. But, by itself, it doesn't give the retailer's data and business analysts a lot to work with for analyzing sales. The quantifiable values captured in the fact table are amenable to being aggregated for analysis as is. To analyze the entity data records, though, dimension tables that hold more information about customers, products and so on are needed.
Dimension tables hold keys to analytics
In a fact table, the entries in the entity data fields aren't the actual data; instead, they're foreign keys that point to the primary keys for related entries in dimension tables, which capture a variety of information about each entity referenced in the fact table. A dimension table provides a uniform way to maintain an up-to-date version of the data associated with those entities.
For example, a customer dimension table created by the retailer would likely include the following data, tied to the customer IDs listed in the fact table:
Customer ID |
Customer name |
Street |
City |
State |
ZIP |
Home phone |
Loyalty ID |
Gender |
The dimension table could also contain many more data attributes, including additional demographic data, such as birth date; purchasing profile data, such as frequency of purchases and the brands bought; and favorite colors and other personal preferences provided by customers. One of the characteristics of dimension tables vs. fact tables is that the latter tend to be relatively narrow, with a limited number of columns, while the former are often very wide.
Maintaining separate sets of data on each customer, indexed by their customer IDs, makes it easy to run reports on, say, all of the purchases made by different people to help better target marketing campaigns. In addition, the retailer could use the combined data in the fact and dimension tables to analyze things such as which customers bought particular products and how many of them took advantage of promotions being offered.
Now, let's put the two types of tables together to illustrate how they work in tandem with one another.
Building a productive relationship
A visualization of the star schema that could result from our retail example is shown above, with the sales fact table linked to six dimension tables. Each of the dimension tables can be used in a similar fashion to the customer one.
The dimensional model also seamlessly supports data aggregation, querying and reporting across the different dimensions. To get a report about the sales of items by store location, a data analyst at the retailer can sort the fact table's sales records by location and then by product; at that point, the data on products sold at each store can be grouped together for analysis.
Time data can also be incorporated to analyze seasonal trends, or customer data can be analyzed to look at how shoppers differ from one region to another. With the right indexes put in place, analyzing and reporting on the resulting aggregate data sets is simple using SQL queries.
Dimensional modeling has become a standard approach in data warehousing because its elegance and simplicity enable rapid execution of analytical queries. The use of independent dimensions frees the dimensional model from any predisposed bias about the referenced entities. Also, it's eminently extensible: Adding new entities is as simple as adding records to the associated dimension table, and the dimensions in a star schema can be expanded by creating new dimension tables and putting keys that refer to them into a fact table.
In the end, it isn't really a question of using dimension tables vs. fact tables. The two go hand in hand to help organizations build data architectures that can support effective BI and analytics applications.