fact table
What is a fact table?
In data warehousing, a fact table is a database table in a dimensional model. The fact table stores quantitative information for analysis. The table lies at the center of the dimensional model, surrounded by multiple dimension tables. Each dimension table contains a set of related attributes that describe the facts in the fact table. Together with the dimensions, the fact table provides a consolidated view of the quantitative and attribute data, which is optimized for querying, reporting and analytics.
A fact table usually contains two types of columns: measures and foreign keys. The measures contain the quantitative data, usually in the form of numeric values. The measures represent key business metrics -- or facts -- about a process, event or condition. The fact table's primary key is usually a composite of the foreign key columns, although it can also be a surrogate key.
The foreign key columns join to multiple dimension tables and their attributes, making it possible to "slice and dice" the warehouse data by various attribute combinations to answer specific business questions. For example, key stakeholders in an organization might want to answer the question: "How many female customers in Utah and Nevada between ages 42 through 62 purchased hiking boots in May and June 2023?"
Data warehouse layout
A data warehouse is typically laid out in a star schema or snowflake schema, with the fact table at the center. A data warehouse can contain multiple fact tables, but each of those tables still lies at the center of its respective dimensions.
The descriptive attributes in the dimension tables make it possible to filter, categorize and summarize the facts to extract the answers to critical business questions. The following figure shows a simple star schema with four dimension tables and one fact table.
The fact table lies at the heart of this data model. It includes five measures (UnitPrice, SalesAmount, UnitsSold, PercentProfit and DailyInventory) as well as multiple foreign keys that reference the dimension tables. The foreign keys make it possible to access the dimensions' attributes, which contain information about the customers, products, territories and dates involved in each sale. The warehouse data can be filtered by any combination of attributes available in the dimension tables.
Four of the foreign keys reference the same date dimension, making it possible to slice the data by any of those dates. Some dimension tables, such as dimTerritory and dimProduct, contain hierarchical data, which can be broken down into more granular segments. For example, stakeholders might request a report that shows the total annual sales by product type and sales territory. Because of the hierarchical nature of the data, they'll also be able to drill down to view sales totals for individual products and countries.
Types of fact tables
Fact tables serve different purposes and operate at different grains. The grain determines what a record in the table should represent at its most atomic level. The table might include records at other grains, such as inventory totals, but the table's core purpose is to host records based on the defined grain. In the previous example, the fact table's grain can be specified as the "sale of a product to a customer at a specific time and place." A fact table's design should be based on the lowest possible and practical grain.
Fact tables are often categorized as one of the following types:
- Transactional. This is the most basic type of fact table and the most used. The table's grain is typically treated as one row per transaction or event. In the example, the factSales table is a transactional fact table. Each row records one sales transaction.
- Periodic snapshot. The fact table stores a snapshot of the dimensional data at a defined period of time, such as capturing weekly or monthly sales totals.
- Accumulating snapshot. The fact table stores snapshots of a process or activity that has a specific beginning and end. For example, this type of fact table might be used to track service calls or the order processing cycle.
Fact tables can also include different types of measures, which provide the quantitative data needed to find answers to specific business questions. A fact table supports three types of measures:
- Additive. The measure can be added across any dimension. In the previous example, the fact table's UnitsSold column can be summed up based on customer, product, date, territory or any combination of these. As a result, the column is considered an additive measure. Additive measures are the most common and easiest to work with.
- Non-additive. The measure cannot be added across any dimensions. In the example, the PercentProfit column stores the percentage of profit on each sale. If those percentages are added up across a dimension, you might end up with totals such as 26,000%. Because the percentages should not be added together, the column is considered a non-additive measure.
- Semi-additive. The measure can be added across some dimensions but not others. In the example, the DailyInventory column maintains a record of the number of products in stock on each day. Although it's possible to aggregate these totals for products and territories, it does not make sense to add them together for the dimDate dimension. For instance, if there were 25 widgets in stock on July 27, and 25 widgets on July 28, it does not mean that there are a total of 50 widgets in stock.
It is also possible to create fact tables that contain no measures. These tables, referred to as factless fact tables, are basically collections of foreign keys that join to dimension tables. Even without measures, this type of fact table can still provide useful information. For example, if the fact table in the example included no measures, the customers, products, territories and dates could still be linked together to provide a complete picture of all sales events, which could be filtered along any of the dimensions.
When designing fact tables and their related dimension tables, data architects should take into account both current and future needs, ensuring that the design is flexible enough to accommodate changing business requirements without having to rebuild the warehouse.
Evaluate data warehouse deployment options and use cases. Explore the differences between dimension tables vs. fact tables. Check out the pros and cons of on-premises vs. cloud data warehouses. Read about data integration challenges and how to overcome them and see how to develop an enterprise data strategy.