What are aggregate tables and aggregate fact tables?

Here you'll see examples and read definitions of aggregate tables and aggregate fact tables. You'll also learn how to use aggregate tables in data warehouses and databases.

In databases, what are aggregate tables and aggregate fact tables?

Aggregate tables, in general, are simply database tables that contain aggregated values. OK, I admit it: that answer is accurate but useless. So let's try again, and this time we'll use a fact table as an example. Imagine that you have a fact table like this in which the granularity is date, product and customer:

Customer ID Item No Order Date Unit Sales Profit
1 143 04/05/2012 1 1.52
1 150 04/05/2012 3 3.9
1 8 10/06/2012 1 2.48
1 77 10/06/2012 1 1.37
1 92 10/06/2012 1 1.33
1 95 10/06/2012 1 2.87
1 18 28/07/2012 1 2.3
1 37 28/07/2012 1 1.03
1 61 28/07/2012 1 2.01
1 83 28/07/2012 1 2.37
1 120 28/07/2012 1 2.24
1 8 13/08/2012 1 2.48
1 58 13/08/2012 1 2.79
1 100 16/12/2012 1 2.18
1 122 16/12/2012 1 2.47
1 83 12/05/2013 1 2.37
1 148 12/05/2013 1 2.17
1 37 29/11/2013 1 1.03
1 116 29/11/2013 1 2.03
2 46 07/04/2012 1 2.24
2 110 07/04/2012 1 2.84
2 49 06/05/2012 1 2.2
2 60 06/05/2012 1 1.91
2 26 22/05/2012 1 1.69

Every value that you see for "unit sales" is the number of units of a particular product that were sold to a particular customer on a particular day. (Here, to help make the point of aggregation slightly clearer, I've shown the day of sale as an actual date rather than as a pointer. And yes, I realize that these European-style dates are still in the future – but we'll treat them as the past for purposes of this demonstration.)

We could run queries against this fact table, and it would return data. For example, we could ask for the total unit sales of Item No. 150 to Customer 1 on May 4, 2012, and the answer we'd get is three.

We could also run a query that returns the total sales of Item No. 150 to the same customer not just for the 4th of May but for that entire month. In order to do this, the system would scan the fact table looking for 31 separate date entries listing sales of that product to Customer 1 and then aggregate the unit-sale values from the returned rows. That would be a relatively slow process.

An alternative is to create a fact table that already contains one or more levels of aggregation. For example, we could aggregate this fact table by month. That would involve finding all the sales of Item No. 150 to Customer 1 in, say, January 2012, aggregating the data, and putting the results in a single row. We then, of course, would have to do the same for the other months, the other products and the other customers.

The new table would look like this:

Customer ID Item No Order date Unit Sales Profit
1 150 Jan 2012 11 11.52
1 150 Feb 2012 3 3.9
1 150 March 2012 2 2.48
1 150 April 2012 1 1.37
1 150 May 2012 1 1.33
1 150 June 2012 2 2.87

Now when we had a query that looked for a monthly sales total, we could run it against the aggregate fact table and find the answer much more rapidly. And of course, this isn't the only aggregate fact table that we could generate. For example, if we knew the counties that different customers were located in, we could aggregate their data on that basis.

We could also aggregate by both date and customer up to the level of the month and county, as in this table excerpt (with Herefordshire being a county in England, for the uninitiated):

Customer ID Item No Order date Unit Sales Profit
Herefordshire 150 Jan 2012 111 121.52
Herefordshire 150 Feb 2010 63 73.9

In other words, we could create a series of aggregate fact tables, and when a query comes in, we could run it against the appropriate aggregated table.

By now, it is (hopefully) clear where this is all leading to: An aggregate fact table is simply one that combines multiple rows of data, loses some detail and aggregates numerical values.

Dig Deeper on Database management