What is the difference between a logical and physical warehouse design?
What's the difference between logical design and physical design? Get an expert's take, plus learn about three data warehouse models -- the user model, physical model and logical model -- and how they differ.
What is the difference between a physical and logical design?
The process of database design is typically separated out into three different layers:
- User
- Logical
- Physical
The idea of splitting it like this arose very early in the development of databases. These three layers were first described in an interim paper published by the ANSI/SPARC Study Group on Data Base Management Systems in 1975.
It is certainly not essential to remember that ANSI stands for American National Standards Institute and SPARC for Standards Planning and Requirements Committee. The committee realized that the fundamental problem with database design at the time was a lack of communication.
Users who want a database often have a model in their brains of what they want.
Users tend not to think about databases in a formal sense; rather they tend to think in terms of the information that they want to appear on screen in order to allow them to complete their work. "I want to be able to enter details about all the products I have to sell." They also think in terms of the functionality that they want. "I also want to be able to manage the orders that customers place with me."
Then there are the database designers (DBDs) who essentially think in terms of database structures. Relational database designers tend to think in terms of tables, columns (fields), rows (records), primary keys, referential integrity, clustered and non-clustered indexes.
The problem comes when these two talk about the database. There is exactly and precisely zero common ground between them. The following (admittedly imaginary) conversation sums up the problem.
Client: "Hi, we need a database to store information about our real estate business."
DBD: "Great, what sort of tables did you have in mind?"
Client: "Uh, no, not the content of the houses, just the property itself."
DBD: "Do you want fields with that?"
Client: "No, not all houses come with land. But the new system does need to tell us which houses are on the property index."
DBD: "Clustered or non-clustered?"
Both groups have a perfectly valid model in their heads of the proposed database. The user's model is expressed in terms of the business functionality that they want; the DBD's model is expressed in terms of the way in which the database should be physically constructed. The former is very business oriented and relatively un-formalized, the latter is very structural and extremely formal.
So, we may have a communication problem but at least we now understand the problem and defining these two models, the user model and the physical model, is a large step on the way to solving it.
It would be difficult to imagine two groups with more different views of the same database and yet it is these two very groups that have to work together to produce a database. This was essentially the problem that the ANSI committee identified. The solution it proposed was inspired. Rather than attempt to make either group modify its view, which would have been disastrous since both views are perfectly valid, it proposed the introduction of a third model that sits between the two and acts as an interpretation layer. This is called the logical model (or 'conceptual schema' in ANSI/SPARC-speak).
The logical model – overview
The logical model concentrates on formalizing the user's view of the database, turning it from a relatively unstructured state into a definitive description of the user's requirements. Once this has been done, it is relatively easy to map this onto the physical model that the DBD is so keen to produce. Logical models can be constructed in a variety of ways, but one of the most commonly used is called entity relationship (ER) modeling. The ER model gets its name from the fact that it records the entities that are identifiable in the user's requirements and the relationships that exist between them.
During a process called business requirement analysis, business analysts (BAs) talk to the users and examine the user model that exists in the users' heads. With the co-operation of the users, this is formalized into an ER model which essentially forms the logical model.
It is important to realize that this logical model is based entirely upon the users' requirements. There is no input from the DBDs. Indeed, at this stage, not only has it not been necessary to decide which database engine the database will run upon (DB2, SQL Server etc.), it hasn't even been necessary to chose a database model (relational, hierarchical etc.).
Once the logical model is complete, it is handed over to the DBDs. At this point a decision is made about the database model and (most commonly) the database engine that will be used. In the logical model the DBDs receive a well understood, formalized description of the business requirements (entities, relationships etc.) that can be mapped relatively easily onto the world that they understand (tables, relationships etc). DBDs also add a huge amount of detail that is of no interest to the users (data types, primary keys, indexes etc.). The logical model gradually turns into the physical model that the DBDs wanted in the first place.
Most of this answer was shamelessly plagiarized from Inside Relational Databases, a book by by Mark Whitehorn and Bill Marklyn, published by Springer Verlag. However, since Mark is one of the authors of the book, the chances of him suing himself are low.