Why Data Modelling is Important in BI

As per my experience in working on reporting systems based on semantic models, aka “self-service BI”, but the principles are the same we had in DSS (decision support system), OLAP (on-line analytical processing), and many other names that we’ve seen over the years.

The basic idea is always the same: enable the user to navigate data without rewriting a new query every time. In order to do that, a description of the data model with metadata that enrich the simple collection of table and relationships allows a generic client tool to offer a customized experience, as it was designed for a specific business need.
Creating a good data model specific for the business analysis is important because the goal is different compared to a database that runs a transactional application. If you want to record sales, you need a database that is optimized to collect data in a safe and accurate way. If you want to analyze sales, you need a data model that is optimized to query and aggregate data in volume. These two requirements are different in a way that is much larger than a simple performance issue.

A normalized model for a transactional application might show challenges for a query. How to interpret a missing data or a null value? A data model optimized for the analysis is simpler in its nature, because the data have been transformed so that they are unambiguous.

Power BI shares the same data modeling experience of Power Pivot and Analysis Services Tabular. The easiest way to create a data model is to get the data “as is” from a data source, define relationships, and then start designing reports. However, this approach raises several issues when you have too many tables, or different granularities in tables representing the same entity from different data sources. Power Query and the M language are here to solve these problems. If you are so lucky to read data from a well-designed data mart that follows the star schema principles, you don’t need any transformation. But in all the other cases, you probably should massage your tables before creating your reports.

The question is: what is the right data model for the business analysis?
The first simple answer is: the model should be designed to answer business questions, and not to simply represent the data as they come from the data source.
Yes, this answer is not a real answer, because it does not provide any practical guidance.
A better hint is: one entity, one table. This is a “user-friendly” way to describe a star schema, where each set of attributes describing a business entity is collected in a table called “dimension”, and each set of numeric metrics describing events and sharing the same granularity is saved in a table called “fact table”. But “one entity, one table” is simpler to understand.

A product is a table. Category, color, price, manufacturer: these are attributes. There is no added value in creating many relationships to just describe attributes of the same entity. Just collapse all these attributes in the same Product table.
A customer is a table. Country, city, address, number of employees are just attributes of the customer. It is useless to create a table with the countries.
Unless the country is a business entity in your analysis. If you are doing demographical researches, chances are that the country is a business entity, and not just an attribute of a customer.
An invoice is a table. Date of the invoice, shipping cost, products sold, quantities… all these attributes should be in a table with a conformed granularity. And when this is not possible, but only then, you start considering several tables at different granularities.

Design principles for data modelling

There are no right or wrong data model; just good or bad. Good data modelling is difficult; and the follow design principles could be useful:

–Be faithful to the specification of the requirement

–Use common sense and make assumption only if the specification fails to explain

–Avoid duplication and other redundant information

–The KISS principle

Customer: A customer may not yet have booked for a safari, or may have booked for and participated in several safaris. A company or an individual person may be a customer – but only individual persons are booked into scheduled safari trip occurrences. The information to be stored about a customer is:

  • For a company customer: name, address (first line address, second line address, city, postcode, country) and contact name, email address and phone number. A contact may change over time
  • For a person customer: first name, last name, address (first line address, second line address, city, postcode, country), email address, phone number and date of birth
  • A customer’s preferences (whether company or person) for time slots during which they wish to travel must be retained in the database. A customer may be able to travel in several time slots during the year. Tane wishes to retain this information so that he can target only those able to participate with publicity for scheduled trips

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s