DATA WAREHOUSE CONCEPTS.
What is Dimensional Modeling?
1. Dimensional Modeling is a kind of database modeling used to support OLAP applications.
2. It contains Dimensional and fact tables and relationships are created between them.
3. Is a technique for logically organizing business data in a way that helps end users understand it
4. Data is separated into facts and dimensions
5. Users view facts in any combination of the dimensions
6. Allows users to answer “Show me X by Y by Z” type questions
7. Example: Show me sales by product by month
2. What is E-R Modeling?
1 Entity - Relationship Modeling is a kind of database modeling used to support OLTP applications.
2It contains tables for entities of an organization and its functions/events.
3Relationships are created between entity tables and its respective events/functional tables.
. What is a Data Warehouse?
A Data Warehouse is a collection of data marts representing historical data from different operational data source (OLTP). The data from these OLTP are structured and optimized for querying and data analysis in a Data Warehouse.
What is a Data mart?
A Data Mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit or a department like Sales Dept, HR Dept, etc. The Data Mart are sometimes also called as HPQS (Higher Performance Query Structure).
What is OLAP?
OLAP stands for Online Analytical Processing. It uses database tables (Fact and Dimension tables) to enable multidimensional viewing, analysis and querying of large amount of data.
What is OLTP?
OLTP stands for Online Transaction Processing Except data warehouse databases the other databases are OLTPs. These OLTP uses normalized schema structure. These OLTP databases are designed for recording the daily operations and transactions of a business.
What are Cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provided multidimensional analysis.
What are Virtual Cubes?
These are combination of one or more real cubes and require no disk space to store them. They store only definition and not the data.
What is Operational Data Store [ODS] ?
It is a collection of integrated databases designed to support operational monitoring. Unlike the OLTP databases, the data in the ODS are integrated, subject oriented and enterprise wide data.
What is Denormalization?
Denormalization means a table with multi duplicate key. The dimension table follows Denormalization method with the technique of surrogate key.
What is Surrogate Key?
A Surrogate Key is a sequence generated key which is assigned to be a primary key in the system (table).
3. List differences between OLTP and OLAP?
OLTP | OLAP |
Used by Operational systems. Example Pharmacy Dispensing, Inpatient Billing, Purchase Order etc. | Used by Analytical Applications. Example: Business Intelligence tools such as Siebel Analytics, Business Objects, Cognos etc. |
Uses Highly normalized DB. Less or no redundancy. | Uses De-normalized DB. Data is duplicated to speed up select query and minimize joins |
Write intensive operations are performed. Example: INSERT/UPDATE/DELETE statements. | Mostly read oriented functionalities are performed. Predominantly SELECT statement is used. |
Data changes more frequently based on the business activity. | Data is loaded statically or in specific time period. |
SQL Queries are performed with more joins to achieve the desired functionality. | Less joined SQL Queries used. |
Small Datasets (records) are retrieved from DB to satisfy transactional functionality. | Large Volumes of Datasets (records) are retrieved from DB to satisfy analytical reporting need. |
Seldom contains summarized tables. | Contains summarized tables based on the nature of reporting need. |
Calculations are derived on the fly to achieve the desired functionality. | Precomputed based on the reporting requirement. |
4. What degree of normalization is used for OLAP DBs?
3rd Normal Form.
5. What degree of normalization is used for OLTP DBs?
5th Normal
6. Explain Activities of Normalization forms starting from 1 to 5?
Normal Form | Activity |
1st | Attributes of a table are defined. Repeating attributes are eliminated. e.g: Doctor, Speciality, Patient, Appointment time DoctId, Patients, Apt Time |
2nd | Eliminate Redundancy by enforcing. Primary or Composite Primary Keys. e.g: Doctor, Speciality, Patient, Appointment time DoctId, Speciality, PatId, Apt Time Avoids: avoid update and delete anomalies |
3rd | Eliminate attributes that are not depend on Key. e.g: DoctId, SpltyId, PatId and Apt Time Specialty Table: Splty Id, Splty Type, Description, etc. |
4th Normal Form | Relationships are created between tables. Doctor, Patient, Specialty |
5th Normal Form | Isolate Semantically related relationship between tables: Doctor->InPatient -> Splty ->OutPatient-> Splty |
What is Star Schema?
1. Logical Schema that is created based on Dimensional Modeling.
2. Contains set of dimension tables and one or more facts tables
3. one to many relationship between dimension and fact tables are created
4. A star schema can have any number of dimension tables.
5. Organizes data into a central fact table with surrounding dimension tables
6. Each dimension row has many associated fact rows
7. Dimension tables do not directly relate to each other
What is Snow flake schema?
1. It is extended Star Schema where dimensional tables having one to many relationship between them.
2. A snowflake schema can have any number of dimensions and each dimension can have any number of levels.
3. Snow flake schema is a normalize dimensions to eliminate the redundancy.
4. The dimension data has been grouped into one large table.
5. Both dimension and fact tables normalized
6. In a Snow Flake design the dimension table (de-normalized table) will be further divided into one or more dimensions (normalized tables) to organize the information in a better structural format. To design snow flake we should first design star schema design.
Example: Day->Week->Month->Quarter->Year
DimTable Normalization: | 3 Normal Form | 2 Normal Denormalized Form | |
Joins: | Higher number of Joins | Fewer Joins | |
Ease of Use: | More complex queries and hence less easy to understand | Less complex queries and easy to understand | |
Query Performance: | More foreign keys-and hence more query execution time | Less no. of foreign keys and hence lesser query execution time | |
Ease of maintenance/change: | No redundancy and hence more easy to maintain and change | Has redundant data and hence less easy to maintain/change | |
Type of Datawarehouse: | Good to use for small datawarehouses/datamarts | Good for large datawarehouses | |
What is the Fact Table?
Fact table contains the measures or metrics of business subjects of an organization.
Fact table contains the measures or metrics of business subjects of an organization.
it having summarized data(aggregated values)
It contain measerable table
Data is often numerical
Is the central table in the star
historical data.
This Fact Table has a foreign key-primary key relation with a dimension table.
The Fact Table maintains the information in 3rd normal form.
What are the types of Facts?
Additive - Measures that can be added across all dimensions.
A Fact which can be summed up for any of the dimension available in the fact table.
Additive Fact can be aggregated by simple arithmetical additions
- Non Additive - Measures that cannot be added across all dimensions.
Non-additive fact can’t be added at all.
A Fact which cannot be summed up for any of the dimensions available in the fact table.
- Semi Additive - Measures that can be added across few dimensions and not with others.
A Fact which can be summed up to a few dimensions and not for all dimensions available in the fact table
Semi additive fact can be aggregated simple arithmetical additions along with some other dimensions.
What are the types of Fact Tables?
The types of Fact Tables are:
1. Cumulative Fact Table: This type of fact tables generally describes what was happened over the period of time. They contain additive facts.
2. Snapshot Fact Table: This type of fact table deals with the particular period of time. They contain non-additive and semi-additive facts.
What is fact constellation
It is the process of joining two
What is Grain of Fact?
The Grain of Fact is defined as the level at which the fact information is stored in a fact table. This is also called as Fact Granularity or Fact Event Level.
What is Factless Fact table?
The Fact Table which does not contains facts is called as Fact Table. Generally when we need to combine two data marts, then one data mart will have a fact less fact table and other one with common fact table.
What are Measures?
Measures are numeric data based on columns in a fact table
What is Dimention table
:detail description of each and every attribute
Contains attributes or characteristics about the business
Data is often descriptive (alphanumeric)
Qualifies the fact data
A dimensional table is a collection of hierarchies and categories along which the user can drill down and drill up. it
contains only the textual attributes.
A dimensional table contains detail values/data which is short and wide(ie; less coloums and more rows) Always based on dimensions analysis is done in Datawarehousing. ...
What are Dimensions?
Dimensions are categories by which summarized data can be viewed. For example a profit Fact table can be viewed by a time dimension.
What are Confirmed Dimensions?
The Dimensions which are reusable and fixed in nature Example customer, time, geography dimensions
Examples: SalesRevenue, ChangeRequests
What is junk Dimension
Dimension with a type text description Boolean and flags
It can’t be used to describe th KPI(facts)
What is dirty dimension.
In a dimention table if the record exit more than once with difference in non-key attribute
Ex:scd type2 tables
What is degenerate dimension table?
the values of dimension which is stored in fact table is called degenerate dimensions. these dimensions doesn t have its own dimensions.
Latest Answer: Granularity means nothing but it is a level of representation of measures and metrics.The lowest level is called detailed data and highest level is called summary data It depends of project we extract fact table significance ...
The smallest unit of analysis is called as 'grain'. Also known as the atomic unit of analysis. Deciding the grain and then on basis of it the aggregation level is called as the granularity of a fact
table.
How many types of Repositories are there?
There are three types of Repositories:
1. Standalone Repository
2. Global Repository
LocalRepository
What is slowly changing dimensions
What is slowly changing dimensions
SCD captures the changes which takes place over the period of time.There are 3 types of scd
SCD TYPE 1:A type 1 dimension keeps only the current values doesn’t maintain history
TYPE 2:dimension maintain the full history in the target for each update it inserts a new record in the target.
TYPE 3:Dimension maintain current and previous information
What is Staging Tables?
Staging tables are the tables that store the incremental data extracted from OLTP transactions. Before each ETL process these staging
Staging tables are the tables that store the incremental data extracted from OLTP transactions. Before each ETL process these staging
tables are truncated prior to loading. Data in the staging tables are then transformed and loaded into appropriate target tables by ETL
process.
What is Aggregate Table?
Aggregate table contains the summation of facts by a given dimension or set of dimensions. Purpose of aggregate table is to avoid resource intensive on the fly calculations and improve query response time.
What is Subset Dimension Table?
What is Aggregate Table?
Aggregate table contains the summation of facts by a given dimension or set of dimensions. Purpose of aggregate table is to avoid resource intensive on the fly calculations and improve query response time.
What is Subset Dimension Table?
In the SRMW the subset dimension tables are created by filtering the main dimension table. The main purpose of the dimension to use in conjunction with aggregated tables.
What is Mini Dimension Table?
In the SRMW the Mini Dimension tables are created using the combination of most frequently used attributes of their parent dimension. The purpose of Mini Dimension Table is to improve query response time.
What is Mini Dimension Table?
In the SRMW the Mini Dimension tables are created using the combination of most frequently used attributes of their parent dimension. The purpose of Mini Dimension Table is to improve query response time.
What is a staging area?
Staging area is a temporary storage area used for transaction, integrated and rather than transaction processing.