Cognos

cognos

Saturday 24 September 2011

DatawareHouse Questions



What is data warehouse?
A: Data ware house is one type of database which is specially designed for analysis the business not for transactional purpose.

What is Data mart?
A: Data mart is a subset of Data ware house.

Difference between Data ware house and data mart?
A: Data ware house is maintain the total organization of data, where as data mart is maintained particular subject.

Operational data sources (ODS)?
A: The Operational Data Storage (ODS) system provides a way to save data that can be shared by multiple flow components or flows. ODS is a type of database that serves as a quick-access data storage. An ODS system lets you perform many queries on small amounts of data, and differs from a data warehouse, in which large amounts of information is stored and queries are run on a large volume of data

Dimensional Modeling?
A: Dimensional Modeling is a design concept used by many data warehouse desginers to build thier data warehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements ie, the dimensions on which the facts are calculated.

E-R modeling?
A: ER modeling that models an ER diagram represents the entire businesses or applications processes. This diagram can be segregated into multiple Dimensional models

Difference between Dimensional Modeling and E-R modeling?
A: Dimensional Modeling have only physical model, It is used for de-normalizing the rolap/molap design.
E-R modelings have logical model and physical model. It is used for normalizing the OLPT database design.

What is the Difference between OLTP and OLAP?
A: OLTP refers to transactional data or data captured at the point of transaction, whereas, OLAP is a multidimensional representation of data which throws out summary level information.

What is Data Transformation?
A: It is a process of converting the data and cleansing the data into a required format.
1.Data Merging: It is a process of combining the data from the multiple input pipelines (sources) to single output.
2.Data Cleaning: It is a process of removing unwanted data. It is a process of changing the inconsistence data into a consistence format.
3.Data scribing: It is process of deriving new data definitions according to the business required.
4.Data aggregation: It is a process of calculating the summarizes using aggregation values.

What is Data Extraction?
A: It is a process of reading the data from various systems like databases, operational files/sources, and flat files xml files.

What is loading?
A: Is the process of inserting new data into target system.

What is Fact less fact table?
A: Fact less fact table is nothing but a fact table which doesn’t have any measures.
What is schema?
A: schema is collection of database objects including tables, views, indexes, and sysnoms.

What is the difference between star schema and snowflake schema?
A: Star schema contain de-normalized data. It contain less no of joins, so performance will be increase
Snowflake schema contains normalized data. It contain more no of joins, so performance will be poor.

What is Galaxy schema?
A: Galaxy schema is known as fact constellation schema. It requires no of fact tables and dimensional tables.

What is fact table?
Fact table contains the measurements or metrics or facts of business process
Fact is a numeric field which is the key column to the business. But all numeric fields are not facts.
Fact is key performance indicator to analyze the business
It is a table which contains factual information of a business process.
Ex: Quantity, Revenue, Unit cost

Types of facts?
  • Additive –   facts that can be added across all dimensions.
  • Semi Additive –facts that can be added across some dimensions and not across others.
Ex: Bank Balance
  • Non Additive – Facts that cannot be added across any dimension. Non-additive facts are the facts which cannot be summed up for any of the dimensions present in the fact table.
ex: Ratio, avg

What is dimension?
A: Which will give  the description info of the business.
Ex: ordername.orderdate.description

Types of dimensions?
A: Conformed dimension: A dimension which can be shared with multiple fact table
Junk dimension: It is collection of random transactional codes, flags and/or, text attributes that are unrelated to any particular dimension.
Degenerated Dimension: A dimension which can be stored in fact table that is called” Degenerated Dimension”
Degenerated Dimension is a dimension which is derived from the fact table. It doesn’t have own dimension table

What is Staging Area?
A: It is a temporary storage location used for processing the data.

What is a Surrogate key?
A: Surrogate key is a unique identifier. It is used to identify each row of a table uniquely. It also helps to track slowly changing dimensions. It replaces all natural keys. Surrogate keys are system generated keys,they are not derived from any data sources.
Surrogate key is system generated sequence number to be used as a primary key.

What is Metadata?
A: Data about data.
 
What is cube?
A: Cube is a multidimensional representation of data. It is used for analysis purpose. A cube gives multiple views of data.

What is Granularity?
A: Granularity describes, up to what level of detail the measures are stored in the data warehouse. The data with the lowest level of granularity can be reshaped to meet different needs. Granular data can be easily summarized.

What is Candidate key?
A: Candidate Key is a column or a set of columns available in a table which allows distinct values and not null values.

Slowly changing dimension?
A: It explains how to capture the changes over period of time.
Slowly changing dimension Type1: It maintain only current data doesn’t   maintain historical data
Slowly changing dimension Type2: It maintain current data as well as historical data. For every changes in the source if inserts a new record in the target
Slowly changing dimension Type3: It keeps the current and previous information.

4 comments: