Define Data Warehouse
A data warehouse is a repository for data generated and collected by an enterprise’s various operational systems. It is a pool of data produced to support decision making, it is a repository of current and historical data
Characteristics of Data Warehousing.
Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support. Subject orientation enables users to determine not only how their business is performing, but why.
Integration is closely related to subject orientation. Data warehouses must place data from different sources into a consistent format. Data warehouse integrates data from various internal and External sources like ERP, Social Media , Customer Care , Customer’s Complain, Research and from other sources.
•Time variant (time series).
Data warehouse maintains historical data as well as current data. Data stored in data warehouse is helpful for detect trends, deviations, and long-term relationships for forecasting and comparisons, leading to decision making.
After data are entered into a data warehouse, users cannot change or update the data. Obsolete data are discarded, and changes are recorded as new data.
Data warehouses are typically designed to provide an efficient computing environment for Web-based applications.
A data warehouse uses either a relational structure or a multidimensional structure
A data warehouse uses the client/server architecture to provide easy access for end users.
Newer data warehouses provide real-time, or active, data-access and analysis capabilities
A data warehouse contains metadata (data about data) about how the data are organized and how to effectively use them.
Data Warehouse Framework
Following are the major components of the data warehousing Process
Data are sourced from multiple independent operational “legacy” systems and possibly from external data provide (such as the U.S. Census). Data may also from an OLTP or ERP system.
•Data extraction and transformation.
Data are extracted and properly transformed using custom-written or commercial software called ETL.
Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse and/ or data marts.
Essentially, this is the EDW to support all decision analysis by providing relevant summarized and detailed information originating from many different sources.
Metadata are maintained so that they can be assessed by IT personnel and users. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search, especially with Web tools.
Dr. Preyal Sanghavi
R.B. Institute of Management Studies