1. What is a data warehouse?
A data warehouse is defined in this section as “a pool of data produced to support decision making.” This focuses on the essentials, leaving out characteristics that may vary from one DW to another but are not essential to the basic concept.
The same paragraph gives another definition: “a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” This definition adds more specifics, but in every case appropriately: it is hard, if not impossible, to conceive of a data warehouse that would not be subject-oriented, integrated, etc.
2. How is a data warehouse different from a database?
Technically a data warehouse is a database, albeit with certain characteristics to facilitate its role in decision support. Specifically, however, it is (see previous question) an “integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization.” These characteristics, which are discussed further in the section just after the definition, are not necessarily true of databases in general—though each could apply individually to a given one.
As a practical matter most databases are highly normalized, in part to avoid update anomalies. Data warehouses are highly denormalized for performance reasons. This is acceptable because their content is never updated, just added to. Historical data are static.
3. What is an ODS?
Operational Data Store is the database from which a business operates on an on-going basis.
4. Differentiate among a data mart, an ODS, and an EDW.
An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis.
Both an EDW and a data mart are data warehouses. An EDW (Enterprise Data Warehouse) is an all-encompassing DW that covers all subject areas of interest to the entire organization. A data mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area.
5. Explain the importance of metadata.
Metadata, “data about data,” are the means through which applications and users access the content of a data warehouse, through which its security is managed, and through which organizational management manages, in the true sense of the word, its information assets. Most database management systems would be unable to function without at least some metadata. Indeed, the use of metadata, which enable data access through names and logical relationships rather than physical locations, is fundamental to the very concept of a DBMS.
Metadata are essential to any database, not just a data warehouse.
6. Describe the data warehousing process.
The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources
2. Data are cleansed and organized consistently with the organization’s needs
3. a. Data are loaded into the enterprise data warehouse, or
b. Data are loaded into data marts.
4. a. If desired, data marts are created as subsets of the EDW, or
b. The data marts are consolidated into the EDW
5. Analyses are performed as needed
7. Describe the major components of a data warehouse.
• Data sources. Data are sourced from operational systems and possibly from external data sources.
• Data extraction. Data are extracted using custom-written or commercial software called ETL.
• Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
• Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
• Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search.
• Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.
8. Identify the role of middleware tools.
Middleware tools enable access to the data warehouse. Power users such as analysts may write their own SQL queries. Others may access data through a managed query environment. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools. All these have their own data access requirements. Those may not match with how a given data warehouse must be accessed. Middleware translates between the two.
9. What are the key similarities and differences between a two-tiered and a three-tiered architecture?
Both provide the same user visibility through a client system that accesses a DSS/BI application remotely. The difference is behind the scenes and is invisible to the user: in a two-tiered architecture, the application and data warehouse reside on the same machine; in a three-tiered architecture, they are on separate machines.
10. How has the Web influenced data warehouse design?
Primarily by making Web-based data warehousing possible
11. List the alternative data warehousing architectures discussed in this section.
• Enterprise Data Warehousing Architecture
• Data Mart Architecture
• Hub-and-Spoke Data Mart Architecture
• Enterprise Warehouse and Operational Data Store
• Distributed Data Warehouse Architecture
12. What issues should be considered when deciding which architecture developing a data warehouse? List the 10 most important factors.
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
(This list does not explicitly say that these are the ten most important factors. You may choose others.)
13. Which data warehousing architecture is the best? Why?
What is interesting is the similarity of the averages for the bus, hub-and-spoke, and centralized architectures. The differences are sufficiently small that no claims can be made for a particular architecture’s superiority over the others, at least based on a simple comparison of these success measures.
14. Describe data integration.
Data integration is an umbrella term that covers three processes that combine to move data from multiple sources into a data warehouse: accessing the data, combining different views of the data and capturing changes to the data.
15. Describe the three steps of the ETL process.
Extraction: selecting data from one or more sources and reading the selected data.
Transformation: converting data from their original form to whatever form the DW needs. This step often also includes cleansing of the data to remove as many errors as possible.
Loading: putting the converted (transformed) data into the DW.
16. Why is the ETL process so important for data warehousing efforts?
Since ETL is the process through which data are loaded into a data warehouse, a DW could not exist without it. The ETL process also contributes to the quality of the data in a DW.
17. List the benefits of data warehouses.
Direct benefits include:
• Allowing end users to perform extensive analysis in numerous ways.
• A consolidated view of corporate data (i.e., a single version of the truth).
• Better and more timely information. A data warehouse permits information processing to be offloaded from costly operational systems onto low-cost servers; therefore, end-user information requests can be processed more quickly.
• Enhanced system performance. A data warehouse frees production processing because some operational system reporting requirements are moved to DSS.
• Simplification of data access.
• Indirect benefits arise when end users take advantage of these direct benefits.
18. List several criteria for selecting a data warehouse vendor and describe why they are important.
Six criteria listed in the text are: financial strength, ERP linkages, qualified consultants, market share, industry experience, and established partnerships. These are important to indicate that a vendor is likely to be in business for the long term, to have the support capabilities its customers need, and to provide products that interoperate with other products the potential user has or may obtain.
One could add others, such as product functionality (Does it do what we need?), vendor strategic vision (Does their direction makes sense for our future plans and/or is consistent with industry trends?) and quality of customer references (What do their existing customers think of them?). These may be so obvious that the authors (or the author of the cited reference from which this list is taken) did not feel they needed to be mentioned, but they are still valid answers to this question.
19. Does a bottom-up data warehouse development approach use an enterprise data model?
It need not. Kimball’s bottom-up data mart approach usually uses dimensional data modeling, starting with tables. However, if an enterprise data model exists, its relevant parts can be used as a starting point for the tables rather than embarking on a from-the-ground-up data modeling project.
20. Describe the major similarities and differences between the Inmon and Kimball data warehouse development approaches.
Similarities: Both methods can produce an enterprise data warehouse and subset data marts.
Differences: Inmon’s approach starts with an enterprise data warehouse, creating data marts as subsets of that EDW if appropriate. The focus is on proven, traditional methods and technologies. Kimball’s starts with data marts, consolidating them into an EDW later if appropriate. It focuses in creating a useful end-user capability quickly.
21. What is an RDW?
A real-time data warehouse, in which decision making data are updated on an ongoing basis as business transactions occur; same as an active data warehouse (ADW).
22. List the benefits of an RDW.
The RDW extends the benefits of data warehousing, in general, down into tactical, and perhaps operational, decision making. It empowers people who interact directly with customers and suppliers by providing them with information to make decisions. It can then be extended to customers and suppliers themselves, thus affecting almost all aspects of customer service, SCM, logistics and beyond. It can also facilitate e-business activities, as when sales outlets such as overstock.com (cited in the text), woot.com and steepandcheap.com use historical data to price new close-outs.
23. What are the major differences between a traditional data warehouse and an RDW?
1. A traditional data warehouse (TDW) is used for strategic decisions (and sometimes tactical); an RDW for strategic and tactical (sometimes operational) ones.
2. The results of using a TDW can be hard to measure; results of using an RDW are measured by operational data.
3. Acceptable TDW refresh rates range from daily to monthly; RDW data must be up to the minute.
4. TDW summaries are often appropriate; RDWs must supply detailed data.
5. Small user community at upper organizational levels means a TDW supports few concurrent users; an RDW must support many, perhaps over a thousand.
6. TDWs typically use restrictive reporting to confirm or check patterns, often predefined summary tables; RDWs need flexible, ad hoc reporting.
7. TDW user community generally consists of power users, knowledge workers, managers, other internal users; RDWs are used by operational staff, call centers, perhaps external users.
24. List some of the drivers for RDW.
• A business often cannot afford to wait a whole day for its operational data to load into the data warehouse for analysis.
• Traditional data warehouses have captured snapshots of an organization’s fixed states instead of incremental real-time data showing every state change and almost analogous patterns over time.
• With a traditional hub-and-spoke architecture, retaining the metadata in sync is difficult. It is also costly to develop, maintain, and secure many systems as opposed to one huge data warehouse so that data are centralized for BI/BA tools.
• In cases of huge nightly batch loads, the necessary ETL setup and processing power for large nightly data warehouse loading might be very high, and the processes might take too long. An EAI with real-time data collection can reduce or eliminate the nightly batch processes.
25. What steps can an organization take to ensure the security and confidentiality of customer data in its data warehouse?
Effective security in a data warehouse should focus on four main areas:
Step 1. Establishing effective corporate and security policies and procedures. An effective security policy should start at the top and be communicated to everyone in the organization.
Step 2. Implementing logical security procedures and techniques to restrict access. This includes user authentication, access controls, and encryption.
Step 3. Limiting physical access to the data center environment.
Step 4. Establishing an effective internal control review process for security and privacy.
26. What skills should a DWA possess? Why?
• Familiarity with high-performance hardware, software and networking technologies, since the data warehouse is based on those
• Solid business insight, to understand the purpose of the DW and its business justification
• Familiarity with business decision, making processes to understand how the DW will be used
• Excellent communication skills, to communicate with the rest of the organization
27. Compare data integration and ETL. How are they related?
Data integration consists of three processes that integrate data from multiple sources into a data warehouse: accessing the data, combining different views of the data and capturing changes to the data. It makes data available to ETL tools and, through the three processes of ETL, to the analysis tools of the data warehousing environment.
28. What is a data warehouse and what are its benefits? Why is Web accessibility important with a data warehouse?
A data warehouse can be defined as “a pool of data produced to support decision making.” This focuses on the essentials, leaving out characteristics that may vary from one DW to another but are not essential to the basic concept.
The same paragraph gives another definition: “a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” This definition adds more specifics, but in every case appropriately: it is hard, if not impossible, to conceive of a data warehouse that would not be subject-oriented, integrated, etc.
The benefits of a data warehouse are that it provides decision making information, organized in a way that facilitates the types of access required for that purpose and supported by a wide range of software designed to work with it.
Web accessibility of a data warehouse is important because many analysis applications are Web-based, because users often access data over the Web (or over an intranet using the same tools) and because data from the Web may feed the DW.
29. A data mart can replace a data warehouse or complement it. Compare and discuss these options.
For a data mart to replace a data warehouse, it must make the DW unnecessary. This would mean that all the analyses for which the DW would be used can instead be satisfied by a DM (or perhaps a combination of several DMs). If this is so, it can be much less expensive, in terms of development and computer resources, to use multiple DMs (let alone one DM!) instead of an overall DW.
In other situations, a data mart can be used for some analyses which would in its absence use the DW, but not all of them. For those, the smaller DM is more efficient—quite possibly, enough more efficient as to justify the cost of having a DM in addition to a DW. Here the DM complements the DW.
30. Discuss the major drivers and benefits of data warehousing to end users.
Major drivers include:
a. Increased competition and pace of business, leading to increased need for good decisions quickly
b. Successful pioneering experiences with data warehouses, leading to their wider user acceptance
c. Decreasing hardware costs, making terabyte databases with masses of historical data economically feasible for more firms
d. Increased availability of software to manage a large data warehouse
e. Increased availability of analysis tools making DWs potentially more useful
f. Increased computer literacy of decision makers, making them more likely to use these tools
31. List the differences and/or similarities between the roles of a database administrator and a data warehouse administrator.
Since a data warehouse is a specific type of database designed for a specific application area, a data warehouse administrator has all the roles of a database administrator—plus others. One new role is advising on decision support uses of the DW, for which a DWA needs to understand decision making processes. Beyond that, the issue is more a need for additional skills in the same roles as a DBA—e.g., understanding high-performance hardware to deal with the large size of a DW—than it is one of additional roles.
32. Describe how data integration can lead to higher levels of data quality.
A question involving the word “higher” (or any other comparative, for that matter) requires asking “higher than what?” In this case, we can take it to mean “higher than we would have for the same data, but without a formal data integration process.”
Without a data integration process to combine data in a planned and structured manner, data might be combined incorrectly. That could lead to misunderstood data (a measurement in meters taken as being in feet) and to inconsistent data (data from one source applying to calendar months, data from another to four-week or five-week fiscal months). These are aspects of low-quality data which can be avoided, or at least reduced, by data integration.
33. Compare the Kimball and Inmon approaches toward data warehouse development. Identify when each one is most effective.
Inmon’s approach starts with an enterprise data warehouse, creating data marts as subsets if appropriate. It is most effective when there is a recognized need for an EDW, an executive “champion” of the project, and a willingness to invest in a data warehousing infrastructure before it will show results.
Kimball’s approach starts with data marts, consolidating them into an EDW later if appropriate. It is most effective when it is desired to provide a “proof of concept” implementation before embarking on a full-scale EDW project or when a well-defined area with the greatest benefits can be identified.
34. Discuss security concerns involved in building a data warehouse.
Security and privacy concerns are important in building a data warehouse:
1. Laws and regulations, in the U.S. and elsewhere, require certain safeguards on databases that contain the type of information typically found in a DW.
2. The large amount of valuable corporate data in a data warehouse can make it an attractive target.
3. The need to allow a wide variety of unplanned queries in a DW makes it impractical to restrict end user access to specific carefully constrained screens, one way to limit potential violations.
A data warehouse is defined in this section as “a pool of data produced to support decision making.” This focuses on the essentials, leaving out characteristics that may vary from one DW to another but are not essential to the basic concept.
The same paragraph gives another definition: “a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” This definition adds more specifics, but in every case appropriately: it is hard, if not impossible, to conceive of a data warehouse that would not be subject-oriented, integrated, etc.
2. How is a data warehouse different from a database?
Technically a data warehouse is a database, albeit with certain characteristics to facilitate its role in decision support. Specifically, however, it is (see previous question) an “integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization.” These characteristics, which are discussed further in the section just after the definition, are not necessarily true of databases in general—though each could apply individually to a given one.
As a practical matter most databases are highly normalized, in part to avoid update anomalies. Data warehouses are highly denormalized for performance reasons. This is acceptable because their content is never updated, just added to. Historical data are static.
3. What is an ODS?
Operational Data Store is the database from which a business operates on an on-going basis.
4. Differentiate among a data mart, an ODS, and an EDW.
An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis.
Both an EDW and a data mart are data warehouses. An EDW (Enterprise Data Warehouse) is an all-encompassing DW that covers all subject areas of interest to the entire organization. A data mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area.
5. Explain the importance of metadata.
Metadata, “data about data,” are the means through which applications and users access the content of a data warehouse, through which its security is managed, and through which organizational management manages, in the true sense of the word, its information assets. Most database management systems would be unable to function without at least some metadata. Indeed, the use of metadata, which enable data access through names and logical relationships rather than physical locations, is fundamental to the very concept of a DBMS.
Metadata are essential to any database, not just a data warehouse.
6. Describe the data warehousing process.
The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources
2. Data are cleansed and organized consistently with the organization’s needs
3. a. Data are loaded into the enterprise data warehouse, or
b. Data are loaded into data marts.
4. a. If desired, data marts are created as subsets of the EDW, or
b. The data marts are consolidated into the EDW
5. Analyses are performed as needed
7. Describe the major components of a data warehouse.
• Data sources. Data are sourced from operational systems and possibly from external data sources.
• Data extraction. Data are extracted using custom-written or commercial software called ETL.
• Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
• Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
• Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search.
• Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.
8. Identify the role of middleware tools.
Middleware tools enable access to the data warehouse. Power users such as analysts may write their own SQL queries. Others may access data through a managed query environment. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools. All these have their own data access requirements. Those may not match with how a given data warehouse must be accessed. Middleware translates between the two.
9. What are the key similarities and differences between a two-tiered and a three-tiered architecture?
Both provide the same user visibility through a client system that accesses a DSS/BI application remotely. The difference is behind the scenes and is invisible to the user: in a two-tiered architecture, the application and data warehouse reside on the same machine; in a three-tiered architecture, they are on separate machines.
10. How has the Web influenced data warehouse design?
Primarily by making Web-based data warehousing possible
11. List the alternative data warehousing architectures discussed in this section.
• Enterprise Data Warehousing Architecture
• Data Mart Architecture
• Hub-and-Spoke Data Mart Architecture
• Enterprise Warehouse and Operational Data Store
• Distributed Data Warehouse Architecture
12. What issues should be considered when deciding which architecture developing a data warehouse? List the 10 most important factors.
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
(This list does not explicitly say that these are the ten most important factors. You may choose others.)
13. Which data warehousing architecture is the best? Why?
What is interesting is the similarity of the averages for the bus, hub-and-spoke, and centralized architectures. The differences are sufficiently small that no claims can be made for a particular architecture’s superiority over the others, at least based on a simple comparison of these success measures.
14. Describe data integration.
Data integration is an umbrella term that covers three processes that combine to move data from multiple sources into a data warehouse: accessing the data, combining different views of the data and capturing changes to the data.
15. Describe the three steps of the ETL process.
Extraction: selecting data from one or more sources and reading the selected data.
Transformation: converting data from their original form to whatever form the DW needs. This step often also includes cleansing of the data to remove as many errors as possible.
Loading: putting the converted (transformed) data into the DW.
16. Why is the ETL process so important for data warehousing efforts?
Since ETL is the process through which data are loaded into a data warehouse, a DW could not exist without it. The ETL process also contributes to the quality of the data in a DW.
17. List the benefits of data warehouses.
Direct benefits include:
• Allowing end users to perform extensive analysis in numerous ways.
• A consolidated view of corporate data (i.e., a single version of the truth).
• Better and more timely information. A data warehouse permits information processing to be offloaded from costly operational systems onto low-cost servers; therefore, end-user information requests can be processed more quickly.
• Enhanced system performance. A data warehouse frees production processing because some operational system reporting requirements are moved to DSS.
• Simplification of data access.
• Indirect benefits arise when end users take advantage of these direct benefits.
18. List several criteria for selecting a data warehouse vendor and describe why they are important.
Six criteria listed in the text are: financial strength, ERP linkages, qualified consultants, market share, industry experience, and established partnerships. These are important to indicate that a vendor is likely to be in business for the long term, to have the support capabilities its customers need, and to provide products that interoperate with other products the potential user has or may obtain.
One could add others, such as product functionality (Does it do what we need?), vendor strategic vision (Does their direction makes sense for our future plans and/or is consistent with industry trends?) and quality of customer references (What do their existing customers think of them?). These may be so obvious that the authors (or the author of the cited reference from which this list is taken) did not feel they needed to be mentioned, but they are still valid answers to this question.
19. Does a bottom-up data warehouse development approach use an enterprise data model?
It need not. Kimball’s bottom-up data mart approach usually uses dimensional data modeling, starting with tables. However, if an enterprise data model exists, its relevant parts can be used as a starting point for the tables rather than embarking on a from-the-ground-up data modeling project.
20. Describe the major similarities and differences between the Inmon and Kimball data warehouse development approaches.
Similarities: Both methods can produce an enterprise data warehouse and subset data marts.
Differences: Inmon’s approach starts with an enterprise data warehouse, creating data marts as subsets of that EDW if appropriate. The focus is on proven, traditional methods and technologies. Kimball’s starts with data marts, consolidating them into an EDW later if appropriate. It focuses in creating a useful end-user capability quickly.
21. What is an RDW?
A real-time data warehouse, in which decision making data are updated on an ongoing basis as business transactions occur; same as an active data warehouse (ADW).
22. List the benefits of an RDW.
The RDW extends the benefits of data warehousing, in general, down into tactical, and perhaps operational, decision making. It empowers people who interact directly with customers and suppliers by providing them with information to make decisions. It can then be extended to customers and suppliers themselves, thus affecting almost all aspects of customer service, SCM, logistics and beyond. It can also facilitate e-business activities, as when sales outlets such as overstock.com (cited in the text), woot.com and steepandcheap.com use historical data to price new close-outs.
23. What are the major differences between a traditional data warehouse and an RDW?
1. A traditional data warehouse (TDW) is used for strategic decisions (and sometimes tactical); an RDW for strategic and tactical (sometimes operational) ones.
2. The results of using a TDW can be hard to measure; results of using an RDW are measured by operational data.
3. Acceptable TDW refresh rates range from daily to monthly; RDW data must be up to the minute.
4. TDW summaries are often appropriate; RDWs must supply detailed data.
5. Small user community at upper organizational levels means a TDW supports few concurrent users; an RDW must support many, perhaps over a thousand.
6. TDWs typically use restrictive reporting to confirm or check patterns, often predefined summary tables; RDWs need flexible, ad hoc reporting.
7. TDW user community generally consists of power users, knowledge workers, managers, other internal users; RDWs are used by operational staff, call centers, perhaps external users.
24. List some of the drivers for RDW.
• A business often cannot afford to wait a whole day for its operational data to load into the data warehouse for analysis.
• Traditional data warehouses have captured snapshots of an organization’s fixed states instead of incremental real-time data showing every state change and almost analogous patterns over time.
• With a traditional hub-and-spoke architecture, retaining the metadata in sync is difficult. It is also costly to develop, maintain, and secure many systems as opposed to one huge data warehouse so that data are centralized for BI/BA tools.
• In cases of huge nightly batch loads, the necessary ETL setup and processing power for large nightly data warehouse loading might be very high, and the processes might take too long. An EAI with real-time data collection can reduce or eliminate the nightly batch processes.
25. What steps can an organization take to ensure the security and confidentiality of customer data in its data warehouse?
Effective security in a data warehouse should focus on four main areas:
Step 1. Establishing effective corporate and security policies and procedures. An effective security policy should start at the top and be communicated to everyone in the organization.
Step 2. Implementing logical security procedures and techniques to restrict access. This includes user authentication, access controls, and encryption.
Step 3. Limiting physical access to the data center environment.
Step 4. Establishing an effective internal control review process for security and privacy.
26. What skills should a DWA possess? Why?
• Familiarity with high-performance hardware, software and networking technologies, since the data warehouse is based on those
• Solid business insight, to understand the purpose of the DW and its business justification
• Familiarity with business decision, making processes to understand how the DW will be used
• Excellent communication skills, to communicate with the rest of the organization
27. Compare data integration and ETL. How are they related?
Data integration consists of three processes that integrate data from multiple sources into a data warehouse: accessing the data, combining different views of the data and capturing changes to the data. It makes data available to ETL tools and, through the three processes of ETL, to the analysis tools of the data warehousing environment.
28. What is a data warehouse and what are its benefits? Why is Web accessibility important with a data warehouse?
A data warehouse can be defined as “a pool of data produced to support decision making.” This focuses on the essentials, leaving out characteristics that may vary from one DW to another but are not essential to the basic concept.
The same paragraph gives another definition: “a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” This definition adds more specifics, but in every case appropriately: it is hard, if not impossible, to conceive of a data warehouse that would not be subject-oriented, integrated, etc.
The benefits of a data warehouse are that it provides decision making information, organized in a way that facilitates the types of access required for that purpose and supported by a wide range of software designed to work with it.
Web accessibility of a data warehouse is important because many analysis applications are Web-based, because users often access data over the Web (or over an intranet using the same tools) and because data from the Web may feed the DW.
29. A data mart can replace a data warehouse or complement it. Compare and discuss these options.
For a data mart to replace a data warehouse, it must make the DW unnecessary. This would mean that all the analyses for which the DW would be used can instead be satisfied by a DM (or perhaps a combination of several DMs). If this is so, it can be much less expensive, in terms of development and computer resources, to use multiple DMs (let alone one DM!) instead of an overall DW.
In other situations, a data mart can be used for some analyses which would in its absence use the DW, but not all of them. For those, the smaller DM is more efficient—quite possibly, enough more efficient as to justify the cost of having a DM in addition to a DW. Here the DM complements the DW.
30. Discuss the major drivers and benefits of data warehousing to end users.
Major drivers include:
a. Increased competition and pace of business, leading to increased need for good decisions quickly
b. Successful pioneering experiences with data warehouses, leading to their wider user acceptance
c. Decreasing hardware costs, making terabyte databases with masses of historical data economically feasible for more firms
d. Increased availability of software to manage a large data warehouse
e. Increased availability of analysis tools making DWs potentially more useful
f. Increased computer literacy of decision makers, making them more likely to use these tools
31. List the differences and/or similarities between the roles of a database administrator and a data warehouse administrator.
Since a data warehouse is a specific type of database designed for a specific application area, a data warehouse administrator has all the roles of a database administrator—plus others. One new role is advising on decision support uses of the DW, for which a DWA needs to understand decision making processes. Beyond that, the issue is more a need for additional skills in the same roles as a DBA—e.g., understanding high-performance hardware to deal with the large size of a DW—than it is one of additional roles.
32. Describe how data integration can lead to higher levels of data quality.
A question involving the word “higher” (or any other comparative, for that matter) requires asking “higher than what?” In this case, we can take it to mean “higher than we would have for the same data, but without a formal data integration process.”
Without a data integration process to combine data in a planned and structured manner, data might be combined incorrectly. That could lead to misunderstood data (a measurement in meters taken as being in feet) and to inconsistent data (data from one source applying to calendar months, data from another to four-week or five-week fiscal months). These are aspects of low-quality data which can be avoided, or at least reduced, by data integration.
33. Compare the Kimball and Inmon approaches toward data warehouse development. Identify when each one is most effective.
Inmon’s approach starts with an enterprise data warehouse, creating data marts as subsets if appropriate. It is most effective when there is a recognized need for an EDW, an executive “champion” of the project, and a willingness to invest in a data warehousing infrastructure before it will show results.
Kimball’s approach starts with data marts, consolidating them into an EDW later if appropriate. It is most effective when it is desired to provide a “proof of concept” implementation before embarking on a full-scale EDW project or when a well-defined area with the greatest benefits can be identified.
34. Discuss security concerns involved in building a data warehouse.
Security and privacy concerns are important in building a data warehouse:
1. Laws and regulations, in the U.S. and elsewhere, require certain safeguards on databases that contain the type of information typically found in a DW.
2. The large amount of valuable corporate data in a data warehouse can make it an attractive target.
3. The need to allow a wide variety of unplanned queries in a DW makes it impractical to restrict end user access to specific carefully constrained screens, one way to limit potential violations.
