University of Massachusetts, Boston
MSIS 670
Chapter 4 Technologies enabling Information Integration Notes:
How is information integrated? In this chapter we focus on two significant ways:
1. Synthesis of new insights from unstructured data residing in the organization’s enterprise
systems, such as enterprise portals and document management systems.
2. Creation of new insights via the integration of structured organizational data with external data,
such as Web-based unstructured information from customer Web Sites or vendor data sources.
4.2 Integration of data sources in a business intelligence application
The decision of whether to use external data or not is related primarily to:
1. Business need: For example, Hewlett Packard’s BI application, which supports its computer
manufacturing supply chain, requires detailed structured data from vendors to provide the
firm with important information such as availability and delivery dates for the requisite
electronic components.
2. Data availability and organizational expertise: The lack of in-house data may be one of the
reasons an organization needs to rely on external data.
3. BI application sophistication: As BI applications become increasingly complex, external or
real-time data may be used to improve the breadth of intelligence and predictive ability of such
implementations, by augmenting the sources of information used to create new knowledge.
4. BI budget: Integrating external or real-time data is typically more expensive, as purchasing
external data sources and integrating more granular and timely data both require additional
investment. Therefore, organizations need to develop a business model that supports
the return on investment for the purchase and integration of external data in the BI application.
As a rule of thumb, understanding and preparing the data consumes most of the time and resources in
the implementation of a BI application and may account for 50% to 80% of the project resources.
Data understanding includes the following steps:
1. Data collection: This step involves defining the data sources for the study, including, as
mentioned above, the use of external public data and proprietary databases. The outcome of this
step includes a description of the data sources, the data owners, who maintains the data,
cost, storage format and structure, size, physical storage characteristics, security requirements,
restrictions on use, and privacy requirements.
2. Data description: This step describes the contents of each of the BI data sources. Significant
descriptors may include the number of fields and a measure of how sparse the database is. Also,
for each data field, the following items should be described: data type, definition, description,
source, unit of measure, number of unique values, and range of values. Other important data
descriptors may include when, how, and over what timeframe the data were collected. Finally,
the attributes that serve as primary and foreign keys in a relational database should also be
identified.
3. Data quality and verification: This step determines whether any of the data should be disregarded
due to irrelevance or lack of quality. According to a Gartner study, more than 35% of critical data
in Fortune 1000 companies is flawed. Gartner specifies that a number of data quality issues need
to be considered: whether the organization has the data, and the data's validity, consistency,
integrity, accuracy, and relevance. In a BI application, the GIGO (garbage-in-garbage-out) principle
applies. This means that irrelevant or inconsistent data must be excluded from the analysis;
otherwise it will negatively affect the results of the application.
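The data description and data quality steps above can be illustrated with a short sketch. This is not taken from the chapter: the sample records, the field names (part_id, vendor, lead_time_days), and the helper functions describe_fields and quality_issues are hypothetical. The sketch only shows how descriptors such as data type, number of unique values, range, and sparsity might be computed, and how duplicate keys, missing values, and invalid values might be flagged before analysis.

```python
from collections import Counter

# Hypothetical vendor records for illustration; None marks a missing value.
ROWS = [
    {"part_id": "C-100", "vendor": "Acme", "lead_time_days": 12},
    {"part_id": "C-101", "vendor": "Acme", "lead_time_days": None},
    {"part_id": "C-102", "vendor": None, "lead_time_days": 30},
    {"part_id": "C-102", "vendor": "Bolt", "lead_time_days": -4},
]


def describe_fields(rows):
    """Data description: per-field types, unique count, range, and sparsity."""
    fields = sorted({name for row in rows for name in row})
    summary = {}
    for name in fields:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        summary[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "unique": len(set(present)),
            "range": (min(numeric), max(numeric)) if numeric else None,
            # Sparsity here means the fraction of records with no value.
            "sparsity": 1 - len(present) / len(values),
        }
    return summary


def quality_issues(rows, key, non_negative=()):
    """Data quality: flag duplicate keys, missing values, and negative values."""
    issues = []
    key_counts = Counter(row.get(key) for row in rows)
    for index, row in enumerate(rows):
        if key_counts[row.get(key)] > 1:
            issues.append((index, key, "duplicate key"))
        for name, value in row.items():
            if value is None:
                issues.append((index, name, "missing"))
            elif name in non_negative and value < 0:
                issues.append((index, name, "negative"))
    return issues


if __name__ == "__main__":
    for field, info in describe_fields(ROWS).items():
        print(field, info)
    for issue in quality_issues(ROWS, "part_id", non_negative=("lead_time_days",)):
        print(issue)
```

In this sketch, records flagged by quality_issues would be reviewed and either corrected or excluded, in line with the GIGO principle. Which checks matter in practice depends on the data source and the business need.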
4.3 Environmental Scanning
Environmental scanning is defined as “scanning for information about events and relationships in a
company’s outside environment, the knowledge of which would assist top management in its task of
charting the company’s future course of action” and “the acquisition and use of information about
events, trends, and relationships in an organization’s external environment, the knowledge of which
would assist management in planning the organization’s future course of action”.