|
Go
To: A B C
D E F
G H I
J K L
M N O
P Q R
S T U
V W X
Y Z
A
Agent
This is a rules-driven, automated application that searches
the data and sends an alert when a particular pattern
is found.
Aggregations
In data warehousing, aggregations refer to summary levels
of data. While this can be done to save storage space,
it is typically done to improve performance on data
access for end users. In a relational data warehouse
there may be several levels of aggregated fact tables;
however, maintenance concerns should limit a database
architect from overdoing this.
Multidimensional
OLAP databases are very good at handling a large variety
of aggregated hierarchies. For this reason it is becoming
more common to see hybrid solutions implemented to leverage
most aggregations in a multidimensional database, with
the lowest levels of detail left for drill-thru into
relational fact tables.
Alert
This is a message that is sent automatically by a computer
system when identified thresholds or events occur. The
monitored data and events can be inside or outside of
the company. Internal key performance metrics can be
monitored for variance to budget, while outside events
such as the posted price of key raw materials or news
events on big clients can also trigger an alert. Alerts
can typically be delivered in a variety of forms such
as e-mail or pager, or can even update a database.
Analysis
Services
This is the OLAP database and data mining product provided
by Microsoft that comes as a component in SQL Server
2000. This OLAP database has the noted flexibility of
be able to develop data marts as Relational OLAP, Multidimensional
OLAP, or the combination Hybrid OLAP. This allows designers
to take advantage of pre-summarized aggregations, or
run in the dynamic ROLAP mode to provide near real-time
data warehousing information.
Attribute
Additional information included in a dimension table
that is not used in defining the hierarchical levels
of the dimension.
B
Business Intelligence Tools
This term represents a variety of software products
that provide easy-to-use reporting and analysis capabilities
for business users. While there are several variations
of these tools, they can be categorized as follows:
- Enterprise
Reporting: This is a web-based reporting solution
that generates scheduled and on-demand static reports
and queries that can be viewed online or automatically
e-mailed to subscribed users, and can be saved in
several formats, such as HTML, PDF, or Excel.
- Online
Analytical Processing OLAP: This is a web-based
analytical tool that allows user to interact with
the data by double-clicking and drilling down on dimensions
and performing a variety of other analysis.
- Data
Mining Tools: This type of tool allows a computer
to run batch jobs to perform advanced analysis algorithms
to identify correlations, patterns, and trends in
large data sets. Typical examples of use are grocery
store basket analysis and marketing campaign customer
grouping.
C
Calculated Member
This refers to computed metrics built into OLAP databases
such as Microsoft Analysis Services or Hyperion Essbase.
These formulas can involve components from different
dimensions and are much more powerful then traditional
relational database SQL calculations. While Hyperion
Essbase offers a large number of pre-built functions,
Analysis Services offers a scripting language called
MDX that allows developers to build its functions.
Cell
A cell is a single point in a cube, which is an intersection
of a single member from every dimension.
Cellset
A set of data returned from a cube with an MDX query.
A cellset in the multidimensional world is equivalent
to a rowset in the relational database world.
Changed
Data Capture
In a database replication, changed data capture occurs
when only the data that has changed since the previous
replication is copied.
Changing
Dimensions
This refers to anytime a level or attribute of a dimension
needs to be updated. This is one of the most challenging
design issues for star schema data modeling. This issue
is often referred to as the handling of Slowly Changing
Dimensions. There are three ways outlined by Ralph Kimball
(and others) to handle this situation:
- Type
1: Change the data in the dimension table record.
This approach replaces the old value with the new.
Although the easiest to implement, it effectively
eliminates any history of the old values existence.
- Type
2: Create new records in the dimension table with
the new information. This allows for the tracking
of the old and new versions of the given dimension
record; however, having multiple occurrences of a
given dimension member, such as a customer, can cause
some relational query confusion for end users.
- Type
3: Design extra columns in the dimension record
to hold values before and after the change. While
this identifies the prior value, it will not hold
long-term history.
Various
authors (including Ralph Kimball) also suggest the splitting
of dimensions into separate dimensions when a consolidated
dimension would result in a high level of dimension
change. The design decision should be based on end-user
needs in regards to the changing dimension elements.
Clickstream
Data
This is data that is generated by all visitors to a
web site and is primarily captured in log files. Web
servers capture a large amount of data in the process
of receiving requests for web pages. This data includes
page served, time, source of the request, type of browser
making request, etc. When analyzed, this data provides
information about the behavior of individuals who are
browsing the Internet. It can help businesses analyze
where visitors are coming from, what type of visitors
are most likely to buy certain products, what type of
web pages are most attractive, etc. This information
is essential for analyzing the effectiveness of Internet
ad campaigns and, in general, for finding ways to improve
the effectiveness of Internet commerce.
Conformed
Dimension
A dimension that is attached to more than one fact table.
This is the key to Ralph Kimballs star schema
design bus architecture. This ensures that there is
only one version of a given dimension, such as a product
dimension, and it will link to any fact tale that requires
product information. This is how integrated fact tables
can share metrics from each other and present one version
of truth to the user across different subject areas
of the business, thus creating an enterprise data warehouse.
Cube
This is the popular slang term for a Multidimensional
or Hyper Cube. While most OLAP databases typically have
far more dimensions then the three sides of a cube,
the three-dimensional concept of a cube makes it easy
to explain the intersections of various dimensions levels
when first training users of an OLAP database.
D
Data Cleansing
There are often errors and inconsistencies in source
data being brought into the data warehouse. The process
of transforming the data into a complete and consistent
set of information is often referred to as Data Cleansing.
Data
Mart
A data mart is a database that has the same design characteristics
as a data warehouse, but is focused on a single subject
area of data within an enterprise. While there are varying
definitions on how data marts fit into a data warehousing
solution, we prefer the following:
The data warehouse gathers all the information from
the various legacy systems. Specialized data marts are
then created with a subset of the information in the
data warehouse. These data marts are easier to use because
they have only the particular information the specific
user group needs. The use of several data marts also
allows the querying load to be spread among several
different computers. This can reduce network traffic.
It is common for these data marts to be developed in
OLAP cubing products such as Hyperion Essbase and Microsoft
Analysis Services.
Data
Migration
The movement of data from one environment to another.
This happens when data is extracted from a legacy system,
and is transformed and loaded into a data warehouse.
Data
Mining
This is a business intelligence process that allows
a computer to run batch jobs to perform advanced analysis
algorithms to identify correlations, patterns, and trends
in large data sets. Typical examples of use are grocery
store basket analysis and marketing campaign customer
grouping.
Analyzing
data involves the recognition of significant patterns.
Human analysts can see patterns in small data sets.
Specialized data mining tools are able to find patterns
in large amounts of data. These tools are also able
to analyze significant relationships that exist only
when several dimensions are viewed at the same time.
Data
Quality Assurance
The process of checking the quality of the data being
imported into the data warehouse.
Data
quality assurance is one of the greatest challenges
in the process of data warehousing. If the data-based
knowledge generated by the data warehouse is to be trusted,
the data entered into the warehouse must be complete
and accurate. Data quality can be a challenge for several
reasons: The data is being consolidated from a variety
of legacy sources that may have differing key structures,
user terminology, or un-edited values. There also might
be times when all the data is not received from one
of the legacy systems. This could make comparisons between
time periods invalid. A significant portion of time
in the development process should be set aside for setting
up the data quality assurance process and implementing
whatever data cleansing is needed. In a production environment,
there should be a data quality report generated after
each data warehouse load. There should be provision
for rolling back the load (or reloading) if data quality
testing indicates that the data is unacceptable.
Data
Scrubbing
There are often errors and inconsistencies in source
data being brought into the data warehouse. The process
of transforming the data into a complete and consistent
set of information is often referred to as Data Scrubbing
or Data Cleansing.
Data
Transformation
The modification of data as it is moved into the data
warehouse. This modification process includes a variety
of tasks such as Data Cleansing, denormalization of
dimensions, processing calculated fields, changing data
types for consistency, replacing codes with values,
aggregations, etc.
Data
Warehouse
A data warehouse is a centralized data repository that
collects, organizes, integrates, and makes data available
for the purpose of reporting and analysis to
give management the ability to access and analyze information
about its business. The data is typically stored in
a dimensional data model design that is optimized for
ease of use and performance with business intelligence
reporting and analysis tools.
Data
Warehousing
The process of planning, gathering requirements, designing,
developing, testing, deploying, using, managing, auditing,
maintaining, and enhancing data warehouses and/or data
marts.
Decision
Support System (DSS)
A computer system designed to assist an organization
in making decisions. The Decision Support Systems and
Enterprise Information Systems of the 1980s and early
1990s were forerunners of today's Business Intelligence
Tools.
Density
or Dense
A key aspect of designing and optimizing multidimensional
database cubes is to understand which dimensions are
Dense. A Dense dimension is likely to have a high percentage
of its members populated, for every row of source data
loaded. Typical dense dimensions are time, scenario
(actual vs. budget), and accounts.
Dimension
OLAP
A Dimension is one of the analytical perspectives that
can be used to analyze the data in an OLAP cube; common
examples include products, customers, and time. Dimensions
contain one or more hierarchies, which have levels for
drilling up and drilling down in the cube.
Dimension
Relational Table
A Dimension relational table holds the analytical perspectives
used to analyze data in Fact tables; common examples
include products, customers, and time. Dimension tables
have a primary surrogate key that is used to join them
to the fact table, and hold all attributes that describe
the given dimension. There typically are more attributes
in the relational dimension table then are built into
the corresponding OLAP dimension.
The dimension tables in a star schema are intentionally
de-normalized. The level fields and the attribute fields
contain data that is duplicated in many of the records.
This normally does not add significantly to the amount
of storage space needed in the database, because the
overall size of each dimension table is very small when
compared to the size of the fact table.
Dimensionalization
The process of transforming data into a de-normalized
dimension table in a star schema design.
Drill
Down and Drill Up
The ability to move between levels of the hierarchy
when viewing data with an OLAP browser.
- Drill
Down - Changing the view of the data to a greater
level of detail.
- Drill
Up -
Changing the view of the data to a higher level of
aggregation.
Multidimensional analysis (OLAP) tools organize the
data in two primary ways: in multiple dimensions and
in hierarchies. Drilling down and drilling up allow
an analyst to move down and up the hierarchies to
see how the information at the various levels is related.
After looking at the sales totals for a store's departments,
the analyst may want to drill down to see the individual
sales for each employee in one of the departments.
Then the analyst may choose to drill up to view how
this store's total sales compare to other stores in
the same region.
DTS
(Data Transformation Services)
An ETL tool provided as a part of Microsoft SQL Server.
It provides a design environment for creating data transformation
applications.
E
Enterprise Information System/Executive Information
System (EIS)
A computer system designed to assist an organization
in making decisions. The Decision Support Systems and
Enterprise Information Systems of the 1980s and early
1990s were forerunners of today's Business Intelligence
Tools.
ETL
(Extract, Transform, and Load)
ETL refers to the process of getting data out of an
operational data store (Extract), modifying it to be
accurate, complete, and consistent (Transform), and
inserting it into the data warehouse or data mart (Load).
F
Fact Table
In a star schema, the central table that contains the
individual metrics for the given subject area is called
the Fact table. The Fact table for a sales star schema
is typically loaded from the invoice transactions. There
are two types of fields in a fact table: 1) The surrogate
foreign keys that join the fact table to the associated
dimension tables in the star schema, and 2) the metrics
that describe and value the given subject area of the
business.
The
granularity of the fact table is one of the most significant
design decisions in creating a data warehouse. The fact
table should be designed at the lowest level that users
require to see the data. One of the design challenges
is that every metric must have a value in all associated
dimension tables in the star schema; this means that
the grain must be consistent within the fact table.
G
Granularity
The level of detail of the facts stored in a data warehouse.
This is the lowest level of detail that users will be
able to access in the data warehouse.
H
Hierarchy
Organization of data into a logical tree structure.
Dimensions can have one or more hierarchies. A Time
dimension, for example, could have a Calendar hierarchy
and a Fiscal hierarchy. Hierarchies contain levels,
which organize data into a logical structure. Moving
between the levels of a hierarchy is called drilling
up and drilling down.
Hybrid
OLAP (HOLAP)
A combined use of Relational OLAP (ROLAP) and Multidimensional
OLAP (MOLAP). In HOLAP, the source data is usually stored
using a ROLAP strategy and aggregations are stored using
a MOLAP strategy. This combination usually results in
the least amount of storage space and the fastest cube
processing.
Hyper-Cube
A cube with more than three dimensions; also known as
Cube and Multidimensional Cube. A cube is an object
with three dimensions. A hyper-cube is a cube-like structure
with more than three dimensions. In the world of OLAP,
hyper-cubes are nearly always simply referred to as
cubes.
I
J
K
L
Legacy System
An operational computer system that has been around
for a long time. Sometimes organizations have several
legacy systems that have been developed at different
times by different people for a variety of purposes.
The data in these systems is usually mutually incompatible
and sometimes inaccurate. One of the biggest challenges
of the data warehousing process is to bring data out
of the various systems where it currently is located
and organize it to fit together in the data warehouse.
Leaf
Level
The lowest level in a dimensional hierarchy in an OLAP
cube. This is typically the level that data is loaded
into an OLAP cube before it is calculated.
Level
The common term used to describe each tier of a hierarchy
in a dimension. For example a simple Time dimension
could have levels for Year, Quarter, Month, and Day.
Linked
Partition
In Hyperion Essbase, a linked partition sends users
from a cell in one multidimensional database to a cell
in another database. A linked partition gives users
a different perspective on the data.
Local
Cube
A subset of an OLAP cube that has been downloaded from
the server onto someones PC to allow for analysis
while disconnected from the network.
M
MDD (Multidimensional Database)
A multidimensional database stores data in a different
fashion then the traditional two dimensional relational
databases. A multidimensional database has several dimensions
that are each made up of one or more hierarchies. While
this sounds similar to a relational star schema database,
the difference is that the multidimensional database
pre-aggregates many levels of the combinations of hierarchies
for the many intersections across all of the dimensions.
While this is not necessarily the most efficient way
to store data, it can provide ideal response time for
users interacting with the data. Instead of waiting
for an SQL statement to execute and aggregate the requested
result set, the aggregated totals can often just be
extracted and displayed from the multidimensional database.
This performance is ideal for analyzing information
on the fly.
MDX (Multidimensional Expressions)
The querying language for Microsoft OLAP cubes. MDX
has some similarities to SQL, but has many unique features.
The following query returns a cellset with the names
of the store regions on the columns, the names of product
families on the rows, and the profit displayed in the
cells:
select
[Stores].[Region].Members on columns,
[Products].[Product Family].Members on rows
from SalesCube
where ([Measures].[Profit])
Measure
A numeric value stored in a fact table and in an OLAP
cube.
Member
One of the data points for a level of a hierarchy of
a dimension. For example each of the twelve months is
typically a member of the Time dimension.
Member
Property
An attribute of a level that is available for OLAP querying.
In Microsoft Analysis Services you can create member
properties for any level. These member properties can
be referenced directly in MDX queries, and they can
also be used for creating virtual dimensions.
Metadata
Metadata is control information about the data warehouse
and the data in the warehouse. Technical metadata is
used by the data warehouse administrator to implement
and control the data movement propagation operations.
Business metadata helps warehouse users find information
in the data warehouse without knowing the underlying
implementation of the database. Also known as data that
describes the data in the warehouse.
Multidimensional
Analysis
Also Known As: OLAP (On-Line Analytical Processing)
This type of analysis allows the user to interact with
the data by dynamically changing the queried results
with convenient mouse click functions. Multidimensional
data is organized hierarchically, allowing users to
"drill down" for more detailed information,
"drill up" to see a broader, more summarized
view, and "slice and dice" to dynamically
change the combinations of dimensions that are being
viewed.
Multidimensional
Cube Another term used for Cube.
Multidimensional
Database See MDD.
Multidimensional
Online Analytical Programming (MOLAP)
An OLAP database that stores data and aggregations in
a multidimensional database structures.
N
Nonvolatile
Data that does not change. Data is stable in a data
warehouse. More data is added, but data is never removed.
This enables management to gain a consistent picture
of the business. Nonvolatility is one of the original
defining characteristics of a data warehouse.
Normalization
The process of organizing data in accordance with the
rules of a relational database.
In
a completely de-normalized database the customer name
and address information would be stored every time a
customer made a purchase.
In
a normalized database each customer's name and address
would be stored only once, in a separate table. Every
purchase record would have a reference to the customer
table to indicate which customer was involved.
A
fully normalized database is usually the most efficient
design for an On-line Transaction Processing System.
A data warehouse, with its emphasis on efficient retrieval
of data, often benefits from some intentional de-normalization.
See the discussion of the Star Schema.
O
OLAP (On-line Analytical Processing)
This type of analysis allows the user to interact with
the data by dynamically changing the queried results
with convenient mouse click functions. Multidimensional
data is organized hierarchically, allowing users to
"drill down" for more detailed information,
"drill up" to see a broader, more summarized
view, and "slice and dice" to dynamically
change the combinations of dimensions that are being
viewed.
OLTP
(On-line Transaction Processing)
This is the common description for an operational system.
It is usually a highly normalized database that handles
interactive data entry processes such as entering invoices
and journal entries.
P
PivotTable Services
The tools that provide access to Microsoft's Analysis
Services data in Microsoft Excel.
Private
Dimension
In Microsoft Analysis Services, a dimension that is
restricted in use to one particular cube. Shared (conformed)
dimensions are very useful in creating a unified data
warehousing structure. You can create a dimension once
and use it in several different cubes. Private dimensions
are useful in those situations where you want independent
cubes. If you change a private dimension, that change
only affects a single cube, whereas the change of a
shared dimension can have implications for many cubes.
Processing
In Microsoft Analysis Services, the building of cube
and dimension structures and cube aggregations. Cubes
need to be processed before they can be browsed. For
cubes, an incremental process only adds new data into
the cube, while a full process reprocesses all of the
data to the cube.
Q
R
Relational On-line Analytical Processing (ROLAP)
OLAP that stores data and aggregations in a relational
database.
Replication
The physical copying of data from one database to another.
In data warehousing replication takes place as data
is moved from the on-line transaction processing system
into the data warehouse. Replication also takes place
if one or more data marts are being populated with data
from the data warehouse. There are several software
tools that have been developed to handle replication
into a data warehouse. These tools give the ability
to transfer data out of and into a variety of database
management systems. Many of these tools also provide
data transformation and data cleansing capabilities.
- Heterogeneous
replication occurs when the source and the target
database are not the same database management system.
- Data
migration is the movement of data from one environment
to another as happens when data is brought
from a legacy system into a data warehouse.
- Bidirectional
replication is the ability to copy data in both directions
between two databases.
- In
changed data capture only the data that has been changed
since the last replication is copied.
- In
synchronization all the data stored in the database
is replicated.
Replicated
Partition
In Hyperion Essbase, a replicated partition is a copy
of a portion of the data from a source multidimensional
database that is stored in a target multidimensional
database. This provides the ability to combine data
from multiple multidimensional databases into a single
database.
S
Scale, Scalable, and Scalability
The ability of a computer system or a database to operate
efficiently with larger quantities of data. Scalability
is often discussed in situations when multiple processors
are joined together. The system scales well (or is scalable)
if doubling the number of processors also doubles the
speed at which the system performs its tasks. The extra
work involved in coordinating larger systems usually
prevents them from being fully scalable so that
going from one to two processors would increase the
total speed by less than a factor of two.
Schema
The logical organization of data in a database.
Shared
Dimension
In Microsoft Analysis Services, a dimension used by
more than one cube. In general, a dimension that is
used by more than one cube is called a conformed dimension.
Shared
Member
In Hyperion Essbase, the same member can be built into
alternate hierarchies and aggregations within the same
dimension. This provides an easy and efficient way to
build alternate rollups. The data is only stored once,
and the "shared members" are pointers to the
original data point.
Slice,
Slicer, Slicing
The limiting of a cellset to data for a single member
from a particular dimension. Slicing in MDX is similar
to filtering in a relational database. In an MDX query,
the WHERE clause is called the slicer.
Slice
and Dice
The ability to move between different combinations of
dimensions when viewing data with an OLAP browser. Multidimensional
analysis tools organize the data in two primary ways:
in multiple dimensions and in hierarchies. Slicing and
dicing refers to the ability to combine and recombine
the dimensions to see different slices of the information.
Slowly
Changing Dimensions (SCD)
A dimension that has levels or attributes that are changing
on an occasional basis. See Changing Dimensions to understand
the standard approaches to updating slowly changing
dimensions.
SQL
(Structured Query Language)
The standard language for accessing relational databases.
Snowflaking
Normalization applied to the dimension tables of a star
schema. The star schema is a very simple database design,
which clearly presents the multidimensional character
of the data and allows for rapid querying of the data
in a data warehouse. In snowflaking, some of the fields
of the dimension tables are split off into separate
tables. This achieves a higher level of normalization,
but makes the database design more complex and can reduce
the performance and ease of use for Business Intelligence
Tools.
Sparsity
and Density, Sparse and Dense
The degree to which the cells of a cube are filled with
data. One of the primary challenges of storing multidimensional
data is the degree of sparsity that is often encountered.
When many dimensions are considered with a fine grain
of detail, most of the cells will be empty.
Star
Schema (Business Definition)
A method of organizing information in a data warehouse
that allows the business information to be viewed from
many perspectives. The star is a picture of the way
the data is being stored. The basic factual information
is in the middle of the star. The points of the star
represent various perspectives from which the factual
information can be viewed.
Star
Schema (Technical Definition)
A database design that consists of a fact table and
one or more dimension tables. Each of the dimension
tables has a single field primary key which has a one-to-many
relationship with a foreign key in the fact table. The
star schema is an intentional simplification of the
database design that would be achieved by following
the standard rules of normalization. The dimension tables
are often flattened, to allow for more efficient querying
(see "snowflaking").
Structured
Query Language (SQL)
The standard language for accessing relational databases.
Summary
Tables
Tables used to store summarized or aggregated data.
Summarized fact tables can be created to enhance performance
in a ROLAP solution; however, creating many of these
can make future modifications complicated.
Synchronization
The copying of all data in a database replication.
T
Time-Balance-Average
In Hyperion Essbase, this is a tag placed on a measure
when the value displayed should show the average value
for the time period selected. This means that if someone
is viewing a Time-Balance-Average measure for quarter
one, the system will automatically average the values
for periods one, two, and three, instead of aggregating
them.
Time-Balance-First
In Hyperion Essbase, this is a tag placed on a measure
when the value displayed should show the first value
for the time period selected. This means that if someone
is viewing a Time-Balance-First measure for quarter
one, the system will automatically show the value for
period one, instead of aggregating them.
Time-Balance-Last
In Hyperion Essbase, this is a tag placed on a measure
when the value displayed should only be for the most
recent time period holding data. A typical example for
this is a balance sheet account. If someone is looking
at the balance sheet at the end of quarter one, they
should see the Accounts Receivable value for period
3, and not the aggregation of all three periods.
Time-Variant
Data
Data that is identified with a particular time period.
Time-variant is one of the original defining characteristics
of a data warehouse.
Transparent
Partition
In Hyperion Essbase, a transparent partition allows
users to access data from the multidimensional database
source as though it were stored in the multidimensional
database target. However, the data is stored at the
source, which can be in another application or in another
Analytic Services database or on another Analytic Server.
This provides the ability to combine data from multiple
multidimensional databases into a single database.
U
V
Virtual Cube
The term used in Microsoft's Analysis Services (OLAP
Services) for a cube that is created from portions of
one or more base cubes. A virtual cube is similar to
a view in a relational database. It can be used for
security purposes, giving users access to only some
of the dimensions and measures. It can also be used
to show data from separate cubes at the same time. Virtual
cubes are much more useful when you have shared dimensions
and measures that are common to all the base cubes that
are used.
Virtual
Dimension
The term used in Microsoft's Analysis Services (OLAP
Services) for a dimension that is created from one or
more member properties in another dimension.
W
X
XML (eXtensible Markup Language)
A method of sharing data between disparate data systems,
without needing a direct connection between them.
XML
for Analysis Services
An XML schema that can be used to communicate with a
Microsoft Analysis Server.
Y
Z
|