|
As
the Ralph Kimball quote above describes; designing,
developing and automating the back-end processes required
to load the data warehouse and data marts is where the
majority of the project timeline is spent. Issues such
as dealing with inconsistencies in the data, e.g., location
A uses an item master field different then location
B, or integrating data from different systems that each
have their own customer master file, can be quite challenging.
Having gone through this process several times at many
different clients, Analytic Vision can help you expedite
this process by quickly identifying these issues and
offering best practices for dealing with the different
data issues that you will face.
As
our iterative Implementation Methodology
describes, once a specific subject area of information
has been selected (e.g., Sales Analysis or General Ledger
Analysis) and business users have been interviewed,
the next step is to interview technical personnel to
start the discovery process of where (and if) the desired
source data is stored, and what the known issues are
in gathering this information. The rest of this section
goes into detail on Analytic Visions approach
to organizing and executing this process. We believe
that a key differentiator for us is not only our ability
to design and develop optimized extraction, transformation,
and load (ETL) processes, but also our ability to automate
and audit these processes.
Identify
and Document Required Source Data
This step involves interviewing the appropriate IT staff
members to map the business requirement data elements
to the actual data source elements. While this may sound
simple, this is typically where many projects first
develop issues. It is common for requirements to be
oversimplified; for example, the interviewee may only
know the standard use for a data element, or may only
think of the typical use, as they may fail to mention
that one of the business units treats a process differently.
Every oversight or error at this stage is magnified,
as it will impact the Staging table design, ETL process
design, and ETL development if it is not discovered
prior to data validation in the data warehouse.
During
this process the business intelligence team should document
the mapping of every business requirement data element
back to its source system, collection/library, table/file,
and column/field. Most importantly each mapped element
should have a cleansing note column that identifies
any known data transformation steps or issues. These
transformation steps may identify a required translation
table, or the key fields to a description table that
links a user-defined code to its corresponding description.
You should also define whether a field is required or
optional, which will impact the ETL design.
Typically
some of the business requirement data elements will
not exist in the operational systems. The team will
then have to determine where or if the element can be
retrieved. If the data element needs to be removed from
the requirements, the business sponsor and representative
users need to be notified to set appropriate expectations.
Sometimes a missing data element may create a data entry
task, such as filling in a conversion factor or cross-reference
table. This may be another example of a task pushed
back on the business sponsor, as a business user may
be the only resource with the knowledge to load the
data. This type of issue is usually left to the business
side to determine if the data element in question is
worth the required data entry and maintenance work.
While
this section describes a few of the data issues that
may be encountered, there are always variations and
important decisions to be made at this early stage of
the project. Once all of the source data elements have
been identified, it is time to move on to the design
phase of the back-end process.
Design
Staging Tables and Required Cleansing Process
There are two philosophies on how a data warehouse staging
area should be designed and used. One party, well
call the "Inmonites" for author Bill Inmon,
believes the staging area is a permanent storage area
called an Operational Data Store (ODS) that is a normalized
designed database of all possible data that supports
the loading of aggregated data marts for user access.
The other party, well call "Kimballites"
after the author Ralph Kimball, believes the staging
area is a temporary storage area used to load the enterprise
data warehouse, which will hold all data to which users
require access. We are firm believers that the staging
area supports loading clean data into the data warehouse,
and does not need to redundantly store data that is
not very accessible to the end users (normalized databases
are not ideal for end user access).
The
staging area should hold permanent support tables, such
as conversion and cross-reference tables, and temporary
tables that hold the raw data that gets transformed
into dimension and fact tables in the data warehouse.
Analytic
Vision is adept at designing and developing optimized
processes for extracting, transforming and loading the
data into the data warehouse. We can create detailed
program specifications for your development staff, and/or
we can accelerate the development by assisting your
team hands on, and leave you with accurate documentation
on what we have developed.
Extract
Required Source Data and Load into Raw Staging Tables
The first development step is to create the raw extracts
from the various source systems. The ideal approach
is to only extract changed records, and then delete
these raw records as they are successfully processed.
This provides the flexibility to easily change the timing
of your refresh from weekly, to daily, to near real-time.
Clean
and Transform the Staging Data and Load it into the
Data Warehouse
This is where a variety of data-cleansing techniques
get utilized to optimize a clean load. Analytic Vision
has used a variety of SQL-based ETL tools and has developed
custom-coded solutions for complex situations where
performance is important. One of our strengths at Analytic
Vision is to design table-driven cleansing processes
that avoid using hard-coded entries, which provides
the client with flexibility to modify data warehouse
values without changing the ETL process.
One
of the key points of a well-designed data warehouse
is that the grain (the low level) of the data warehouse
is consistent (e.g., item, customer, day, minute, etc.).
This usually ends up requiring allocations at some point
in the development of an enterprise data warehouse solution.
A common implementation of allocations comes when someone
wants to see customer profitability. This raises a variety
of challenges as general ledger account balances need
to be allocated down to the customer and item level.
Analytic Vision has designed and developed several variations
on rules-based allocation processes customized for each
clients unique requirements.
A
quick note on Customer Profitability: many companies
want to implement customer profitability right away,
but this is what we call a second-tier subject area
in the data warehouse. Its called this because
its primary data sources are typically general ledger
and sales data. Ideally these two subject areas are
already loaded in your data warehouse (first-tier subject
areas), and they become the source data for the allocation
process to load the second-tier customer profitability
subject area.
Format
Data Warehouse Data for Loading into OLAP Databases
Another component of the ETL process that many companies
initially are not aware of is the required formatting
for loading into several of the OLAP "cube"
databases such as Hyperion Essbase.
Similar to optimizing the load of the relational data
into the data warehouse, it is important to optimize
the build and load of OLAP databases, especially multidimensional
databases, which can take a while to load and calculate.
Analytic Vision has advanced skills not only in understanding
what techniques work best for various OLAP products,
but also in understanding how to use advanced formulas
to simplify OLAP cube designs and the load process.
Automate the Process and Provide Menu-Driven
Reload Options
We believe that one of our key differentiators is our
ability to automate the entire ETL process, not only
into the data warehouse, but also into the OLAP data
marts. We see a lot of companies that still lock and
send information from Excel into products such as Essbase
in order to build the Accounts dimension and load the
data. A properly designed process should require manual
intervention only when the users request a design change
or enhancement. A new account should not cause an incomplete
load.
A
business intelligence solution has enough challenges
without having to worry if someone launched a manual
step at the right time in a process. Analytic Vision
has a lot of experience at automating all phases of
the ETL process, and implementing audit steps where
appropriate to simplify the process of verifying the
accuracy of the data for the business intelligence team.
|