As the Ralph Kimball quote above describes; designing, developing and automating the back-end processes required to load the data warehouse and data marts is where the majority of the project timeline is spent. Issues such as dealing with inconsistencies in the data, e.g., location A uses an item master field different then location B, or integrating data from different systems that each have their own customer master file, can be quite challenging. Having gone through this process several times at many different clients, Analytic Vision can help you expedite this process by quickly identifying these issues and offering best practices for dealing with the different data issues that you will face.

As our iterative Implementation Methodology describes, once a specific subject area of information has been selected (e.g., Sales Analysis or General Ledger Analysis) and business users have been interviewed, the next step is to interview technical personnel to start the discovery process of where (and if) the desired source data is stored, and what the known issues are in gathering this information. The rest of this section goes into detail on Analytic Vision’s approach to organizing and executing this process. We believe that a key differentiator for us is not only our ability to design and develop optimized extraction, transformation, and load (ETL) processes, but also our ability to automate and audit these processes.

Identify and Document Required Source Data
This step involves interviewing the appropriate IT staff members to map the business requirement data elements to the actual data source elements. While this may sound simple, this is typically where many projects first develop issues. It is common for requirements to be oversimplified; for example, the interviewee may only know the standard use for a data element, or may only think of the typical use, as they may fail to mention that one of the business units treats a process differently. Every oversight or error at this stage is magnified, as it will impact the Staging table design, ETL process design, and ETL development if it is not discovered prior to data validation in the data warehouse.

During this process the business intelligence team should document the mapping of every business requirement data element back to its source system, collection/library, table/file, and column/field. Most importantly each mapped element should have a cleansing note column that identifies any known data transformation steps or issues. These transformation steps may identify a required translation table, or the key fields to a description table that links a user-defined code to its corresponding description. You should also define whether a field is required or optional, which will impact the ETL design.

Typically some of the business requirement data elements will not exist in the operational systems. The team will then have to determine where or if the element can be retrieved. If the data element needs to be removed from the requirements, the business sponsor and representative users need to be notified to set appropriate expectations. Sometimes a missing data element may create a data entry task, such as filling in a conversion factor or cross-reference table. This may be another example of a task pushed back on the business sponsor, as a business user may be the only resource with the knowledge to load the data. This type of issue is usually left to the business side to determine if the data element in question is worth the required data entry and maintenance work.

While this section describes a few of the data issues that may be encountered, there are always variations and important decisions to be made at this early stage of the project. Once all of the source data elements have been identified, it is time to move on to the design phase of the back-end process.

Design Staging Tables and Required Cleansing Process
There are two philosophies on how a data warehouse staging area should be designed and used. One party, we’ll call the "Inmonites" for author Bill Inmon, believes the staging area is a permanent storage area called an Operational Data Store (ODS) that is a normalized designed database of all possible data that supports the loading of aggregated data marts for user access. The other party, we’ll call "Kimballites" after the author Ralph Kimball, believes the staging area is a temporary storage area used to load the enterprise data warehouse, which will hold all data to which users require access. We are firm believers that the staging area supports loading clean data into the data warehouse, and does not need to redundantly store data that is not very accessible to the end users (normalized databases are not ideal for end user access).

The staging area should hold permanent support tables, such as conversion and cross-reference tables, and temporary tables that hold the raw data that gets transformed into dimension and fact tables in the data warehouse.

Analytic Vision is adept at designing and developing optimized processes for extracting, transforming and loading the data into the data warehouse. We can create detailed program specifications for your development staff, and/or we can accelerate the development by assisting your team hands on, and leave you with accurate documentation on what we have developed.

Extract Required Source Data and Load into Raw Staging Tables
The first development step is to create the raw extracts from the various source systems. The ideal approach is to only extract changed records, and then delete these raw records as they are successfully processed. This provides the flexibility to easily change the timing of your refresh from weekly, to daily, to near real-time.

Clean and Transform the Staging Data and Load it into the Data Warehouse
This is where a variety of data-cleansing techniques get utilized to optimize a clean load. Analytic Vision has used a variety of SQL-based ETL tools and has developed custom-coded solutions for complex situations where performance is important. One of our strengths at Analytic Vision is to design table-driven cleansing processes that avoid using hard-coded entries, which provides the client with flexibility to modify data warehouse values without changing the ETL process.

One of the key points of a well-designed data warehouse is that the grain (the low level) of the data warehouse is consistent (e.g., item, customer, day, minute, etc.). This usually ends up requiring allocations at some point in the development of an enterprise data warehouse solution. A common implementation of allocations comes when someone wants to see customer profitability. This raises a variety of challenges as general ledger account balances need to be allocated down to the customer and item level. Analytic Vision has designed and developed several variations on rules-based allocation processes customized for each client’s unique requirements.

A quick note on Customer Profitability: many companies want to implement customer profitability right away, but this is what we call a second-tier subject area in the data warehouse. It’s called this because its primary data sources are typically general ledger and sales data. Ideally these two subject areas are already loaded in your data warehouse (first-tier subject areas), and they become the source data for the allocation process to load the second-tier customer profitability subject area.

Format Data Warehouse Data for Loading into OLAP Databases
Another component of the ETL process that many companies initially are not aware of is the required formatting for loading into several of the OLAP "cube" databases such as Hyperion Essbase. Similar to optimizing the load of the relational data into the data warehouse, it is important to optimize the build and load of OLAP databases, especially multidimensional databases, which can take a while to load and calculate. Analytic Vision has advanced skills not only in understanding what techniques work best for various OLAP products, but also in understanding how to use advanced formulas to simplify OLAP cube designs and the load process.


Automate the Process and Provide Menu-Driven Reload Options
We believe that one of our key differentiators is our ability to automate the entire ETL process, not only into the data warehouse, but also into the OLAP data marts. We see a lot of companies that still lock and send information from Excel into products such as Essbase in order to build the Accounts dimension and load the data. A properly designed process should require manual intervention only when the users request a design change or enhancement. A new account should not cause an incomplete load.

A business intelligence solution has enough challenges without having to worry if someone launched a manual step at the right time in a process. Analytic Vision has a lot of experience at automating all phases of the ETL process, and implementing audit steps where appropriate to simplify the process of verifying the accuracy of the data for the business intelligence team.

  Home > Services > Back End Automation