Phases of Data migration: Design

Now the fun begins! :)

In previous posts, we had focused on phases of data migration, and looked at planning and analysis stages in detail. In this post, we will focus on design of the tools and methodology for data migration.

What tools do we need? We will need at a minimum three tools for

* Extraction & transformation
* Load
* Validation

The first item to complete would be data mapping from source to target system. Based on the scope identified in the planning stage, have your business analysts dig deep and clearly identify all data sources, formats/rules and map them to the target system. This mapping exercise needs to be comprehensive for e.g. simply mapping name of the customer is stored in table XYZ and in column 1 in source system to table ABC and in column 2 is not enough. Why? In order to be thorough, you need to consider the maximum length of characters in your source system and check to see if your target system can handle this.

If your target system cannot handle this length, you have some choices (extend character size, truncate some customer names etc…). Regardless of decisions, this will have to be captured in your planning assumptions, socialized and then documented in your design document.

Word of caution: Don’t make assumptions! Don’t trust anyone! Including your data experts. Have your experts prove their knowledge of the data and their expertise. The reason for this seemingly paranoid approach is justified; in a lot of cases “knowing the data” means knowing the data base construct and limited knowledge of how the business uses the data or sometimes the other way around knowing the business logic but having no clue of persistence at the database layer.

Once your data mapping is done, you will have a clear idea of any data cleanup that needs to be done. Drive your team to think in terms of numbers and impact to schedule. The numbers should indicate the number of (1) records that have to be fixed/cleaned, (2) resources required to perform this clean up (3) hours/dollars effort/cost impact.

If data cleanup is required, consider this as a separate body of work. Don’t scavenge on the design activity to fund or resource the data cleanup. If you need additional help, make sure you raise awareness/jeopardy and ask for help.

Continuing on with design, ETL (extract transform and load) utilities can be challenging, but you don’t have to reinvent the wheel. Think outside the box, in most organizations that focus on KPI’s and performance metrics, there will be well defined tools to extract data. These tools can form the skeleton for your extraction tools. Review these carefully and document, what else you might need. If you are starting from scratch, try to get the experts to document the application layer and database layer. Once you have an idea of how source system manages and stores data, you can work towards extraction. With the advent of J2EE based systems, pure database centric extractions have become exceedingly difficult. The reason is that the logic and rules for extraction exist in the application layer and may not be clearly available in the database. In some cases, I have spent hours trying to create relationship diagrams using tools and that finally managed to construct diagrams based on combining application logic with some creative thinking (hacks) into the databases.

This activity of design and building tools to support data migration are no different from the equivalent standard SDLC tasks.

The key requirements for these tools are scalability and performance. Your tools should be able to perform the tasks within a timely manner and be able to handle the data set identified in the planning stage. While going through design, build and test iterations, I would highly recommend keeping a spreadsheet to record performance.

Thoroughly analyze the sequence for data extraction, load and validation. This is the sequence you need to solidify in the test phase and execute to during go-live. This sequence is usually identified as the last step and this is a common mistake. Instead of honing the strategy, data migration leads continue to spin their wheels.

"Disclaimer: The views and opinions expressed here are my own only and in no way represent the views, positions or opinions - expressed or implied - of my employer (present and past) "
"Please post your comments - Swati Ranganathan"


Post a Comment