Phases of Data Migration: Validation

This is a key phase for ensuring the success of the overall migration effort! Validation sounds easy, but how do you go about it?

I have always set up a three-tiered set of validation criteria:

• First-level validation ensures that, from a count perspective, the records made it into the target system.
• Second-level validation ensures that the key data elements made it into the target system. The key data elements I look for are those essential to running the business. Another criterion I have used is to select data that is part of the object’s properties and is a required field within the application layer. For example, part unit of measure and cost are part of the object’s key attributes in ERP/PLM systems.
• Third-level validation covers the metadata for attributes and keywords that are nice to have but don’t have a significant impact on the business if they are loaded incorrectly.
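The three tiers above can be sketched in code. This is a minimal illustration, not a production routine: the field names (part_number, uom, cost, keywords) and the list-of-dicts row layout are assumptions standing in for whatever your ERP/PLM extracts actually contain.

```python
# Tiered validation sketch. Source and target extracts are assumed to be
# lists of dicts keyed by a part number; all field names are hypothetical.

def validate_counts(source_rows, target_rows):
    """Tier 1: did every record make it into the target system?"""
    return len(source_rows) == len(target_rows)

def validate_key_fields(source_rows, target_rows, key="part_number",
                        fields=("uom", "cost")):
    """Tier 2: do the business-critical fields match record by record?"""
    target_by_key = {row[key]: row for row in target_rows}
    mismatches = []
    for row in source_rows:
        target = target_by_key.get(row[key])
        if target is None:
            mismatches.append((row[key], "missing"))
            continue
        for field in fields:
            if row[field] != target[field]:
                mismatches.append((row[key], field))
    return mismatches

def validate_metadata(source_rows, target_rows, key="part_number",
                      optional=("keywords",)):
    """Tier 3: nice-to-have attributes; report but don't block go-live."""
    return validate_key_fields(source_rows, target_rows, key, optional)
```

A tier-3 failure here produces a report for cleanup rather than a stop-the-migration error, which mirrors the "nice to have" distinction above.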

I approach validation from three perspectives, with a focus on:

• Ensuring proper extraction from source system
• Ensuring proper data transformation into flat files (CSV, XML, etc.)
• Ensuring proper load into target system

When you analyze failures or errors, start by reviewing what you extracted. If you have any doubts at this layer, the success of the overall project will be in doubt. If the data is properly extracted but incorrectly populated into the flat file, your load will not succeed. If extraction and transformation are correct and you have properly tested the loads, the data should load successfully into the target system.
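One concrete check at the transformation layer is to reconcile the extract against the flat file before attempting the load. A minimal sketch, assuming the transformation step produces a CSV with a header row (the column names and sample rows are invented for illustration):

```python
import csv
import io

def csv_row_count(csv_text):
    """Count data rows in a CSV, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header
    return sum(1 for _ in reader)

# Rows pulled from the source system (hypothetical sample data).
extracted = [("P1", "EA"), ("P2", "KG")]

# The flat file produced by the transformation step.
flat_file = "part_number,uom\nP1,EA\nP2,KG\n"

# If this count check fails, the problem is in transformation,
# not extraction or load.
assert csv_row_count(flat_file) == len(extracted)
```

Catching a discrepancy here localizes the failure to the transformation step, so you are not debugging the load for a problem introduced upstream.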

One key issue always pops up when it comes to validation: WHO is responsible? In most cases, the business owners point to the IT team, and the IT team points back to the business owners. To be successful, engage both teams: work through the development of validation criteria and success criteria, identify what can be automated, and validate the automation routines so that both sides are satisfied.

Automating this activity is almost a must in most cases; when you are faced with gigabytes or even terabytes of data, manual lookups will not be sufficient. You could get fancy and dabble with sampling theory, but in my opinion, go for 100% checking by putting technology to work!
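One common way to automate 100% checking is to fingerprint every row on both sides and diff the fingerprint sets, so mismatches surface without any manual lookups. A sketch under simplifying assumptions: rows are tuples in a fixed field order, and an exact value match is the pass condition.

```python
import hashlib

def row_fingerprint(row):
    """Stable hash of a row's values, joined in a fixed field order."""
    payload = "|".join(str(value) for value in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows):
    """Return fingerprints present on only one side of the migration."""
    src = {row_fingerprint(r) for r in source_rows}
    tgt = {row_fingerprint(r) for r in target_rows}
    return src - tgt, tgt - src

# Hypothetical usage: one source row never made it into the target.
only_in_source, only_in_target = reconcile(
    [("P1", "EA"), ("P2", "KG")],
    [("P1", "EA")],
)
```

Because every row is checked, this scales the "100% checking" approach to large volumes; in practice you would run the set comparison inside a database or a distributed engine rather than in memory.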

Disclaimer: The views and opinions expressed here are my own only and in no way represent the views, positions or opinions - expressed or implied - of my employer (present and past).

Please post your comments - Swati Ranganathan
