Data Integration, simply put, is a way of pulling data from somewhere and putting it somewhere else. I must warn you that it's easier said than done.
The most common uses of Data Integration are:
1. Building a Central Data Repository : Suppose you have many databases spread across various geographies and platforms, and the business needs a central database that holds the data of all geographies in a uniform way.
2. Populating Databases used by Decision Support systems from Transaction Systems : You need to build a data warehouse that holds information brought in from transaction systems, which BI tools then use for reporting and analysis.
3. Metadata management : Updating columns, adding new members to a multidimensional database, etc. can also come under data integration, although there is some disagreement on this.
There are many other cases where data integration tools can be used; the ones listed above are those you will face most of the time.
Practical Issues of Data Integration
You can find various links about these on the net; I will restate them here in my own words.
Format of Data : The date column in one table might use a different format from the date column in another. So while integrating both tables (a union), we need to make sure that the date format is uniform in the target database.
Formatting is perhaps the most common issue faced in integration. It can be as simple as a date format or as complex as particular rules that have to be written in SQL.
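To make the date example concrete, here is a minimal sketch in Python. The three source formats, the target format and the sample values are assumptions picked for illustration; a real integration job would use whatever formats its sources actually contain.

```python
from datetime import datetime

# Assumed source formats; these are illustrative, not from any real system.
SOURCE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")
# Assumed uniform format for the target database.
TARGET_FORMAT = "%Y-%m-%d"


def normalize_date(raw: str) -> str:
    """Try each known source format and return the date in the target format."""
    text = raw.strip()
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognised date format: {raw!r}")


# Two hypothetical source tables whose date columns use different formats.
table_a_dates = ["2011-03-05", "2011-03-06"]
table_b_dates = ["07/03/2011", "08.03.2011"]

# The "union" of both tables, with every date in the target format.
unified_dates = [normalize_date(value) for value in table_a_dates + table_b_dates]
print(unified_dates)
```

Raising an error on an unknown format is a design choice in this sketch: it is usually safer to stop the load than to write a guessed date into the target.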
Mapping : A source data value, say INDIA, needs to be stored as REGION10 in the target; likewise US becomes REGION4 and AUSTRALIA becomes REGION3. This involves lookup tables that map each source value to a particular target value.
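The lookup idea can be sketched in a few lines of Python. The three mappings are the ones from the example above; the dictionary standing in for the lookup table, the row layout and the handling of unmapped values are assumptions made for illustration.

```python
# Lookup "table" held as a dictionary; in practice this would usually be a
# database table joined to the source data.
REGION_LOOKUP = {
    "INDIA": "REGION10",
    "US": "REGION4",
    "AUSTRALIA": "REGION3",
}


def map_region(source_value: str) -> str:
    """Translate a source country value into its target region code."""
    key = source_value.strip().upper()
    if key not in REGION_LOOKUP:
        # Assumption: an unmapped value is an error rather than a silent default.
        raise KeyError(f"no target mapping for source value {source_value!r}")
    return REGION_LOOKUP[key]


# Hypothetical source rows: (country, amount)
source_rows = [("INDIA", 120.0), ("US", 75.5), ("AUSTRALIA", 42.0)]

# Target rows carry the region code instead of the country name.
target_rows = [(map_region(country), amount) for country, amount in source_rows]
print(target_rows)
```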
Data Integration time, size and method : The time taken for integration, the method of loading (row by row or bulk loads) and the size of the data to be loaded will determine how effective your integration tool is.
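The difference between the two load methods can be tried out with Python's built-in sqlite3 module. This is only a sketch: the sales table, the row count and the in-memory database are assumptions, and the timings it prints will vary by machine and will look quite different against a real database server.

```python
import sqlite3
import time


def make_target() -> sqlite3.Connection:
    """Create an in-memory target database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    return conn


def load_row_by_row(conn: sqlite3.Connection, rows) -> None:
    """Insert and commit one row at a time."""
    for row in rows:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", row)
        conn.commit()


def load_bulk(conn: sqlite3.Connection, rows) -> None:
    """Insert all rows in one batch and commit once."""
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


rows = [(i, "REGION10", float(i)) for i in range(1000)]

row_conn = make_target()
start = time.perf_counter()
load_row_by_row(row_conn, rows)
print(f"row by row: {time.perf_counter() - start:.4f}s")

bulk_conn = make_target()
start = time.perf_counter()
load_bulk(bulk_conn, rows)
print(f"bulk load:  {time.perf_counter() - start:.4f}s")
```

Both methods leave the same rows in the target; what changes is the number of statements and commits needed to get them there.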
Scheduling : Data integration processes need to be scheduled properly so that there are no inconsistencies, e.g. if you start a data integration process while the source transaction systems are still being updated, the target may or may not have the correct data.
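Consistency : The data needs to be consistent across source and target at all times. Changed data capture, i.e. identifying the rows that have changed in the source since the last load so that only those are applied to the target, is a very important aspect in the field of Data Integration.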
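One simple way to capture changed data is to compare a last-updated timestamp against the time of the previous load. The sketch below uses sqlite3 for both source and target; the customers table, the updated_at column and the sample rows are all assumptions, and real tools often read the database log instead of relying on a timestamp column.

```python
import sqlite3

DDL = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"

# Hypothetical source and target databases, both in memory for the example.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(DDL)
target.execute(DDL)


def sync_changes(src, tgt, last_sync: str) -> str:
    """Copy rows changed after last_sync and return the new high-water mark."""
    changed = src.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    # INSERT OR REPLACE overwrites a target row that has the same primary key.
    tgt.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
        changed,
    )
    tgt.commit()
    return max((row[2] for row in changed), default=last_sync)


# Initial load: both rows are newer than the starting watermark.
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "2011-03-01 10:00:00"), (2, "Ben", "2011-03-01 11:00:00")],
)
source.commit()
watermark = sync_changes(source, target, "1970-01-01 00:00:00")

# Later, one source row changes; only that row is picked up by the next run.
source.execute(
    "UPDATE customers SET name = ?, updated_at = ? WHERE id = ?",
    ("Asha K", "2011-03-02 09:00:00", 1),
)
source.commit()
watermark = sync_changes(source, target, watermark)

target_rows = target.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(target_rows, watermark)
```

A timestamp comparison like this does not see rows deleted from the source, so deletes need separate handling.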