This article discusses one design choice that comes up when integrating multiple systems, a common problem for software engineers whose projects span more than one system.
When integrating systems from scratch, there is a small but important design question: which system initiates the data transfer? Should the source system push data into the destination system(s)? Should the destination systems pull data from the source? Or should we build a mediator system that takes care of pulling data from the source and pushing it into the destinations?
Data Push
In this approach, the source system is tied to one or more destination systems and takes responsibility for sending the required data to each of them.
Pros:
- Data is synced in real time, as soon as it changes at the source.
- No unnecessary polling is needed.
Cons:
- The source is connected only to a fixed set of systems. Depending on the design, adding another destination can be costly, because the source itself has to change.
- Because it takes on the responsibility of pushing data, the source system may also have to handle additional concerns, such as the quality of the data it pushes to each destination.
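To make that coupling concrete, here is a minimal Python sketch of Data Push, assuming in-memory lists stand in for real destination systems. `SourceSystem`, `register`, and `save` are hypothetical names chosen for illustration, not any particular product's API. Note that the source holds the list of destinations and also shapes the data for each one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record type: each record is a dict of field name -> value.
Record = Dict[str, object]


@dataclass
class SourceSystem:
    """A data source that pushes every change to the destinations it knows about."""

    records: List[Record] = field(default_factory=list)
    # The source owns the list of destinations, so adding one changes the source.
    destinations: List[Callable[[Record], None]] = field(default_factory=list)

    def register(self, push: Callable[[Record], None]) -> None:
        """Attach a destination; `push` delivers one record to it."""
        self.destinations.append(push)

    def save(self, record: Record) -> None:
        """Store a record locally, then push it to every destination right away."""
        self.records.append(record)
        for push in self.destinations:
            push(record)


# Two hypothetical destination systems, modeled as plain lists.
billing: List[Record] = []
reporting: List[Record] = []

source = SourceSystem()
source.register(billing.append)
# The source must also know what each destination accepts:
# here, reporting only wants the id and amount.
source.register(lambda r: reporting.append({"id": r["id"], "amount": r["amount"]}))

source.save({"id": 1, "amount": 40, "customer": "alice"})
print(billing)    # [{'id': 1, 'amount': 40, 'customer': 'alice'}]
print(reporting)  # [{'id': 1, 'amount': 40}]
```

No polling happens here: destinations receive data the moment `save` runs. The price is that wiring in a third destination means editing the source's setup.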
Data Pull
In this approach, the system that requires the data pulls it from the source system as needed.
Pros:
- It’s always better when someone asks for and gets only what they need. With this approach, the system that needs the data takes responsibility for requesting and retrieving only what it needs.
- Since each destination system knows best what it needs, when those needs change, the integration logic changes within that same system.
- It is easy to set up any number of development/QA instances, since setting up the system itself takes care of the data pull.
Cons:
- Real-time data pull requires continuous polling, which increases the number of requests to the data source.
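A minimal Python sketch of Data Pull, assuming the source exposes a query that takes a position cursor and a list of fields. `fetch_since`, `poll`, and the integer cursor are hypothetical; a real source might instead offer a timestamp filter or a paginated API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


@dataclass
class SourceSystem:
    """A data source that knows nothing about its consumers; it only answers queries."""

    records: List[Record] = field(default_factory=list)
    request_count: int = 0  # how many pull requests the source has served

    def fetch_since(self, cursor: int, fields: Sequence[str]) -> Tuple[List[Record], int]:
        """Return records after position `cursor`, keeping only `fields`, plus the next cursor."""
        self.request_count += 1
        new_records = self.records[cursor:]
        return [{f: r[f] for f in fields} for r in new_records], len(self.records)


@dataclass
class DestinationSystem:
    """Pulls only the fields it needs and remembers how far it has read."""

    fields: Sequence[str]
    cursor: int = 0
    received: List[Record] = field(default_factory=list)

    def poll(self, source: SourceSystem) -> int:
        """Pull anything new from the source; return how many records arrived."""
        batch, self.cursor = source.fetch_since(self.cursor, self.fields)
        self.received.extend(batch)
        return len(batch)


source = SourceSystem(records=[{"id": 1, "amount": 40, "customer": "alice"}])
reporting = DestinationSystem(fields=("id", "amount"))

# In practice poll() would run on a timer; every call costs the source a request,
# even when nothing has changed.
for _ in range(3):
    reporting.poll(source)
source.records.append({"id": 2, "amount": 15, "customer": "bob"})
reporting.poll(source)

print(reporting.received)    # [{'id': 1, 'amount': 40}, {'id': 2, 'amount': 15}]
print(source.request_count)  # 4
```

The destination alone decides which fields it wants, so a change in its needs stays in its own code. The source served four requests to move only two records, which is the polling overhead in miniature.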
A Dedicated Mediator
In this approach, we build a dedicated system that takes responsibility for pulling data from the source system(s) and pushing it to the destination system(s). ETL/EAI systems are examples of this.
Pros:
- The systems are highly decoupled, which provides more flexibility.
- When the ecosystem involves many systems, there is more code reuse, since a single place manages the data push/pull for all of them.
- When the integration requires a lot of data translation/transformation, this solution keeps things simple, since the mediator's only job is pull -> transform -> push.
Cons:
- Since the mediator is integrated with two or more systems, it needs rigorous testing whenever any of the systems it depends on changes.
- For solutions involving simple data transfer, maintaining a separate system is overkill.
- Setting up multiple development/QA environments might require a dedicated Mediator instance for each environment.
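A minimal Python sketch of the mediator's pull -> transform -> push loop, assuming in-memory lists stand in for the source and destination systems. `Route`, `Mediator`, `run_once`, and `incremental` are hypothetical names; a real deployment would more likely use an existing ETL/EAI tool than hand-rolled code like this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


def incremental(records: List[Record]) -> Callable[[], List[Record]]:
    """Wrap a source list so each call returns only records not pulled before."""
    seen = 0

    def pull() -> List[Record]:
        nonlocal seen
        batch = records[seen:]
        seen = len(records)
        return batch

    return pull


@dataclass
class Route:
    """One source-to-destination pipeline: pull -> transform -> push."""

    pull: Callable[[], List[Record]]
    transform: Callable[[Record], Record]
    push: Callable[[Record], None]


@dataclass
class Mediator:
    """Owns all integration logic; sources and destinations never see each other."""

    routes: List[Route] = field(default_factory=list)

    def run_once(self) -> int:
        """Run every route once and return how many records were moved."""
        moved = 0
        for route in self.routes:
            for record in route.pull():
                route.push(route.transform(record))
                moved += 1
        return moved


# Hypothetical systems: one source, two destinations.
orders: List[Record] = [{"id": 1, "amount_cents": 4000, "customer": "alice"}]
billing: List[Record] = []
reporting: List[Record] = []

mediator = Mediator(routes=[
    Route(pull=incremental(orders), transform=dict, push=billing.append),
    Route(
        pull=incremental(orders),
        transform=lambda r: {"id": r["id"], "amount": r["amount_cents"] / 100},
        push=reporting.append,
    ),
])

print(mediator.run_once())  # 2
print(reporting)            # [{'id': 1, 'amount': 40.0}]
print(mediator.run_once())  # 0 (nothing new to move)
```

Neither `orders` nor the destinations refer to each other; every integration rule, including the cents-to-units conversion for reporting, lives in the mediator's routes. That is where the reuse and the testing burden both come from.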
In my opinion, whenever you expect the data integration requirements to grow, it is better to go with a dedicated Mediator.
When the need is a simple data load, it is better to go with Data Pull.
Go with Data Push only if you have a strong reason for it.