System Integration – Design Options

This article is about one particular design choice that comes up when integrating multiple systems. Integration is a common problem that software engineers face whenever a project spans more than one system.

When integrating multiple systems from scratch, a small design question comes up early: which system initiates a data transfer? Will the source system push data into the destination system? Will the destination system pull data from the source? Or should we build a mediator system that takes care of pulling the data from the source and pushing it into the destination?

Data Push

In this approach, the source system is tied to one or more destination systems and takes responsibility for sending the required data to them.

Pros

  • Data sync can be close to real time, since the source sends data as soon as it changes.
  • No unnecessary polling is needed.

Cons

  • The source can be connected only to a fixed set of systems. Depending on the design, adding more destination systems could be costly.
  • By taking on the responsibility of pushing data, the source system may also have to take care of additional concerns, such as the quality of the data being pushed to each destination.
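
As a rough illustration of the push model, here is a minimal in-memory sketch. The `SourceSystem` class, its `register` and `save` methods, and the two inbox lists are all hypothetical names made up for this example; a real integration would push over a network rather than call a Python function.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Subscriber = Callable[[Record], None]


class SourceSystem:
    """Hypothetical source that pushes every saved record to its destinations."""

    def __init__(self) -> None:
        self._subscribers: List[Subscriber] = []

    def register(self, subscriber: Subscriber) -> None:
        # Each new destination has to be wired into the source explicitly.
        self._subscribers.append(subscriber)

    def save(self, record: Record) -> None:
        # The source owns delivery: every write fans out immediately.
        for subscriber in self._subscribers:
            subscriber(record)


# Two stand-in destination systems, modelled as plain lists.
billing_inbox: List[Record] = []
reporting_inbox: List[Record] = []

source = SourceSystem()
source.register(billing_inbox.append)
source.register(reporting_inbox.append)
source.save({"id": 1, "amount": 40})
```

Both inboxes receive the record as soon as `save` is called, with no polling. The cost is also visible: the source has to know about every destination it serves.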

Data Pull

In this approach, the system that requires data pulls it from the source system as needed.

Pros

  • It is generally better when a system asks for and gets only what it needs. Here, the system that needs the data takes responsibility for requesting exactly that.
  • The destination system knows best what it needs, so when those needs change, the change to the data integration logic stays within that same system.
  • It is easy to set up any number of development/QA instances, since setting up the destination system itself takes care of the data pull.

Cons

  • Pulling data in near real time requires continuous polling, which increases the number of requests to the data source.
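
The pull model could be sketched as below, again purely in memory. The `fetch_since` method, the `last_id` cursor and the `requests_made` counter are assumptions made for this example, not part of any particular product.

```python
from typing import Dict, List

Record = Dict[str, int]


class SourceSystem:
    """Hypothetical source that only answers requests; it knows no destinations."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def add(self, record: Record) -> None:
        self._records.append(record)

    def fetch_since(self, last_id: int) -> List[Record]:
        # Return only the records the caller has not seen yet.
        return [r for r in self._records if r["id"] > last_id]


class DestinationSystem:
    """Hypothetical destination that decides when to ask and what to ask for."""

    def __init__(self, source: SourceSystem) -> None:
        self._source = source
        self.last_id = 0
        self.records: List[Record] = []
        self.requests_made = 0

    def poll(self) -> int:
        # Every poll costs the source one request, even if nothing changed.
        self.requests_made += 1
        new_records = self._source.fetch_since(self.last_id)
        self.records.extend(new_records)
        if new_records:
            self.last_id = max(r["id"] for r in new_records)
        return len(new_records)


source = SourceSystem()
destination = DestinationSystem(source)
source.add({"id": 1, "amount": 40})
first = destination.poll()   # picks up record 1
second = destination.poll()  # nothing new, but a request was still made
```

The integration logic lives entirely in `DestinationSystem`. The second `poll` call shows the polling cost: it reaches the source even though there is nothing new to fetch.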

A Dedicated Mediator

In this approach, we build a dedicated system that takes responsibility for pulling data from the source system(s) and pushing it to the destination system(s). ETL and EAI systems are examples of this.

Pros

  • The source and destination systems are decoupled from each other, which provides more flexibility.
  • When the ecosystem involves many systems, there is more code reuse, since a single place manages the data push/pull for all of them.
  • When the data integration requires a lot of data translation/transformation, this solution keeps things simple, since the only job of the mediator is pull -> transform -> push.

Cons

  • Since the mediator is integrated with two or more systems, it needs rigorous testing whenever any of the systems it depends on changes.
  • For solutions involving only simple data transfer, maintaining a separate system is overkill.
  • Setting up multiple development/QA environments might require a dedicated instance of the mediator for each.
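
The pull -> transform -> push job of a mediator can be sketched as a single function. Everything here (`run_mediator`, `pull_orders`, `to_reporting_format`, the field names and the `warehouse` list) is hypothetical and only meant to show the shape of the idea, not how any specific ETL/EAI tool works.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run_mediator(
    pull: Callable[[], Iterable[Record]],
    transform: Callable[[Record], Record],
    pushes: List[Callable[[Record], None]],
) -> int:
    """Pull records from a source, transform each one, push it to every destination."""
    moved = 0
    for record in pull():
        converted = transform(record)
        for push in pushes:
            push(converted)
        moved += 1
    return moved


def pull_orders() -> List[Record]:
    # Stand-in for reading from the source system.
    return [{"id": 1, "amount_cents": 4000}, {"id": 2, "amount_cents": 250}]


def to_reporting_format(record: Record) -> Record:
    # The translation lives in the mediator, not in the source or the destination.
    return {"order_id": record["id"], "amount": record["amount_cents"] / 100}


warehouse: List[Record] = []  # stand-in destination system
moved = run_mediator(pull_orders, to_reporting_format, [warehouse.append])
```

Neither the source nor the destination knows about the other here; only the mediator knows both formats. That is also why a change on either side means retesting the mediator.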

Conclusion

In my opinion, whenever the data integration requirements are expected to grow, it is better to go with a dedicated mediator.

When the need is a simple data load, it is better to go with Data Pull.

Go with Data Push only if you have a strong reason for it.

Author: kannan_r

Just one among millions. Software engineer by profession. A bit interested in math and computing. Sometimes I feel that my interests are worth recording and sharing. This is just an initiative for my sharing ;-)
