Integrating systems with NServiceBus
Here at iSAMS we sometimes need to integrate with software from third parties. Recently we have been working on an integration that involves synchronizing data between iSAMS and a cloud-based application. It soon became apparent that we were developing a distributed system: separate software, networks and servers all trying to work together as one. Systems like this bring with them a particular set of problems, best understood by looking at the assumptions that developers new to them often make, known as the fallacies of distributed computing:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn't change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
With iSAMS hosted in a variety of environments, many of which we have little control over, it was clear that we were going to have to build a system that would tolerate network issues. What would we do when the network was down, or simply slow? Even with perfect network conditions at our end, what would happen if the third-party application was down? With these issues in mind it was clear that synchronous communication between systems was not going to work, so we decided to build our synchronization solution around asynchronous messaging. Luckily there are frameworks available to help with this, and after looking at a few options we decided to use NServiceBus.
What is NServiceBus?
NServiceBus is a Service Bus, but what does that mean? A Service Bus is used to pass messages between different services. Rather than simply passing them on, it offers ways to monitor and control this exchange of messages, along with support for message versioning, a process manager, and load-balanced message handling.
To ensure delivery of messages, NServiceBus builds on top of existing queueing technologies such as MSMQ or Azure Service Bus. These queues keep a copy of each message until it has reached its destination.
NServiceBus offers two main types of Message - Events and Commands.
Event messages are used when something has happened, and the publisher of these events does not need to know where they are being received. NServiceBus keeps track of any subscribers and makes sure that messages get sent to the right places. This pattern is called Publish-Subscribe (Pub-Sub for short). Within our integration we use these messages when something has changed in iSAMS: every time you change someone's name, an event will be raised with details of the change.
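To make the Pub-Sub idea concrete, here is a minimal sketch in plain Python. It is not the NServiceBus API: the `Bus` class, the `PersonNameChanged` event and the two subscriber lists are all hypothetical names invented for illustration, and everything lives in memory rather than on a durable queue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonNameChanged:
    """Hypothetical event: a fact about something that has already happened."""
    person_id: int
    new_name: str


class Bus:
    """Toy in-memory bus (an assumption for this sketch, not a real service bus)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> list of handlers

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        # The publisher never names a destination; the bus looks up subscribers.
        for handler in self._subscribers[type(event)]:
            handler(event)


bus = Bus()
audit_log = []    # hypothetical subscriber 1
sync_outbox = []  # hypothetical subscriber 2
bus.subscribe(PersonNameChanged, audit_log.append)
bus.subscribe(PersonNameChanged, sync_outbox.append)
bus.publish(PersonNameChanged(person_id=42, new_name="Alex Smith"))
```

The publishing code only knows about the event. Adding a third subscriber would not require any change to it.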
Command messages are used when a single service (or endpoint) may be receiving messages from one or more senders that it will need to act upon. Senders need to explicitly state where they want the message to be received. This type of message is used when we receive an event from the third-party systems notifying us of a change. In response to the event we send a command to the part of our integration that is responsible for updating data within iSAMS. While this is currently used with only one integration, it allows us to easily add extra integrations in future.
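The difference from an event can be sketched in the same style. Again this is plain Python rather than NServiceBus code, and the `Transport` class, the `UpdatePersonName` command and the endpoint name `isams.updates` are assumptions made up for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePersonName:
    """Hypothetical command: a request for one specific endpoint to act."""
    person_id: int
    new_name: str


class Transport:
    """Toy transport holding one in-memory queue per named endpoint."""

    def __init__(self):
        self.queues = {}

    def register(self, endpoint):
        self.queues[endpoint] = deque()

    def send(self, destination, command):
        # Unlike publishing an event, the sender must name the destination.
        if destination not in self.queues:
            raise KeyError(f"unknown endpoint: {destination}")
        self.queues[destination].append(command)


transport = Transport()
transport.register("isams.updates")  # hypothetical endpoint name


def on_third_party_change(person_id, new_name):
    """Turn an incoming third-party notification into a command for one endpoint."""
    transport.send("isams.updates", UpdatePersonName(person_id, new_name))


on_third_party_change(42, "Alex Smith")
```

A second integration would get its own notification handler that sends the same command to the same endpoint, which is what makes extra integrations cheap to add.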
As all messages, whether Events or Commands, are sent using an underlying queue, it is possible to move data between different machines as long as they can see that queue. Queue transports such as Azure Service Bus are good for this as they are cloud-based, but they will only work if access is granted; many networks limit what can be accessed due to security concerns or bandwidth limitations. Luckily NServiceBus offers us a solution.
Where it is necessary to transfer messages between sites, NServiceBus provides Gateways. These provide a durable mechanism by which messages can be sent between physically separate sites using HTTP or HTTPS.
Gateways do not support publish-subscribe: messages must be sent explicitly to a gateway endpoint, so it may be necessary to send a message that would otherwise be published as an event. In cases like this you can simply publish the event at the receiving site once the message has arrived, so that local subscribers can then handle it.
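The "receive, then publish locally" step might look something like the following sketch. It is plain Python with no real gateway or HTTP involved: `handle_gateway_message`, `publish_locally` and the payload shape are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonNameChanged:
    """Hypothetical event, as seen by subscribers at the receiving site."""
    person_id: int
    new_name: str


local_subscribers = []  # handlers registered at the receiving site


def publish_locally(event):
    for handler in local_subscribers:
        handler(event)


def handle_gateway_message(payload):
    """Runs at the receiving site for a message sent explicitly through the gateway."""
    event = PersonNameChanged(payload["person_id"], payload["new_name"])
    # Re-publish, so local subscribers see an ordinary event.
    publish_locally(event)


received = []
local_subscribers.append(received.append)
handle_gateway_message({"person_id": 42, "new_name": "Alex Smith"})
```

Local subscribers do not need to know that the event crossed a site boundary on its way to them.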
When things go wrong
As the fallacies I mentioned earlier suggest, there are many things that can go wrong when communicating between systems. NServiceBus helps us with this by providing mechanisms to recover when processing a message fails. It does this by wrapping all message handling with retry logic.
The first time a message can't be handled, NServiceBus tries again. By default it will attempt to process each message five times. For intermittent errors, such as a service restarting or a network briefly dropping out, this can be enough, but sometimes errors last a little longer.
Where messages can't be processed within the first level of retries, NServiceBus provides a second level of retries, with a delay before each attempt. By default it will wait 10 seconds, then 20, and finally 30 seconds, but this can be configured so that longer delays are possible. This kind of retry is useful when a server is restarted, or when much longer network failures are seen between sites in the system. To prevent issues being hidden, NServiceBus doesn't allow messages to be retried for longer than 24 hours.
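The two levels of retry can be modelled in a few lines of Python. This is a simplified sketch of the idea, not how NServiceBus implements it: the attempt count and delays are the defaults quoted above, but running a fresh batch of immediate attempts after each delay is an assumption of this model, and `process_with_retries`, `MessageFailed` and `flaky` are hypothetical names.

```python
import time

IMMEDIATE_ATTEMPTS = 5          # first level: back-to-back attempts
DELAYS_SECONDS = (10, 20, 30)   # second level: wait before each further round


class MessageFailed(Exception):
    """Raised when every configured attempt has failed."""


def process_with_retries(handler, message, sleep=time.sleep):
    last_error = None
    # One round with no delay, then one round after each configured delay.
    for delay in (0, *DELAYS_SECONDS):
        if delay:
            sleep(delay)
        for _ in range(IMMEDIATE_ATTEMPTS):
            try:
                return handler(message)
            except Exception as error:
                last_error = error
    raise MessageFailed(message) from last_error


calls = []


def flaky(message):
    """Fails six times (a short outage), then succeeds."""
    calls.append(message)
    if len(calls) < 7:
        raise ConnectionError("network down")
    return "ok"


waits = []  # record the delays instead of really sleeping
result = process_with_retries(flaky, "msg", sleep=waits.append)
```

In this run the first five attempts fail, the model waits once for 10 seconds, and the message is handled on the second attempt of the next round. The handler itself contains no retry code at all.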
If a message cannot be processed after all configured retries, NServiceBus sends it to an error queue. By monitoring the error queue it is possible to see when serious issues have affected the performance of the system, and action can be taken to resolve them. Once this has been done it is possible to re-process all failed messages.
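As a rough illustration of the error queue, here is a plain-Python sketch in which failed messages are parked and later replayed. Retries are left out to keep it short, and `deliver`, `reprocess_failed` and the `state` flag are hypothetical names rather than anything NServiceBus provides.

```python
from collections import deque

error_queue = deque()  # holds (message, reason) pairs for failed messages


def deliver(handler, message):
    """Hand a message to its handler; park it on the error queue if it fails."""
    try:
        handler(message)
    except Exception as error:
        error_queue.append((message, repr(error)))


def reprocess_failed(handler):
    """Replay everything currently on the error queue, once each."""
    for _ in range(len(error_queue)):
        message, _reason = error_queue.popleft()
        deliver(handler, message)


state = {"third_party_up": False}  # stands in for the real outage
processed = []


def handler(message):
    if not state["third_party_up"]:
        raise ConnectionError("third party unavailable")
    processed.append(message)


deliver(handler, "update-1")
deliver(handler, "update-2")
failed_during_outage = len(error_queue)  # both messages are parked, not lost

state["third_party_up"] = True  # the underlying issue has been resolved
reprocess_failed(handler)
```

Because failed messages are kept rather than dropped, fixing the underlying problem and replaying the queue is enough to catch up.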
This has just been a quick introduction to using NServiceBus to build distributed systems. In future posts I will look at some of the issues that come up when building systems with messaging, such as eventual consistency and handling duplicate messages.