If you’ve got a load-balanced Dynamics CRM on-premises installation, be careful how you configure the load balancer. An incorrect setting can limit the capacity of your system.
During load testing of a client’s Dynamics CRM environment we encountered a problem where the CRM web services appeared to be saturated and would no longer accept connections – logging the following error:
There are too many active security negotiations or secure conversations at the service. Please retry later.
When load testing I expect to reach a breaking point eventually, but this was happening with very few connections (fewer than 50 within a few seconds).
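The wording of the error points at WCF’s security session quotas rather than at IIS or the hardware: each secure conversation being negotiated or held counts against per-service limits that default to 128. As a hedged sketch only (element and attribute names from standard WCF custom binding configuration; the values shown are the defaults), the relevant knobs look like this – though note that raising them treats the symptom, not the cause:

```xml
<customBinding>
  <binding name="secureBinding">
    <security authenticationMode="SecureConversation">
      <!-- Both quotas default to 128; exhausting either produces the
           "too many active security negotiations" fault. -->
      <localServiceSettings maxPendingSessions="128"
                            maxStatefulNegotiations="128" />
    </security>
    <httpTransport />
  </binding>
</customBinding>
```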
The basic configuration was as per the diagram below.
Some custom .NET web services had been written using the CRM 2013 SDK and deployed to two load-balanced web servers. CRM itself had two servers (both with the Full Server role) behind a load balancer as well. The load balancers were Kemp LoadMaster virtual appliances.
When we generated a large number of requests against the custom .NET web services the errors started appearing very quickly. This resulted in the CRM web services no longer accepting connections, and an iisreset was required to restore normal order. By generating load against the CRM OData web service directly we knew we had ample capacity in the infrastructure, and that the native product had no problem handling far more requests than we saw from the custom web services. We pointed the finger at the developers for not closing their connections, but they showed us the code and indeed they were closing their connections.
So the next thing we checked was the load balancers. The NLB fronting the CRM servers (purple above) had a session persistence setting that was wrong for this setup. Changing it to Super HTTP resolved the issue, and the custom web services were then able to handle the desired number of requests. This issue isn’t necessarily limited to the Kemp LoadMaster – any load balancer with persistence configured incorrectly could produce the same behaviour.
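The mechanics can be sketched in a few lines of Python (a conceptual model only, not actual WCF or LoadMaster behaviour): a secure conversation is negotiated with one backend and held there, so a balancer that doesn’t pin a client to the same backend keeps sending follow-up requests to a server that has never seen the session, forcing fresh negotiations that pile up against the quota.

```python
# Conceptual model of why the wrong persistence mode exhausts the
# security-session quota. Names and numbers are illustrative only.
import itertools

class Backend:
    """A CRM web server tracking pending secure-conversation state."""
    def __init__(self, name, quota=8):
        self.name = name
        self.pending = 0        # abandoned negotiations linger until timeout
        self.quota = quota

    def handle(self, client):
        if client.get("server") is self:
            return "ok"         # session was negotiated here: fast path
        if self.pending >= self.quota:
            return "too many active security negotiations"
        self.pending += 1       # start a fresh negotiation on this server
        client["server"] = self
        return "negotiated"

def simulate(persistent, n=20):
    a, b = Backend("crm-a"), Backend("crm-b")
    rr = itertools.cycle([a, b])
    client = {}                 # one client making n sequential requests
    results = []
    for _ in range(n):
        server = client.get("server", a) if persistent else next(rr)
        results.append(server.handle(client))
    return results

print(simulate(persistent=False))  # renegotiates constantly, then errors
print(simulate(persistent=True))   # negotiates once, then plain "ok"
```

Without persistence every request lands on the “wrong” server and triggers a renegotiation until both backends hit their quota; with persistence there is exactly one negotiation and everything after it is a cheap session lookup.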
This wasn’t found with ‘normal’ user-based testing of the system (because a couple of users generate minimal traffic), but by using automated tools to generate the load (Visual Studio 2013 Ultimate). The load which flushed out the issue was far less than the system would have experienced once live, so this time the exercise of load testing certainly paid for itself by identifying the issue in advance. Happy ending.
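You don’t need a full test rig to generate this kind of concurrent load. As a minimal sketch (the real test used Visual Studio 2013 Ultimate load tests; `call_service` below is a hypothetical stand-in so the script is self-contained – swap it for a real HTTP call against your endpoint):

```python
# Fire many concurrent requests at a service and count the outcomes.
# call_service is a placeholder; in practice it would be e.g.
# urllib.request.urlopen("https://crm.example/customservice").status
from concurrent.futures import ThreadPoolExecutor

def call_service(i):
    return 200  # stub response; replace with a real request

def generate_load(n_requests=50, concurrency=10):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(call_service, range(n_requests)))
    ok = statuses.count(200)
    return ok, n_requests - ok

ok, failed = generate_load()
print(f"{ok} succeeded, {failed} failed")
```

Even a simple script like this, pushing 50 requests in a few seconds, is enough to surface the persistence problem that a couple of manual testers never would.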