Friday, 29 July 2011

Message throughput using the NServiceBus Distributor & MSMQ

My team is currently working on a project where we are using messaging to perform some business function in reaction to operations which happen in an existing system. The specific operation which is giving us some concern is the valuation of a client's investment policy or entire portfolio. On the face of it this seems simple enough and the mechanics are fairly trivial. However, these valuations can happen randomly, when advisers or clients choose to get the latest value, or in large batches, when product providers send us the latest values for all the policies relating to many thousands of clients. So we could end up with anything from 10k to 150k messages landing in the queue at any one time.

To test this we dumped 20k messages in a queue and using one NSB endpoint (1 thread, MSMQ transport) it was giving us around 300 - 350 messages a second and took around 1 minute to complete on a 4 core dev box using a mid market SSD disk. This wasn’t in itself too bad but the receiving endpoint was doing nothing at this point so this was clearly going to get worse when it actually had to do something.

So we wanted to see what the scale profile would look like if we threw more threads and machines at it. Firstly increasing the thread count helped but not massively. We then thought we would add the distributor into the mix to see how much we could squeeze out of this setup. This would give an idea of how we could scale this system in production and charting the gains in throughput versus numbers of machines and threads would help us visualize what we were seeing.

We added the distributor and 4 worker nodes and reran the 20k message scenario. Strangely we noticed the throughput drop to around 30-40 messages per second. Not promising! We also noticed heavy disk IO, as you would expect from MSMQ, so we decided to distribute the workers to other machines to relieve the disk contention and free up the CPU to get a better understanding of where the bottleneck was. Strangely again we saw no marked improvement on the previous run. We added more and more workers with no real effect. Watching the MSMQ performance counters we could clearly see it was the distributor which was holding things up: it couldn’t seem to process more than 30-40 messages a second. The distributor was working as a bottleneck and not a load balancer. Something is wrong!

After reading many articles on MSMQ performance (Ayende blogged about this here and here a while ago) it dawned on me that we were never going to get anywhere near where I wanted to be using MSMQ as a transport. It seems to be down to MSMQ transactions, which is unfortunate.

Some people were reporting ridiculous throughput in the many thousands of messages a second. However, where details were given, the hardware used was immense and costly. I would love to know how to achieve better throughput using MSMQ without breaking the bank. Does anyone have any ideas?

We eventually moved over to using SQL Server Service Broker as the transport (from NServiceBus Contrib) for this queue and achieved much more throughput (around 500-700 messages per second) than with MSMQ and the distributor. We then modified the SSB transport to use conversation groups and batched the messages at 1k, which is currently giving us many thousands of messages a second and is looking promising. We are still playing with this to see where it takes us so I will leave the details to a later post.

Does anyone have any tips or experiences with NSB or MSMQ or other transports which they could share as I’m really interested in how other people have dealt with or are dealing with message throughput in their systems?

Wednesday, 27 July 2011

SQLServer vs MongoDB vs RavenDB write performance

My team are currently working on a project where we will be using MongoDB as the storage engine. As part of some initial testing and evaluations we tested the write performance of RavenDB and MongoDB and compared the results which we found to be very interesting.

But before I share them I would stress that these are not benchmarks you should necessarily base your technology choices on. They are just findings which we found interesting; technology choices should be based on more than just performance…

We currently have a product based on SQLServer which consumes financial data from a feed. This feed is an XML file of around 400MB which we receive daily and which contains data for around 45k funds. We parse this file and construct an object model which looks something like this:

ClassDiagram1

SQLServer

In SQL each one of these objects largely relates to a table as you would expect. Our importer tool will parse the file, hydrate this object graph and save this structure for each fund in the feed. This currently takes around 25 minutes. I will add that this process does a few other things too, so 25 minutes is not a true reflection of the time taken. This is also profiled on our production servers, which are beefy 8 (4 core) CPU, 32 GB RAM servers on a highly tuned SAN.

RavenDB

We then tested this same process using RavenDB (single instance) on a development machine with 1 CPU and 4 cores with 8 GB RAM on a mid market SSD disk. We modelled this entire graph as a single document and saved the documents in batches of 200, which we found to be the optimum batch size after many runs. The time taken to save these documents was 2 minutes 37 seconds…Impressive!
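The batching itself can be sketched as below. This is only a minimal sketch and not our importer code: Batcher and InBatches are hypothetical names, and the step which hands each batch to RavenDB is deliberately left to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of RavenDB): splits a sequence into fixed-size batches.
public static class Batcher
{
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        // Validate eagerly so bad arguments fail at the call site, not on first enumeration.
        if (source == null) throw new ArgumentNullException("source");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch, if there is one.
        if (batch.Count > 0) yield return batch;
    }
}
```

With something like this in place, the import loop would walk Batcher.InBatches(funds, 200) (funds being whatever sequence the parser produces) and save each list of 200 documents in one go, which makes the batch size easy to vary between runs.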

MongoDB

We then modified this test to use MongoDB (single instance) and ran it on the same development machine. We turned SafeMode on and ensured we were waiting for each document write to commit. The time taken to save these documents was 14 seconds….Wow!

Conclusion

Raven and Mongo are very different animals and have different ways of storing and retrieving documents, which need to be understood before deciding on their applicability.

I did not intend to discuss the differences between these storage engines as they are vast and interesting but could not be covered sufficiently in a single post. To understand them I suggest you do your reading, Mongo has some great documentation online, Raven has some too although not as comprehensive.

One thing is for sure though: Mongo has blazingly fast write performance. At 14 seconds against 2 minutes 37 seconds it was roughly an order of magnitude faster than RavenDB on the same machine, and it was far faster than our current SQLServer import (at least in this use case anyway).

Friday, 15 April 2011

Some Tips on Using Microsoft.Web.Administration

The Microsoft.Web.Administration assembly provides some really helpful classes allowing you to manage IIS through code. This is especially useful for automated deployments and configuration scenarios. I am going to assume you have some knowledge of or experience with these classes, as this post is more of a record for myself: I feel I have learnt some of these lessons at least twice in the last few years.

redirection.config permissions

Remember that the code using Microsoft.Web.Administration must have the correct permissions on IIS’s redirection.config file in order to do anything useful. You can find this file in the C:\windows\system32\inetsrv\config folder.

This is a fairly easy one to remember as it is fairly well documented online and the exception messages are pretty helpful and descriptive but I always seem to trip over it.

Turn Off UAC

When User Account Control is on in Vista or Win7 you will get read access to the IIS metabase, but it will throw a horrible COM access denied exception if you try to change anything. Ensure that you turn it off and reboot your machine (it may not ask you to do this but you will have to before your code will work).

Changes are Asynchronous

When you make changes to the IIS metabase, like creating a new website, or perform operations like stopping or starting a website, keep in mind that these operations are performed asynchronously through DCOM. You therefore cannot assume that something you have just done will take effect immediately, and quite often you will need to do some waiting. A trivial solution to this problem is to spin wait, if that is appropriate to your scenario.

ServerManager serverManager = new ServerManager();

var site = (from s in serverManager.Sites
            where s.Name == webSiteName
            select s).SingleOrDefault();

// only create the site if it does not exist already
if (site == null)
{
    site = serverManager.Sites.Add(webSiteName, physicalPath, port);
    site.Applications[0].ApplicationPoolName = appPoolName;
    serverManager.CommitChanges();

    // wait (up to 60 seconds) until the site is confirmed as created;
    // a fresh ServerManager is used each time so we re-read the configuration
    SpinWait.SpinUntil(() =>
    {
        using (var check = new ServerManager())
        {
            return check.Sites.Any(s => s.Name == webSiteName);
        }
    }, TimeSpan.FromSeconds(60));
}

//do something with the new site like start it...
 
 

 


Starting and Stopping Websites


The start and stop API is a little clunky but again asynchronous and you will have to wait for a status change before you can assume the operation has completed.


Consider the following start stop toggle method as an example of how to work with the API.



// Starts (start == true) or stops (start == false) the named site and polls once a
// second, for up to totalWait seconds, until the site reports the requested state.
// Returns true if the site reached that state in time.
public bool StartStop(string webSiteName, bool start, int totalWait)
{
    using (var serverManager = new ServerManager())
    {
        var site = (from s in serverManager.Sites
                    where s.Name == webSiteName
                    select s).Single();

        var target = start ? ObjectState.Started : ObjectState.Stopped;

        // nothing to do if the site is already in the requested state
        if (site.State == target) return true;

        if (start) site.Start(); else site.Stop();

        // Start/Stop return before the state change has completed, so poll for it
        while (site.State != target && totalWait > 0)
        {
            Thread.Sleep(1000);
            totalWait--;
        }

        return site.State == target;
    }
}


I have used Thread.Sleep here as an alternative to the spin wait seen before. You will have to provide a timeout for this operation if you need to wait on the state change, to ensure you don’t block forever.
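If you find yourself writing this loop more than once, the poll-with-timeout part can be pulled out into a small helper. The following is only a sketch: Wait and Until are hypothetical names of my own, not part of Microsoft.Web.Administration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: polls a condition until it holds or the timeout elapses.
public static class Wait
{
    // Returns true if the condition became true before the timeout, otherwise false.
    public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            // Check first so an already satisfied condition returns immediately.
            if (condition()) return true;
            if (stopwatch.Elapsed >= timeout) return false;
            Thread.Sleep(pollInterval);
        }
    }
}
```

The start case above would then read something like Wait.Until(() => site.State == ObjectState.Started, TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(1)). Measuring elapsed time with a Stopwatch rather than counting iterations means the timeout still holds even when reading the state is itself slow.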


That’s it for now…if you have any experiences that are interesting please comment as I would love to hear about them.

Wednesday, 2 February 2011

RedGate renege on commitment to keep .Net Reflector free

Today RedGate sent an open letter to the .Net community announcing they are going to start charging for the next version of the .Net reflector tool. For those of you who have never heard of it or never used it (I doubt there are many), .Net Reflector is a fantastic tool which can reflect/reverse engineer code from .net assemblies allowing you to browse codebases with ease without actually having the source to hand. I have used Reflector daily and over the years it has become a core part of my tooling so I am saddened by this move…but not surprised.

No, I’m actually quite angry really. I knew this would happen, it was inevitable, but at the same time I really hoped it wouldn’t. Yes, $35 is not much for a perpetual license and most of us will fork out for it (unless there is a better or cheaper alternative... JetBrains? I’d rather pay for that just to spite them). What really angers me though is that they have time bombed version 6, which will simply stop working on May 30th 2011. So not only are they charging moving forward but they are taking the current version away from us too. All of this after reassuring the community 2 years ago, when they acquired it off Lutz Roeder, that they would keep it free for the community forever.

The reason CEO Simon Galbraith (video) gives for this decision is that they could not make the intended commercial model work and that they now need to charge for the product to keep it alive and current. And he says, wait for it, he never said “promise”. Wow, well that makes it all ok then! He could just as well have said he was crossing his fingers behind his back at the time and therefore whatever he said doesn’t count… liar liar pants on fire…really, what do you take us for?

If you can’t make it work Simon, I say give it back to the community and let us maintain it, at least from version 6, and you can do what you like with version 7. Or at least remove the time bomb from version 6 and let us continue to use it as we are now…there is something really wrong about giving something away for free (and saying it’s free) only to take it away later…and try to force us to pay for an upgrade. Read the interview on simple-talk with James Moore, general manager at the time, from back when they acquired Reflector in 2008…my favourite part is:

We accept the fact that there will be scepticism, but we can point to a good track record of support for the community

Eat your words…Poor decision RedGate!! I hope you feel the backlash you deserve.

Make your feelings known on the RedGate discussion forum here.

Tuesday, 1 February 2011

MvcUnit: The Code

A month or so ago I wrote about an MVC testing framework I wrote and use at work. It is a lightweight framework which uses a fluent interface DSL with a BDD Given, When, Then style syntax. I have finally uploaded the code so please have a look and give it a spin. At the moment it tests the majority of standard use scenarios for controllers (I will add more as and when I come across the need, feel free to add your own) and routes.

You can get the code here…let me know what you think.

Hosting NServiceBus Endpoints in IIS and AppFabric: The Code

A few weeks ago I wrote about a hosting idea I was experimenting with. I promised to post a link to the code but have unfortunately been a bit busy of late. Nevertheless you can get the code here.

NOTE: Please remember this is a spike and experimental…I probably wouldn’t use this in production just yet. That said I haven’t seen anything that would suggest it was unstable or unsafe either.

There is a TODO list in the solution which was a sort of brain dump of what I thought would be interesting to implement if/when I take this further if you are interested.

Have a play and let me know what you think…

Thursday, 6 January 2011

Hosting NServiceBus Endpoints in IIS and AppFabric

I have been experimenting with a slightly different hosting model for our NSB endpoints. I want to be able to do things like xcopy deploy or even use WebDeploy to push out new versions of our endpoints in much the same way we push out upgrades to our web applications. Sure the existing model using TopShelf is great but I felt we could do more…and using TopShelf’s shelving as an inspiration I decided to experiment to see what the challenges would be.

First off let me say that this is really only feasible if you are running IIS 7.5 (Windows Server 2008 R2 or Windows 7) with AppFabric installed. This is because we can now manage the application pools in ways we couldn’t before, such as setting the startMode to AlwaysRunning, which means it won’t recycle like it used to. In fact there are a bunch of other recycling settings you can manage which are advantageous. Brian Ritchie has a great blog post on setting this up so I won’t go into too much detail here; suffice to say we can keep the app pool running, which is what we want.

So I set some initial goals for this experiment and they were:

  • I want to be able to host an existing NSB endpoint / project without ANY necessary modifications to it.
  • I want to be able to run multiple app domains side by side.
  • I want to be able to manage these app domains (start/stop) from a web page
  • I want to be able to see the logging you would normally see from the console window (this is optional but I thought it may be fun)
  • All configuration should be by convention with options to configure discretely later if need be.
  • I want to be able to simply xcopy deploy endpoints

Getting started then I decided that all AppDomains would be hosted from a specific folder wherein each AppDomain would have its own subfolder and the name of that sub folder would be the identifier for our AppDomain and endpoint. This just keeps it simple although I may need to revisit this as there may be some limitations to this later (such as hosting the same endpoint twice; think primary/secondary or active/passive to facilitate HA style upgrades). Anyway I standardised on using the folder name as the convention for other things too such as the config file name and resolution etc.

I am going to post the code for this whole solution in a subsequent post so will show just a few snippets here and there. The main purpose of this post is to explain what I’m doing conceptually and see if anyone sees any value or flaws in the approach.

image

The code project looks something like the picture on the right at the moment. I have, starting top to bottom, two endpoint projects. (I called them plugins because I initially felt I would follow a plugin model, so don’t let this confuse you: they are straightforward NSB endpoints, in fact the code is copied directly from the PubSub sample.)

Then I have the Hosting.Web project, which is a simple ASP.NET MVC project which will run in the parent hosting AppDomain. Finally there is NServiceBus.Host.Web, which manages the AppDomains and wraps the NSB GenericHost and its assembly scanning capabilities.

The important thing to remember here is that those first two sample endpoint projects will run as is within the NServiceBus.Host in the normal fashion i.e. setting them to start up through an external application (NServiceBus.Host.exe) as per the documentation. We will of course be running them in our asp.net worker process.

So I compiled the solution and put the binaries of the sample NSB endpoints in the configured plugin directory. Of course I could have a post build task do this but this is just an experiment for now and I wouldn’t envisage doing this on dev machines anyway, it’s more of a deployment setup.

So the web.config in the Hosting.Web project looks like this:

image

The collapsed sections are standard out of the box stuff. The only settings we have right now are our pluginDirectory, the arguments and whether to start the endpoints on startup. The arguments would be those you would normally pass to NServiceBus.Host.exe, i.e. the Profile etc.

The folder structure in the ‘pluginDirectory’ looks like this:

image

The assumptions I now make, based on the conventions I’m using and mentioned earlier, are that the endpoint will reside in a dll named after the folder (although this isn’t necessary as GenericHost will scan anyway) and therefore the config file will be named similarly, i.e. Hosting.SamplePlugin.dll.config, which you will see in the pic above.

So let’s talk about the solution then, and how it works. When I start up the web project I have hooked into the AppInitialize method, which allows me to start scanning for endpoints to load.
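To make the convention concrete, here is a minimal sketch of how those names can be derived from nothing but the folder. PluginConvention and its members are hypothetical names for illustration only; this is not the code from the solution.

```csharp
using System;
using System.IO;

// Hypothetical illustration of the folder-name convention; not the actual hosting code.
public sealed class PluginConvention
{
    public string Name { get; private set; }
    public string BaseDirectory { get; private set; }
    public string AssemblyPath { get; private set; }
    public string ConfigurationFile { get; private set; }

    public static PluginConvention FromFolder(string pluginFolder)
    {
        // Normalise the path and drop any trailing separator so GetFileName
        // returns the last folder segment, e.g. "Hosting.SamplePlugin".
        var fullPath = Path.GetFullPath(pluginFolder)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        var name = Path.GetFileName(fullPath);

        // The endpoint dll is named after the folder, and its config file is
        // the dll name with ".config" appended.
        var assemblyPath = Path.Combine(fullPath, name + ".dll");

        return new PluginConvention
        {
            Name = name,
            BaseDirectory = fullPath,
            AssemblyPath = assemblyPath,
            ConfigurationFile = assemblyPath + ".config"
        };
    }
}
```

For a subfolder called Hosting.SamplePlugin this yields Hosting.SamplePlugin.dll and Hosting.SamplePlugin.dll.config inside that folder, which is all the information needed to set up an AppDomain for it.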

public static void AppInitialize()
{
   WebHost.Instance.LoadPlugins();
}




When I find a subfolder in the ‘pluginDirectory’ I copy the NServiceBus.Host.Web.dll into that folder (if it isn’t there already). I then create an AppDomain (setting up all the necessary details such as BaseDirectory and ConfigurationFile etc) for each of the folders/endpoints and then create and unwrap an instance of the PluginDomainInitializer type, which gives me a proxy on which I call Run(), passing the ‘arguments’ from the config mentioned earlier. This will probably make more sense in the context of the rest of the code but here is a snippet anyway.



var hostedDomain = new HostedDomain(config);
 
hostedDomain.CreateAppDomain(permissions);
 
hostedDomain.Run(Arguments);
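For anyone who hasn’t done the create-and-unwrap dance before, the underlying .NET Framework calls look roughly like the sketch below. It is not the HostedDomain code: SketchInitializer is a hypothetical stand-in for PluginDomainInitializer (which wraps GenericHost), and DomainLauncher is a made-up name. Note that AppDomain.CreateDomain is only supported on the full .NET Framework.

```csharp
using System;

// Hypothetical stand-in for PluginDomainInitializer. Deriving from MarshalByRefObject
// lets the parent domain hold a proxy to an instance living in the child AppDomain.
public class SketchInitializer : MarshalByRefObject
{
    public string Run(string[] arguments)
    {
        // This executes inside the child AppDomain.
        return AppDomain.CurrentDomain.FriendlyName
            + " started with " + arguments.Length + " argument(s)";
    }
}

public static class DomainLauncher
{
    public static string Launch(string name, string baseDirectory, string configFile, string[] arguments)
    {
        var setup = new AppDomainSetup
        {
            ApplicationBase = baseDirectory,   // assemblies are probed from here
            ConfigurationFile = configFile     // e.g. the endpoint's .dll.config
        };

        var domain = AppDomain.CreateDomain(name, null, setup);

        // The assembly containing the initializer must be loadable from
        // ApplicationBase, which is why the hosting dll is copied into
        // each plugin folder before the domain is created.
        var initializer = (SketchInitializer)domain.CreateInstanceAndUnwrap(
            typeof(SketchInitializer).Assembly.FullName,
            typeof(SketchInitializer).FullName);

        return initializer.Run(arguments);
    }
}
```

Because each endpoint gets its own ApplicationBase and ConfigurationFile, it sees its own binaries and its own config exactly as it would when run under NServiceBus.Host.exe.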



From here the PluginDomainInitializer is largely just wrapping an instance of GenericHost which will do its typical assembly scanning and NSB configuration.

The result is I have my two sample endpoints running in two separate AppDomains fully initialized and running without having changed anything within them code or configuration wise.


I whacked up a UI and a few simple features for fun and this is what it looks like running in the browser:


image


Notice I can start and stop the AppDomains, and even get real time logging info using a bit of jQuery, ajax and a Unix style tail file stream reader. Then I added some jQuery and css to get a ‘ColouredConsoleAppender’ style à la log4net.
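The tail reader is the only mildly interesting bit, so here is a minimal sketch of the idea. TailReader is a hypothetical name and this is not the code from the solution; it assumes a UTF-8 log file which is only ever appended to in whole lines.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch of a "tail"-style reader: each call returns only the text
// appended to the file since the previous call.
public sealed class TailReader
{
    private readonly string _path;
    private long _offset;

    public TailReader(string path)
    {
        _path = path;
    }

    public string ReadNew()
    {
        // FileShare.ReadWrite lets the logger keep appending while we read.
        using (var stream = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // If the file has shrunk (e.g. the log rolled over), start from the top again.
            if (stream.Length < _offset) _offset = 0;

            stream.Seek(_offset, SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                var text = reader.ReadToEnd();
                // Remember how far we got so the next call resumes from here.
                _offset = stream.Position;
                return text;
            }
        }
    }
}
```

An ajax call from the page can then poll an action which returns ReadNew() for the selected endpoint’s log file and append whatever comes back to the console view.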


There is so much more I can do with this like online config file view/edit, file upload, REST API’s for build task based deployment, notifications and monitoring but like I said this was just a few hours of experimenting and I would greatly appreciate some feedback on the idea, both positive and negative.


I will clean up and post all the code in a subsequent blog post and on my github repository so if you wanna play with it you can.


UPDATE: you can get the code here.