Jul 28 2009

Google Protocol Buffers for .NET

Category: Uncategorized | zvolkov @ 9:13 am

Protocol Buffers is Google’s standard for binary serialization of objects. The name is, of course, very misleading: it denotes a singular concept, not a plurality, and it has nothing to do with buffering protocols or protocolling buffers, as the name may suggest.

PB consists of:

  1. a cross-platform-compatible wire format for binary serialization of objects,
  2. a declarative language (known as Proto) used to define your domain-specific structures (objects), and,
  3. a cross-compiler that can turn the above declarative specification into a program in an imperative language that would implement the parser/serializer

This sounds complex, but what you actually get is a tool that takes a .proto file and spits out a source code file that you can use to read/write your objects. Most .NET developers should be familiar with this paradigm from the xsd.exe and wsdl.exe days.

Here’s what a typical .proto file looks like (from the Overview page):

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

PB is used in situations where you want to go binary for super speed and small size but don’t want to lose the extensibility/compatibility that XML used to allow. With PB you can define new versions of the same structure with extra fields added and have past versions of the parser happily ignore the new fields. Just remember that the Protocol Buffers wire format is NOT human readable, unlike XML or JSON, although you can produce a human-readable dump of what would go onto the wire, e.g. for debugging purposes.
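To make the "old parsers ignore new fields" point concrete, here is a minimal C# sketch of the wire format for integer fields only. It assumes the documented encoding: each field starts with a key equal to (field number << 3) | wire type, and wire type 0 values are base-128 varints. The WireSketch class and its method names are made up for illustration and are not part of any of the .NET ports mentioned below.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only (handles varint fields, wire type 0).
static class WireSketch
{
    // Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last.
    public static void WriteVarint(Stream output, ulong value)
    {
        while (value >= 0x80)
        {
            output.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.WriteByte((byte)value);
    }

    public static ulong ReadVarint(Stream input)
    {
        ulong result = 0;
        int shift = 0;
        while (true)
        {
            int b = input.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // Key = (fieldNumber << 3) | wireType, followed by the value.
    public static void WriteVarintField(Stream output, int fieldNumber, ulong value)
    {
        WriteVarint(output, (ulong)(fieldNumber << 3)); // wire type 0 = varint
        WriteVarint(output, value);
    }

    // An "old" reader that only knows field 2 (Person.id) and skips everything else.
    public static ulong ReadIdOnly(Stream input)
    {
        ulong id = 0;
        while (input.Position < input.Length)
        {
            ulong key = ReadVarint(input);
            int fieldNumber = (int)(key >> 3);
            int wireType = (int)(key & 7);
            if (wireType != 0) throw new NotSupportedException("sketch handles varint fields only");
            ulong value = ReadVarint(input);
            if (fieldNumber == 2) id = value; // known field; any other number is ignored
        }
        return id;
    }
}

class Program
{
    static void Main()
    {
        var ms = new MemoryStream();
        WireSketch.WriteVarintField(ms, 2, 150); // id = 150 -> bytes 10 96 01
        WireSketch.WriteVarintField(ms, 9, 7);   // field 9: imaginary field from a newer schema
        ms.Position = 0;
        Console.WriteLine(WireSketch.ReadIdOnly(ms)); // prints 150
    }
}
```

The reader never needs the newer schema: it can skip field 9 because the key alone tells it how the value is laid out.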

While the native PB compiler originally implemented by Google could only generate parsers / serializers in C++, Java or Python, there are now three .NET ports:

  • dotnet-protobufs by Jon Skeet. This is an implementation of the Proto compiler in .NET that can generate native .NET serializers / parsers. Ayende uses this one in his NH Profiler for both inter-process communication and for saving application state on disk. This version is very close to the original Google PB spirit, in the sense that you have to explicitly call methods to serialize/deserialize.
  • protobuf-net by Marc Gravell. To avoid writing yet another compiler, Marc has chosen an approach that requires you to decorate your .NET objects with attributes specifying serialization details. Ayende has criticized this approach as sacrificing compatibility / standard-compliance for the sake of being more developer friendly. On the bright side, it does fit in very nicely with WCF, allowing you to have WCF services speaking to each other in PB language.
  • Proto# – another alternative, not sure how good or bad this one is.

While Protocol Buffers seems to be just yet another format, it has a few distinct advantages. First, like I said above, it is binary and extensible at the same time. Second, it is binary and cross-platform compatible at the same time. Third, it is backed by Google, which people have a tendency to follow blindly.


Jul 24 2009

Migrating from Spring Framework.NET to Castle Windsor IoC

Category: Uncategorized | zvolkov @ 11:10 am

In case you didn’t know, Spring Framework was the first publicly available IoC container, originally designed by Java author and programmer Rod Johnson back in 2002. Even before that, there was the Dependency Inversion Principle, invented by Uncle Bob in 1994. For history-inclined geeks, this IoC history article on the PicoContainer site goes into great detail on who contributed to this fantastic idea and when.

I’ve been a happy user of Spring Framework’s .NET port for a while. I loved its non-invasive philosophy and was very happy with its rich feature set. As a matter of fact, features are not the reason for my migration to Castle Windsor. My concern is the community around these two projects: the Castle crowd has grown dramatically since 2004/2005, while Spring.NET arguably remains Mark Pollack’s sickly baby. If this dynamic persists, and I don’t see any reason to think otherwise, the .NET branch of the Spring project will be shut down in a few years, and by that time I want to be completely off the hook.

In this post I will summarize my notes as I migrate my latest project from Spring Framework.NET to Castle Windsor IoC. (Brief synopsis for those who won’t read the entire article: Windsor turned out to be surprisingly similar to Spring, but much simpler, with easier syntax and very minimalistic features. At the bottom of the post I list the advanced facilities I use in Spring that I still don’t know how to port to Windsor.)

First of all, you want to remove the reference to Spring.Core.dll and add four references instead (I wonder why they couldn’t ILMerge them into one):

  • Castle.Core.dll
  • Castle.MicroKernel.dll
  • Castle.Windsor.dll
  • Castle.DynamicProxy.dll

Next, in your application startup code, initialize Windsor instead of Spring:

IWindsorContainer container = new WindsorContainer(new XmlInterpreter(new ConfigResource("castle")));

This assumes you were using the resource tag to reference your Spring.Config from your App.Config. If you were loading your Spring.Config file directly from a well-known location, Windsor supports this option as well: just use the overload of XmlInterpreter that takes a file name.

Next, you’ll need to change your App.Config to point to Windsor config files, instead of Spring:

<configSections>
    <section name="castle" type="Castle.Windsor.Configuration.AppDomain.CastleSectionHandler, Castle.Windsor" />
</configSections>

<castle>
  <include uri="file://Windsor.config" />
</castle>

Again, this is a direct port of my way of bootstrapping Spring. For other options, check out the Windsor documentation for the include tag.

This done, let’s look inside the Windsor config file and see how it differs from Spring.config. In Windsor, what was called an object in Spring.NET (or a bean in the original Spring) is now called a component. Note how parameters are specified using a better-looking parameters tag and how neatly the object references (now called service lookups) are expressed using the special ${} syntax:

<configuration>
    <components>
        <component id="componentA" type="zvolkov.migrateToWindsor.componentA, zvolkov.migrateToWindsor">
        </component>
        <component id="componentB" type="zvolkov.migrateToWindsor.componentB, zvolkov.migrateToWindsor">
            <parameters>
                <myComponentA>${componentA}</myComponentA>
                <otherProperty>bloody-blah</otherProperty>
                <anArrayProperty>
                    <array>
                        <item>XXX</item>
                        <item>YYY</item>
                    </array>
                </anArrayProperty>
            </parameters>
        </component>
    </components>
</configuration>

You see how much better this is than Spring’s <property name="myComponentA" ref="componentA"/>? Looks so much nicer! Also cool is that the parameters tag can be used for both property and constructor injection. One less syntax to remember!
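For reference, here is a rough C# sketch of what the two classes behind that config might look like. The class shapes are my assumption (the config only tells us the names), and the sketch relies on Windsor matching the element names under parameters to constructor parameter names and property names. Main wires the objects by hand just to show what the container would be doing for you.

```csharp
using System;

namespace zvolkov.migrateToWindsor
{
    // Hypothetical class shapes; only the type and parameter names come from the config.
    public class componentA { }

    public class componentB
    {
        // Constructor injection: the parameter name matches the <myComponentA> element.
        public componentB(componentA myComponentA)
        {
            this.myComponentA = myComponentA;
        }

        public componentA myComponentA { get; private set; }

        // Property injection: set from <otherProperty> and <anArrayProperty>.
        public string otherProperty { get; set; }
        public string[] anArrayProperty { get; set; }
    }

    class Program
    {
        static void Main()
        {
            // Hand-wired equivalent of what the container does with the config above.
            var a = new componentA();
            var b = new componentB(a)
            {
                otherProperty = "bloody-blah",
                anArrayProperty = new[] { "XXX", "YYY" }
            };
            Console.WriteLine(b.anArrayProperty.Length); // prints 2
        }
    }
}
```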

If your “bloody-blah” is a setting used by multiple components, you can promote it to a global property like so, note the #{} syntax:

<properties>
    <bloody>bloody-blah</bloody>
</properties>

<component id="componentB" type="zvolkov.migrateToWindsor.componentB, zvolkov.migrateToWindsor">
    <parameters>
        <otherProperty>#{bloody}</otherProperty>
    </parameters>
</component>

Another useful but scary feature is auto-wiring: you don’t have to specify references to objects! If you don’t, Windsor will auto-wire your property to an object matching the property type. So you don’t have to use service lookups unless your container has more than one object of the same type (and even then Windsor will default to the topmost one in the container). I’m sure Windsor veterans consider this a power feature, but IMHO it may be very dangerous, especially if you start with one implementation of an interface and add more later. As a matter of fact, Windsor’s auto-wiring is so advanced that it even supports generic specialization, a feature best illustrated by Ayende in his historical MSDN article.

Coming back to basics, passing arrays, lists and dictionaries is pretty much the same as in Spring. Here’s the syntax you need to use for those. The lifestyles of objects are the same story: by default all objects are singletons, and you can change that by specifying the lifestyle attribute on the component tag.

To summarize, my first impression is pretty good. To the end user Windsor looks very simple, with no unneeded features, easy-to-read-and-write syntax, and super-lazy auto-wiring. In a future post I’ll try to cover the more advanced facilities that I will need to migrate from Spring as well:

  • NHibernate configuration,
  • NHibernate Session management and (most importantly) nested Transaction management,
  • Quartz.NET integration,
  • WCF integration

I believe this post covers the first steps you need to take and provides enough information to drive the fear of the unknown away. So stop procrastinating and move to Windsor before it’s too late. This link to Windsor documentation will keep you going.


Jul 20 2009

What’s new in NHibernate 2.1

Category: Uncategorized | zvolkov @ 6:19 pm

As soon as Fabio said NHibernate 2.1 has been released I rushed to see what’s new. Here’s what I found so far:

For a full list of changes, see NH 2.1 release notes


Jul 19 2009

MSBuild extensions

Category: Uncategorized | zvolkov @ 8:45 am

Even though MSBuild comes with 40 or so built-in tasks, it lacks many essential features which, thanks to the community, are available from three major extension projects:

  • MSBuild Extension Pack – actively maintained, this extension provides over 280 tasks
  • MSBuild Community Tasks Project – not maintained since 2007, this set of ~90 tasks still has a few unique tasks, namely the flat-file-based Version task
  • SDC Tasks Library – not maintained since Aug 2008, this extension has been absorbed into MSBuild Extension Pack. If there’s something you can’t find in MSBuild Extension Pack, check this one out: with its portfolio of 300+ tasks, chances are it has what you need.

Examples of problems one can solve with the help of extension tasks:

  • FTP upload/download
  • Working with Subversion and (god forbid) SourceSafe repositories
  • Running NUnit tests
  • Generating AssemblyInfo files
  • Auto-incrementing Build numbers, Version numbers, etc
  • Building CHM documentation
  • Executing SQL
  • Performing find and replace operations on files
  • Sending alerts by email or Twitter


Jul 17 2009

What I want for Christmas

Category: Uncategorized | zvolkov @ 12:04 pm


Jul 16 2009

MSBuild crash course Part 1

Category: Uncategorized | zvolkov @ 6:29 pm

Much water has flowed under the bridge since my first post on MSBuild, and I have since picked up a book by MS Press called “Inside the MSBuild”. In this post I will summarize what I see in this book as I read it. The result should be a crash course on MSBuild that you, dear reader, will hopefully find useful.

Empty MSBuild file:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
</Project>

To run it, launch msbuild ourbuildscript.proj from the command line.

Properties:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <PropertyGroup>
        <NameOfProperty>ValueOfProperty</NameOfProperty>
        <AnotherProperty>Value1</AnotherProperty>
        <YetAnotherProperty>Value1</YetAnotherProperty>
    </PropertyGroup>
    <PropertyGroup>
        <YetAnotherProperty>Value2</YetAnotherProperty>
    </PropertyGroup>
</Project>

This defines 3 properties. Since YetAnotherProperty is specified twice, its second value will overwrite the first. This is because all properties are evaluated sequentially.

Targets:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Foo">
    <Target Name="Blah" DependsOnTargets="Foo">
        <Message Text="String that will be printed to log file"/>
        <Copy SourceFiles="c:\test.txt" DestinationFolder="d:\" />
    </Target>
    <Target Name="Foo">
        <Message Text="Hello $(YetAnotherProperty) !!!"/>
    </Target>
    <PropertyGroup>
        <YetAnotherProperty>Value2</YetAnotherProperty>
    </PropertyGroup>
</Project>

The default target in our build script is Foo, and that’s what msbuild will execute unless you explicitly specify which target to run: msbuild ourbuildscript.proj /t:Blah.
In this case since Blah depends on Foo, msbuild will first execute Foo and then Blah. Foo will print “Hello Value2 !!!”.

Conditions:

<PropertyGroup>
    <SomeProperty Condition="'$(AnotherProperty)' == '1'">2</SomeProperty>
</PropertyGroup>

SomeProperty will get the value 2 only if AnotherProperty is 1. This is not the case unless something sets AnotherProperty, for example via a command-line parameter: msbuild ourbuildscript.proj /p:AnotherProperty=1.

Items are arrays of objects:

<ItemGroup>
    <SomeArray Include="FirstElementOfSomeArray" />
    <SomeArray Include="SecondElement" >
        <PropertyOne>X</PropertyOne>
        <PropertyTwo>Y</PropertyTwo>
    </SomeArray>
</ItemGroup>

Just like Properties, items can be arguments to tasks (note how array references use @ instead of $):

<Target Name="Blah">
    <Message Text="@(SomeArray)"/>
</Target>

The above prints “FirstElementOfSomeArray;SecondElement” (this is how an array is converted to a String, which is what Message.Text takes).

Items have special support for files. This defines two arrays and fills them with actual file names from C: drive:

<ItemGroup>
    <ArrayOfFiles Include="C:\*.cs" />
    <ArrayOfFiles Include="C:\*.vb" />
    <AnotherArray Include="C:\Temp\**\*.*" />
</ItemGroup>

The list of files can now be used as argument for Copy task:

<Target Name="Blah">
    <Copy SourceFiles="@(ArrayOfFiles)" DestinationFolder="d:\" />
</Target>

Batching

To execute a task once for each item in the array, use the special batching syntax (the % and the .Identity):

<Target Name="Blah">
    <Message Text="%(SomeArray.Identity)"/>
</Target>

To execute the entire target once for each item in the array, use the Outputs attribute like so:

<Target Name="Blah" Outputs="%(ArrayOfFiles.Identity)">
    <Message Text="File @(ArrayOfFiles) was created on @(ArrayOfFiles->'%(CreatedTime)' )"/>
</Target>

The above also shows how to access properties of items (ArrayOfFiles.CreatedTime).
The expression in %() does not have to be .Identity; it can be any other property of the Item. In that case msbuild will group all elements of the array by that property and process each group as a batch. You can even group by multiple properties if you specify multiple %() expressions as arguments to a task or a target.

Importing

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Import Project="anotherbuildscript.proj" />
</Project>

Sequence of events during build file processing:

  1. Set built-in properties and load environment variables into properties
  2. Read the build script, evaluating properties and items and loading imported files recursively. Only static PropertyGroups/ItemGroups (at the root level of each file) are evaluated at this time.
  3. Execute targets, either the default one, or the one specified on command line, recursively executing its dependencies
  4. For each target, execute the steps inside the target sequentially, evaluating target’s properties and items

This concludes Part 1. To be continued.


Jul 10 2009

.NET 4.0 and better ways to wait for queued threads to complete

Category: Uncategorized | zvolkov @ 9:19 am

UPDATE: Microsoft has published a great article on this topic called Patterns of Parallel Programming, be sure to check it out.

Expanding my earlier post on this very topic, I decided to convert my Stack Overflow answer into a full-blown post.

As often happens in the software development industry, our ability to solve problems is limited by our choice of paradigms. Good old familiar tools lock us into ancient paradigms, creating the infamous hammer-and-nail antipattern. This was evident to me in my own attitude towards LINQ: having been familiar with LINQ and occasionally using it in my projects for over a year, only recently did I realize how many of our usual FOR and FOREACH constructs can be replaced with profoundly more elegant and concise LINQ queries.

Similar, and again LINQ-inspired, was my discovery of the functional programming paradigms: map, fold, and filter. The thought that any possible operation on a single set of items can be expressed via these three and, more importantly, that following functional programming conventions can make my code more standard and therefore easier to understand, was an important step in my personal devolution. Long story short, viva la immutability!
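For those who haven’t made the connection yet, here is a small sketch of how the three map onto LINQ: filter is Where, map is Select, and fold is Aggregate. The LinqSketch class and the sample words are made up for illustration; the loop version is there only for comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example: total length of all words that are at least minLength long.
static class LinqSketch
{
    public static int TotalLengthOfLongWords(IEnumerable<string> words, int minLength)
    {
        return words
            .Where(w => w.Length >= minLength)       // filter
            .Select(w => w.Length)                   // map
            .Aggregate(0, (sum, len) => sum + len);  // fold
    }

    // The same logic written as a classic foreach loop.
    public static int TotalLengthOfLongWordsLoop(IEnumerable<string> words, int minLength)
    {
        int sum = 0;
        foreach (string w in words)
        {
            if (w.Length >= minLength) sum += w.Length;
        }
        return sum;
    }
}

class Program
{
    static void Main()
    {
        var words = new[] { "map", "fold", "filter", "immutability" };
        Console.WriteLine(LinqSketch.TotalLengthOfLongWords(words, 4)); // prints 22
    }
}
```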

Now, this post is not about those important discoveries. It’s about more mundane matters, specifically, parallelization. But enough ramblings, my faithful reader, and let’s hope you can see how the point I was making with the above examples relates to the rest of the post.

For one reason or another, people often want to execute stuff in parallel. Leaving aside why this may or may not be a good idea, let’s focus on how they do it. Most people know there’s such a magical thing as the Thread Pool and blindly feed it everything but the kitchen sink. Let’s assume we have a bunch of items, say strings, that we need to process, preferably in parallel.

The usual approach then would be to do this:

foreach (string someString in arrayStrings){
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}

public void DoSomething(object data)
{
    // ... do some processing...
}

But how do I wait for all items to complete before I can proceed? Let’s add a counter variable, start at the number of items, and wait until it’s down to zero:

this.counter = arrayStrings.Length; // shared field

foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}

while (this.counter > 0) Thread.Sleep(100);

// ...proceed with the rest of your logic...

public void DoSomething(object data)
{
    // ... do some processing...

    this.counter--; // not atomic: two worker threads can race here
}

Don’t laugh yet, this is exactly how most people solve this first, before they learn all the quirks of multithreading. Wait, everybody heard about these nasty race conditions? Let’s throw some locking into the mix:

this.counter = arrayStrings.Length;
this.counterLock = new object();

foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}

while (this.counter > 0) Thread.Sleep(100);

// ...proceed with the rest of your logic...

public void DoSomething(object data)
{
    // ... do some processing...

    lock(this.counterLock){
        this.counter--;
    }
}

Can we get any better than this? Yes, by using the Interlocked:

this.counter = arrayStrings.Length; // no lock object needed any more

foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}

while (this.counter > 0) Thread.Sleep(100);

// ...proceed with the rest of your logic...

public void DoSomething(object data)
{
    // ... do some processing...

    Interlocked.Decrement(ref this.counter); // atomic decrement
}

This looks… almost professional ;) (You see, you can’t be almost professional, just like your girlfriend can’t be almost pregnant.) This Thread.Sleep looks ugly to me. Chances are, the main thread will sleep for as much as 100ms after all worker threads are done. Let’s see if we can add some ManualResetEvent love:

List<ManualResetEvent> statusFlags = new List<ManualResetEvent>();

foreach (string someString in arrayStrings) {
    var flag = new ManualResetEvent(false);
    statusFlags.Add(flag);
    ThreadPool.QueueUserWorkItem(this.DoSomething, new AWrapperClass(someString, flag));
}

WaitHandle.WaitAll(statusFlags.ToArray());

// ...proceed with the rest of your logic...

public void DoSomething(object data)
{
    // ... do some processing...
    ((AWrapperClass)data).flag.Set();
}

Hurray! This is perfect! Except that it got quite complicated with all this techy stuff and the AWrapperClass. You can feel proud showing off your …ummm… masterhood at the code reviews. Now, if you want to become a real master, you’ve got to start thinking like one: how can I make this code trivial? Let’s see what .NET 4.0 holds for us. Let’s start with CountdownEvent, as it is the closest thing there is to our initial counter idea:

this.flagFinish = new CountdownEvent(arrayStrings.Length); // initialize the count

foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}

this.flagFinish.Wait();

// ...proceed with the rest of your logic...

public void DoSomething(object data)
{
    // ... do some processing...

    this.flagFinish.Signal(); // worker threads signal when done
}
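If you want to try this out, here is a self-contained sketch of the same idea. The CountdownSketch class, the ProcessAll method and the sample strings are mine, and the "processing" is just a counter increment. It uses Signal, which is the method name in the released .NET 4.0 API, and calls it in a finally block so the waiting thread is not left hanging if the processing code fails.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; the work done per item is a stand-in for real processing.
static class CountdownSketch
{
    public static int ProcessAll(string[] items)
    {
        if (items.Length == 0) return 0; // nothing to wait for

        int processed = 0;
        using (var flagFinish = new CountdownEvent(items.Length))
        {
            foreach (string item in items)
            {
                ThreadPool.QueueUserWorkItem(data =>
                {
                    try
                    {
                        // ... do some processing with (string)data ...
                        Interlocked.Increment(ref processed);
                    }
                    finally
                    {
                        flagFinish.Signal(); // always count down, even on failure
                    }
                }, item);
            }

            flagFinish.Wait(); // blocks until every work item has signalled
        }
        return processed;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(CountdownSketch.ProcessAll(new[] { "a", "b", "c", "d" })); // prints 4
    }
}
```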

Nice and easy! Much cleaner than anything before! The only thing that still worries me is that the waiting logic is separate from the queuing logic. Luckily for us there’s a new concept in .NET 4.0, called Task, that provides facilities for queuing as well as waiting in a single package:

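var tasks = new List<Task>();

foreach(var someString in arrayStrings)
{
    // copy the loop variable: in C# 4 the lambda would otherwise capture the shared one
    var current = someString;
    tasks.Add(Task.Factory.StartNew(() => DoSomething(current)));
}

Task.WaitAll(tasks.ToArray());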

Finally, there’s no need to show the DoSomething method. There’s nothing in there except business logic and das is good! An even more concise variation of the same pattern would be to use Parallel.Invoke:

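var actions = new List<Action>();

foreach(var someString in arrayStrings)
{
    // copy the loop variable: in C# 4 the lambda would otherwise capture the shared one
    var current = someString;
    actions.Add(() => DoSomething(current));
}

Parallel.Invoke(actions.ToArray());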

Wait a second… There’s a little problem with this variant. When we used Tasks, we started each task inside the foreach loop. Here, we only add the actions to the list and start them after the loop. Depending on your problem this may be better or worse, particularly if preparing each task takes a long time and you want the tasks already started to use that time to get some of their processing done. But don’t get stuck on these two. Believe it or not, there’s an even better option. Enter our final contestant, Parallel.ForEach:

Parallel.ForEach<string>(arrayStrings, someString =>
{
    DoSomething(someString);
});

This is as good as it can get in C#, at least in its 4th incarnation. No control logic, no techy stuff, no nonsense to be proud of at code reviews. Just pure expressive power!

And so it is!


Jul 09 2009

Why NHibernate updates DB on commit of read-only transaction

Category: Uncategorized | zvolkov @ 10:06 am

Always be careful with NULLable fields whenever you deal with NHibernate. If your field is NULLable in the DB, make sure the corresponding property of your .NET class uses a Nullable type too. Otherwise, all kinds of weird things will happen. The usual symptom is that NHibernate tries to update the record in the DB even though you have not changed any fields since you read the entity from the database.

The following sequence explains why this happens:

  1. NHibernate retrieves raw entity’s data from DB using ADO.NET
  2. NHibernate constructs the entity and sets its properties
  3. If the DB field contained NULL, the property will be set to the default value for its type:
    • properties of reference types will be set to null
    • properties of integer and floating point types will be set to 0
    • properties of boolean type will be set to false
    • properties of DateTime type will be set to DateTime.MinValue
    • etc.
  4. Now, when the transaction is committed, NHibernate compares the value of the property to the original field value it read from the DB, and since the field contained NULL but the property contains a non-null value, NHibernate considers the property dirty and forces an update of the entity.

Not only does this hurt performance (you get an extra round-trip to the DB and an extra update every time you retrieve the entity), but it may also cause hard-to-troubleshoot errors with DateTime columns. Indeed, when a DateTime property is initialized to its default value, it’s set to 1/1/0001. When this value is saved to the DB, ADO.NET’s SqlClient can’t convert it to a valid SqlDateTime value, since the smallest possible SqlDateTime is 1/1/1753!!! The exception it throws looks like this:

NHibernate.Event.Default.AbstractFlushingEventListener - Could not synchronize database state with session
NHibernate.HibernateException: An exception occurred when executing batch queries ---> System.Data.SqlTypes.SqlTypeException: SqlDateTime overflow. Must be between 1/1/1753 12:00:00 AM and 12/31/9999 11:59:59 PM.
   at System.Data.SqlTypes.SqlDateTime.FromTimeSpan(TimeSpan value)
   at System.Data.SqlTypes.SqlDateTime.FromDateTime(DateTime value)
   at System.Data.SqlClient.MetaType.FromDateTime(DateTime dateTime, Byte cb)
   at System.Data.SqlClient.TdsParser.WriteValue(Object value, MetaType type, Byte scale, Int32 actualLength, Int32 encodingByteSize, Int32 offset, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.TdsExecuteRPC(_SqlRPC[] rpcArray, Int32 timeout, Boolean inSchema, SqlNotificationRequest notificationRequest, TdsParserStateObject stateObj, Boolean isCommandProc)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at System.Data.SqlClient.SqlCommandSet.ExecuteNonQuery()
   at NHibernate.AdoNet.SqlClientSqlCommandSet.ExecuteNonQuery()
   --- End of inner exception stack trace ---
   at NHibernate.AdoNet.SqlClientSqlCommandSet.ExecuteNonQuery()
   at NHibernate.AdoNet.SqlClientBatchingBatcher.DoExecuteBatch(IDbCommand ps)
   at NHibernate.AdoNet.AbstractBatcher.ExecuteBatch()
   at NHibernate.Engine.ActionQueue.ExecuteActions(IList list)
   at NHibernate.Engine.ActionQueue.ExecuteActions()
   at NHibernate.Event.Default.AbstractFlushingEventListener.PerformExecutions(IEventSource session)
   at NHibernate.Event.Default.DefaultFlushEventListener.OnFlush(FlushEvent event)
   at NHibernate.Impl.SessionImpl.Flush()
   at NHibernate.Transaction.AdoTransaction.Commit()

The easiest fix is to make the class property use the Nullable<T> type, in this case “DateTime?”. Alternatively, you could implement a custom type mapper by implementing IUserType, with its Equals method properly comparing DBNull.Value with the default value of your value type. In our case Equals would need to return true when comparing 1/1/0001 with DBNull.Value. Implementing a fully functional IUserType is not really that hard, but it does require knowledge of NHibernate trivia, so prepare to do some substantial googling if you choose to go that way.
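Here is a minimal sketch of the first fix. The Order entity and its ShippedOn property are hypothetical names, and the NullableSketch helper is only there to show why default(DateTime) is a problem; the 1/1/1753 lower bound is taken from the exception message above. Members are virtual on the assumption that the entity is mapped with NHibernate’s default lazy-loading proxies.

```csharp
using System;

// Hypothetical entity: ShippedOn maps to a NULLable datetime column.
public class Order
{
    public virtual int Id { get; set; }

    // DateTime? keeps a database NULL as null instead of turning it into 1/1/0001.
    public virtual DateTime? ShippedOn { get; set; }
}

static class NullableSketch
{
    // Lower bound of SqlDateTime, as reported by the exception message.
    public static readonly DateTime SqlDateTimeMin = new DateTime(1753, 1, 1);

    // null is fine (it is written as NULL); anything before 1753 would overflow.
    public static bool FitsSqlDateTime(DateTime? value)
    {
        return !value.HasValue || value.Value >= SqlDateTimeMin;
    }
}

class Program
{
    static void Main()
    {
        var order = new Order(); // as if loaded from a row where the column is NULL
        Console.WriteLine(order.ShippedOn.HasValue);                           // False
        Console.WriteLine(NullableSketch.FitsSqlDateTime(order.ShippedOn));    // True
        Console.WriteLine(NullableSketch.FitsSqlDateTime(default(DateTime)));  // False
    }
}
```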

Hope this helps somebody!