Oct 28 2009

NHibernate parameter sizes controversy: history of the issue

Category: Uncategorizedzvolkov @ 8:32 am

This post is my digest of facts and opinions on the infamous NHibernate parameter sizes controversy.
Disclaimer: Everything that follows is MY INTERPRETATION of the story, based on MY UNDERSTANDING of the statements and replics made by all the participants.

The issue was first reported in a blog post by Grant Fritchey on 04/29/2008. He discovered that the way NHibernate (then v1.2) generates some of its parametrized queries (specifically for inserts of entities) causes a proliferation of execution plans in MS SQL Server’s (then v2005) execution plan cache. This happens because NHibernate uses actual length of parameter’s value as the size of the variable in the generated SQL query.

The issue has resurfaced on 03/03/2009 in this thread on Italian section of NH User Group by Claudio Maccari (the conversation is in Italian so here’s a link to google-translated version). In this post Claudio further reports that it’s not just entity operations, but regular queries (HQL and ICriteria) that also suffer from the same problem. [Since entity inserts/upates/selects always go by PKs which rarely use variable-length types the impact of the issue would be low if it did not apply to queries -- zvolkov 10/28]. Fabio first thinks its a Fluent NH problem, then he theoretizes it could be a mapping problem (due to length=”X” missing). Claudio provides numerous examples of incorrect behavior (even with length=”X” present) and proposes setting “prepare_sql” to true as a possible solution. Claudio shows how this setting forces NH to use nvarchar(4000) for string parameter sizes. He also shows how using an overload of SetParameter (e.g. TypeFactory.GetStringType(50) or TypeFactory.GetAnsiStringType(50)) one can override both type and size. Fabio seems surprised and frustrated at the apparent difficulties NH presents for dealing with parameters.

On 03/10 Claudio publishes a post on his blog, in which he explains the issue and the solutions he found so far.

On the same day, 03/10, Daniel Auger, having found the original post by Grant Fritchey, starts another thread on NH Users Group. Fabio seems frustrated with apparent misunderstanding of the nature of the issue, points out this is a MS SQL Server specific issue and implies that the fix should be done in MS SQL Server itself. He also repeats Claudio’s idea to use prepare_sql=true as a way to deal with the issue.

On 3/13, the question is raised at the very beginning of Herding Code podcast, episode #38 with Ayende et al. Despite lacking complete info, Ayende lightheartedly dismisses the issue as not existent or(!) perhaps being easy to fix.

On 03/15, in a comment to Claudio Maccari’s blog post, Fabio again claims the problem to be a MS SQL Server issue. Claudio argues that the issue should be fixed in NHibernate by making MsSql2005Dialect “use Length attribute by-default”.

On 03/16, by Fabio’s request Claudio opens a JIRA ticket (NH-1707) to default prepare_sql to true for MS SQL Server. During subsequent several months, the change is implemented, reverted, and reimplemented again, as NH team discovers and fixes related issues (NH-1713, NH-1710, NH-1718). At one point Fabio even reverts the ticket, concerned that defaulting prepare_sql to true may require user code changes, to call a different overload of SetParameter. The final version of NH 2.1, has the change INCLUDED EXCLUDED (~ zvolkov 11/2).

On 04/01, Carsten Hess replies to Daniel Auger’s NH user group thread, and points out the difference between parametrized (sp_executesql) and prepared (sp_prepexec) queries. He points out that prepared queries (used when prepare_sql is set to true) are only reused for the same connection which will cause proliferation of query plans as generated by multiple connections. (This statement later proves not accurate, see below — zvolkov 11/2) Based on this argumentation he suggests to not set prepare_sq to true and instead solve the problem in SqlClientDriver.cs by making GenerateCommand always call SetParameterSizes. However, he argues, even that fix would be suboptimal as the generated parameters will be of type nvarchar(4000) unless the undocumented type=”String(100)” syntax is used in the mappings (this turns out to be incorrect as well ~zvolkov 11/2)

On 05/02, Ayende posts on the issue, claiming that prepare_sql=true is the right solution to the problem. On 05/04 Carsten Hess comments on the post, repeating his line of argumentation, but Ayende disables the comments w/o replying.

The issue resurfaces again on 10/26/2009, in a blog post by Naz of objectreference.net. Besides restating the problem, the post clearly illustrates how this specific issue (whether it’s a real problem or not) hurts NH adoption.

On same day of 10/26, Daniel Auger opens a new thread on NH Users Group, requesting a clarification on the status of the issue. At first, Fabio can’t seem to remember the exact issue and even suggests it was fixed in NH 2.1.1, but finally he repeats his claim that this is a MS SQL Server issue. Other users agree with the assesment but repeat the NH adoption argument. Fabio provides detailed instructions for subclassing SqlClientDriver and injecting it through connection.driver_class setting. Commenting on the possibility of making this change a permanent part of NH trunk, Fabio insists that the condition of calling SetParameterSizes only when prepare_sql=true must be there for a reason and expresses concern that such change may have unintended consequences in some scenarios [i.e. for data types other than strings and RDBMSes other than MS SQL Server -- zvolkov 10/28]. He also replies to the adoption argument by saying that every technology has a limited lifecycle and so does NH. At the end, Fabio seems to be open to the idea of making the SetParameterSizes change, provided the normal procedure is followed (JIRA ticket complete with explanation of the issue and failing tests).

Note: this part of the post has been updated since it was first written. Click here to see the original ending.

On 10/30, Carsten Hess came back with a statement that turned the story on its head. According to his findings, execution plans generated by sp_prepexec ARE actually global and not per connection (unlike he said before), so having prepare_sql=true does not cause query cache pollution, at least not in NH 2.1 (apparently NH 2.0 may still have some issues since some parametrization defects were not fixed until 2.1). Among other things this means we no longer have to override GenerateCommand in SqlClientDriver to always call SetParameterSizes, as that’s what NH 2.1 does by default, with prepare_sql set to true (this is no longer accurate, see below ~zvolkov 11/2). This also means Fabio and Ayende were right in their downplaying of the issue, even though I still think they did a bad job communicating their position to the public (something I can’t help but attribute to cultural differences).

After my first correction of this post, on 11/2, Naz replies to the NH thread. He clarifies that prepare_sql is in fact NOT defaulted to true in NH 2.1, and so the users will still have to make this configuration change by hand.

The same day, Trent Niemeyer in a comment to my second blog post corrects another Carsten Hess’s statement and says that (at least in v2.1) NHibernate does NOT ignore length attribute in the mappings and that undocumented type=”String(100)” syntax is no longer the only way to effectively specify the length of query parameters.

I just verified both news and found them to be correct (and their original counterparts, correspondingly, wrong). As V.I. Lenin once said: “Trust, but verify!”

To summarize, all we need to do is:

  1. set prepare_sql to true
  2. (optional) add type=”AnsiString” and length=”LENGTH” to mappings of string properties to make NH use varchar(LENGTH) instead of nvarchar(4000) for most basic queries
  3. Keep an eye on more complex queries and be ready to use the overload of SetParameter that takes TypeFactory.GetAnsiStringType(LENGTH)

The reason prepare_sql is an opt-in feature is because it may change behavior of existing applications. Still, it is highly recommended that users of MS SQL Server set it to true, to avoid the query cache pollution. The alternative solution — the SqlDriverOverride hack — can be employed to keep the NH-generated SQL queries visible in SQL Profiler every time they are executed, and not once per connection, which is what happens when prepare_sql is set to true:

  1. leave prepare_sql at its default value of false
  2. Create you own class inheriting from SqlClientDriver and override GenerateCommand to always call SetParameterSizes:
    using System.Data;
    using NHibernate.Driver;
    using NHibernate.SqlCommand;
    using NHibernate.SqlTypes;
    
    namespace XXX.YYY.ZZZ
    {
        public class CustomSqlClientDriver : SqlClientDriver
        {
            public override IDbCommand GenerateCommand(CommandType type, SqlString sqlString, SqlType[] parameterTypes)
            {
                var command = base.GenerateCommand(type, sqlString, parameterTypes);
                SetParameterSizes(command.Parameters, parameterTypes);
                return command;
            }
        }
    }
  3. set connection.driver_class to assembly-qualified name of your custom driver implementation
  4. (optional) add type=”AnsiString” and length=”LENGTH” to mappings of string properties to make NH use varchar(LENGTH) instead of nvarchar(4000) for most basic queries
  5. Keep an eye on more complex queries and be ready to use the overload of SetParameter that takes TypeFactory.GetAnsiStringType(LENGTH)

To conclude the post, here are the lessons I learned from this story:

  • If you think you’ve found NH 2.1 does something stupid that will kill your production server you’re most likely wrong. NHibernate 2.1 is a mature enterprise-ready ORM with most major wrinkles already ironed out.
  • If your customer community thinks your product has a bug and you know it does not, they will do their best to fix it anyway, unless you take time to explain there is no bug.
  • In serious organizations the DBAs are a big power, better have them as your friends.
  • Blogging about programming controversies is not the best way to drive traffic to your website. Breaking news work better and are much easier to make.
  • Never take anything on faith w/o verifying the facts for yourself (added 11/2)

 


Oct 22 2009

Returning IEnumerable from “using” statement

Category: Uncategorizedzvolkov @ 1:59 pm

I like LINQ very much, both the query comprehension and the extension methods. Most of my set-based logic looks much better in LINQ. There is however one scenario where blindly replacing foreach with LINQ is very dangerous. I’m talking about scenario when you need to return an IEnumerable from inside the “using” statement.

Consider having following (fake) code:

public IEnumerable<Order> Lookup(IEnumerable<int> orderIDs)
{
    using (session = SessionFactory.OpenSession())
    {
        foreach (var orderID in orderIDs)
            yield return session.Get<Order>(orderID )
    }
}

Your first reaction may be to convert it to something like this:

public IEnumerable<Order> Lookup(IEnumerable<int> orderIDs)
{
    using (session = SessionFactory.OpenSession())
    {
        return orderIDs.Select(orderID => session.Get<Order>(orderID));
        //OR
        return from orderID in orderIDs select session.Get<Order>(orderID);
    }
}

Notice the problem? Yes, in the second case (with either query comprehension or extension method) the session will be Disposed immediately, long before the output of the function is enumerated!


Oct 16 2009

Memcached for NHibernate configuration attributes

Category: Uncategorizedzvolkov @ 2:01 pm

In my original article on using Memcached w/ NHibernate I mentioned a few configuration attributes to set in your session-factory properties. In this post I want to provide a definitive list of the settings, based on the actual source code of NHibernate.Caches.MemCache.dll version 2.1.0.4000.

  • cache.region_prefix or regionPrefix (string): when NH’s Memcache Provider is asked to store an item, it builds the key to pass to the Memcached client DLL. This key is built as [regionPrefix][regionName]@[key]. The regionPrefix is what specified by this parameter, while regionName is built by the NHibernate core library as [cache.region_prefix].[cacheRegionName] where cache.region_prefix is another session-factory level setting. As you can see, either of regionPrefix or cache.region_prefix can be used to achieve the same effect. Since cache.region_prefix is a generic setting and regionPrefix is a provider-specific, I recommend using cache.region_prefix instead. Default is “” (empty string).
  • cache.default_expiration or expiration (Int32): since NH Contrib 2.1 cache.default_expiration is the new setting name that should be used instead of expiration to specify number of seconds after which the cache item must be invalidated. Default value is 300 seconds. The old name is still supported for backward compatibility.
  • compression_enabled (Bool): enables automatic compression of large values (longer than 30720 bytes, not currently configurable from NHibernate) using GZip. The purpose of the compression is to reduce network bandwith so compression/decompression is performed on the client side. If your cache resides on the same server as the application, it may be a good idea to turn this off. Default is true.

Following descriptions came from this post. I added the default values by looking at the Memcached.ClientLibrary.dll with Reflector.

  • failover (bool): If this flag is set to true and a socket fails to connect,  the pool will attempt to return a socket from another server if one exists.  If set to false, then getting a socket will return null if it fails to connect to the requested server. Default is true.
  • initial_connections (Int32): the initial number of connections per server setting in the available pool. Default is 3.
  • maintenance_sleep (Int64): the sleep time(in milliseconds) between runs of the pool maintenance thread. If set to 0, then the maintenance thread will not be started. Default is 5000.
  • max_busy (Int64): the maximum busy time(in milliseconds) for threads in the busy pool. Default is 300000.
  • max_connections (Int32): the maximum number of spare connections allowed in the available pool. Default is 10.
  • max_idle (Int64): the maximum idle time for threads in the available pool. Default is 180000.
  • min_connections (Int32): the minimum number of spare connections to maintain in the available pool. Default is 3.
  • nagle (bool): the Nagle algorithm flag for the pool. If false, will turn off Nagle’s algorithm on all sockets created. Default is true.
  • socket_timeout (Int32): the socket timeout(in milliseconds) for reads. Default is 10000.
  • socket_connect_timeout (Int32): the socket timeout(in milliseconds) for connects. Default is 50.


Oct 14 2009

Speed-up your Visual Studio with RAM drive

Category: Uncategorizedzvolkov @ 2:56 pm

Today I was researching SSD options trying to find my perfect SSD when I stumbled upon this little post by Jeffrey Palermo. Basically, he tried several SSDs and came to conclusion that the incremental performance improvements given by SSDs do not justify their high cost. Instead he suggests… using a RAM drive! Now, for those of us who were computer users back in 5″ diskette times, this may sound like a blast from the past but it actually makes a lot of sense.

The idea is to use RAM drive to store… the source code! And not just source code but entire solution workset, including the obj/Debug/Release folders and any third-party DLLs you reference. This may not sound like much but think what it does to your Visual Studio experience: not only all source files are always in RAM, immediately available to Visual Studio, C# compiler, and Resharper, but the results of the compilation are also stored in RAM which does miracles to multi-project solutions!

Now, you may worry about safety: what happens to my precious sources if machine needs to be restarted or the power goes down? Think about it: all source files are already stored in Source Control anyway. If you check-in moderately often (which is always a good idea from the perspective of Continuous Integration) you cannot loose more than a few hours of work. Besides, the tool Jeffrey recommends, freeware DataRam RAMDisk, can save the RAM drive to hardrive both on shutdown and at configured interval (I have mine set at 15 minutes) which makes the whole setup pretty damn safe.

The best news are saved for those who have integration tests hitting a local SQL Server database (as opposed to semi-compatible surrogates like SQLite which I had lots of problems with): according to my tests you can speed-up your integration tests 2-6 times! Indeed, in my case I saw a 6 times improvement, from 1+ hour down to 10 minutes! Although my setup is kind of stupid (I have entire database detached, copied over, and reattached for every single integration-test) but even your normal tests will breathe easier, believe me.

Now, if you happen to have a bunch of projects all over your hard drive, plus a few DB data folders, or even one huge source code folder that just won’t fit in RAM, you may scratch your head, not willing to reorganize your entire workspace just to try out the new fad. My answer to that: use junction points. Indeed with juncpoints you can create a bunch of folders on your RAM drive and map each of them to a separate folder on your primary drive. This way you can only switch the most active of your projects to the RAM drive while keeping others on HDD.

Enough said, go and check it out. And never complain your builds are slow!

  1. Download and install DataRam RAMDisk
  2. Create a 2Gb RAM drive and leave it Unformatted. Watch out for memory consumption on your machine. On my 8Gb RAM machine I found 4Gb RAM drive putting too much stress on available memory, to the point where OS would swap a lot.
  3. Configure it to be saved on shutdown, loaded on startup, and saved on interval (say, every 15 minutes). Ideally you want the .img file to be stored on another physical hard drive, separate from the one your OS runs on. This way when RAM disk is saved (and it takes good 5 minutes to save 4GB!!!) it won’t slow the entire system down.
  4. Start the RAM disk. This will fail the first time as the .img file won’t yet exist but it will create it when you save the disk.
  5. Go to the Disk Manager, initialize the new “disk” and format it as NTFS
  6. Mount the disk as R:. Do not enable compression as that will kill the performance.
  7. Stop SQL Server, exit Visual Studio and quit any other programs that may access the folders you want to move to RAM drive
  8. Copy your IO-heavy folders to the RAM disk as required; I suggest mirroring the same structure on R: that you have on your C: drive. This way you always know which RAM folder corresponds to which physical drive folder.
  9. If you had custom permissions set on the original folders, they won’t get copied. You will have to set them by hand
  10. Rename the original folders to something like .bak — you can delete them later
  11. Create juncpoints to map original folder names to their new locations on R: drive. Do not use symlinks (the new feature of Vista intended to replace juncpoints) since the are in fact LESS transparent for Visual Studio. With symlinks, if you put your DLLs on the R: drive and add reference to your Visual Studio project, sometimes it may use the absolute R:\ path instead of the relative path causing build issues on the remote build server. With juncpoints, Visual Studio will happily use relative path.
  12. Start SQL Server, launch Visual Studio and enjoy!


Oct 07 2009

Vibrant Ink colors for BeyondCompare

Category: Uncategorizedzvolkov @ 10:18 am

As I said on my tools list page, my favorite diff tool is BeyondCompare. Since most of my tools are configured to use the dark Vibrant Ink color scheme (see my Notepad++ and Visual Studio screenshots) I figured I want BeyondCompare to look the same. After 5 minutes of googling and not finding anything I created my own settings. You can download them here:

Vibrant Ink colors for BeyondCompare



Oct 05 2009

NHibernate 2.1 tries to connect to DB on SessionFactory creation

Category: Uncategorizedzvolkov @ 11:17 am

After upgrade to NHibernate 2.1, some of my integration tests started failing. I traced the error down to NHibernate SessionFactory creation. These were the tests that created SessionFactory (with invalid connection string) but never ran any queries. I figured it was the new version of NH connecting to the database on its own! Here’s why:

It turns out NH 2.1 connects to DB to get list of reserved keywords. This happens in BuildSessionFactory method and is ON by default. The reason it gets the keywords is to automatically quote one if you happen to use it as a table or column name.

This behavior can be disabled by setting NHibernate configuration property hbm2ddl.keywords to the value of none.