Andy Hunt's Agile Carolinas Talk

§ April 28, 2009 14:11 by beefarino |

I just got back from hearing Andy Hunt discuss his new book, Pragmatic Thinking and Learning.  I hadn't heard him speak before, and I have to admit I was not sure what to expect.

On the one hand, I hold many of his books in very high regard.  In particular, the following books were career-altering reads for me:

And his keystone role in the Agile movement has earned my utmost respect.  However, when I saw the title of this latest book I was a bit worried.  I have a master's degree in cognitive psychology, and I know a lot about learning, memory, and perception.  Most learning books written outside the field are crap, so honestly my first instinct was that this was going to be a cash-grab treatise of self-help pop-psychology fuzzy-feel-goods for software developers.

After listening to Andy's presentation, I am happy to say my instincts were way off the mark.

First of all, the book (and Andy's presentation) remains true to the pragmatic banner.  His recommendations are both practical and effective.  For example, a recurring theme during his talk was to write things down.  Carry around a small notebook and jot down every idea that pops into your head.  Maintain a personal wiki.  Mindmap as you read or think.  Solidify your thoughts into something concrete.  The point is not to be able to refer back to them later, but instead to force your brain to keep working, keep producing, keep processing.  To summarize one of his points, making your ideas concrete is the first step to accomplishing them.

Second, a lot of what he was saying is supported by academic research.  Granted, Andy takes some license with his metaphors but his points hold water.   E.g., what Andy refers to as the "memory bus" being shared between "dual cores" of the brain is probably more of an attention effect; however, the behavioral effect cannot be denied - the serial and holistic parts of your mind compete when trying to solve a problem.

Based on the presentation tonight, I wouldn't recommend the book to my former psychology colleagues - it's too macro to be useful to them.  However, for my fellow geeks, this is actually a useful introduction to becoming a more effective learner.

It was a great talk, a fun evening, and I plan to pick up the book next time I'm at the bookstore.  Oh, and I had a short chat with Andy just before the talk, and I have to say how awesome it is to meet someone you hold as a guru and detect no ego.  Awesome.



Log4Net Recommended Practices pt 2: Isolating Bugs

§ April 21, 2009 04:56 by beefarino |

I realize it's been a while since I've written about logging, but my experiences this morning compelled me to share.  The muse for this post was a very simple bug located in a rat's nest of complexity.

The code in question drives the I/O layer in a device that is filled with proprietary hardware.   There are about three dozen channels of internal communication when this thing is running, which doesn't include any of its networking.  So lots of asynchronous I/O and threading.  All supporting an awesome graphics and physical interface layer that is equally complex on its own.

Except that today, it was a paperweight.  In fact, you couldn't even use the operating system effectively.  Eventually task manager would pop up and show that something in the application was suffocating the CPU.  I was not looking forward to diagnosing this problem given the level of complexity involved.  I expected to spend the morning on it, but in all it took about 10 minutes to find and fix the problem.

I had worked with the application enough to know what pattern of messages to expect in the log files, but when I opened them I found some of the messages to be missing.  Specifically, the messages pertaining to just one of those three dozen communication channels I mentioned earlier.  After some initial testing, I had isolated the problem to a case where that channel was misconfigured.  I re-ran the application and grabbed the fresh logs.

Since I mirror my class structure in logger objects, isolating the relevant log entries was easy using logparser.  The list was surprisingly short, ending with this message:

initiating PSU polling thread 
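As an aside, mirroring the class structure in logger objects is just the stock log4net pattern of naming each logger after its declaring type; a minimal sketch, with hypothetical namespace and class names:

```csharp
using log4net;

namespace Device.IO  // hypothetical namespace
{
    public class PsuChannel  // hypothetical class name
    {
        // logger is named "Device.IO.PsuChannel", so logparser can
        // filter the log down to entries from this one channel
        static readonly ILog Log = LogManager.GetLogger( typeof( PsuChannel ) );

        public void Start()
        {
            Log.Info( "initiating PSU polling thread" );
        }
    }
}
```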

I quickly found this message in my code:

void PollForReports()
{
    Log.Info( "initiating PSU polling thread" );
    while( 1 == Interlocked.Read( ref suspendInterlock ) )
    {
        continue;
    }
 
    RequestSerialNumber();
    Log.Debug( "requested PSU serial number ..." );
        
    // ...

As you can see, a "requested PSU serial number ..." log message should have been written almost immediately after the last message I found in the log.  Something in the code between where the "initiating PSU polling thread" message is written and where the "requested PSU serial number ..." message would be written is locking up the CPU.

Do you see it - the improperly implemented spin lock?  It's pretty obvious when you know where to look, and logging can make it a lot easier to know where to look.
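For what it's worth, the fix was trivial once found.  Here's a sketch of one correct approach - the names are hypothetical, not the production code - replacing the raw interlocked flag with a wait handle so the polling thread blocks instead of burning a core:

```csharp
using System;
using System.Threading;

class PsuPoller  // hypothetical stand-in for the production class
{
    // unsignaled while polling is suspended; Set() resumes the thread
    readonly ManualResetEvent resumeSignal = new ManualResetEvent( true );

    public void Suspend() { resumeSignal.Reset(); }
    public void Resume()  { resumeSignal.Set(); }

    public void PollForReports()
    {
        Console.WriteLine( "initiating PSU polling thread" );

        // blocks without consuming CPU until the signal is set,
        // unlike the busy-wait loop in the original code
        resumeSignal.WaitOne();

        Console.WriteLine( "requested PSU serial number ..." );
        // ... RequestSerialNumber() etc.
    }
}
```

Even just dropping a Thread.Sleep( 1 ) into the original loop body would have stopped the CPU burn, but the wait handle expresses the intent better and resumes immediately.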



Delegates and Native API Callbacks - Answer

§ April 16, 2009 16:35 by beefarino |

A while back I posted a little puzzle about an exception I was hitting after passing a delegate to a native API call.  In a nutshell, my application was passing an anonymous delegate to an unmanaged library call:

ResultCode result = ExternalLibraryAPI.Open(
    deviceHandle,
    delegate( IntPtr handle, int deviceId )
    {
        // ...
    }
);
VerifyResult( result ); 

The call returns immediately, and the unmanaged library would eventually invoke the callback in response to a user pressing a button on a device.  After a semi-random period, pressing the button would yield an exception in my application.  The questions I asked were:

  1. What's the exception?
  2. How do you avoid it?
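For context, the interop declarations behind a call like this would look something like the following sketch - the DLL name, calling convention, and ResultCode values are all assumptions on my part, not the real library:

```csharp
using System;
using System.Runtime.InteropServices;

enum ResultCode { Success = 0 }  // assumed; the real codes aren't shown here

// the delegate type the native code calls back through
[UnmanagedFunctionPointer( CallingConvention.Cdecl )]
delegate void ButtonPressCallback( IntPtr handle, int deviceId );

static class ExternalLibraryAPI
{
    // "externallibrary.dll" is a placeholder name
    [DllImport( "externallibrary.dll" )]
    public static extern ResultCode Open( IntPtr deviceHandle, ButtonPressCallback callback );
}
```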

I've waited a bit to post the answers to see if anyone besides Zach would chime in.  Zach correctly identified the nut of the problem - the delegate is being garbage collected because there is no outstanding reference on the anonymous delegate once the unmanaged call returns.  With no one referencing the delegate object, the garbage collector is free to reclaim it.  When the device button is pushed, the native library invokes the callback, which no longer exists in memory.  So, to answer the first question, the specific error raised is the CallbackOnCollectedDelegate managed debugging assistant (MDA):

CallbackOnCollectedDelegate was detected.
Message: A callback was made on a garbage collected delegate of type 'Device.Interop!Device.Interop.ButtonPressCallback::Invoke'. This may cause application crashes, corruption and data loss. When passing delegates to unmanaged code, they must be kept alive by the managed application until it is guaranteed that they will never be called.

The verbiage in this exception message answers my second question.  To avoid the exception, you need to hold a reference to any delegate you pass to unmanaged code for as long as you expect the delegate to be invoked.  In other words, you need an intermediary reference on the delegate to maintain its lifetime.

Based on this, each of these examples is doomed to fail eventually, because none of them maintains a reference on the delegate object being passed to the unmanaged library:

ExternalLibraryAPI.Open(
    deviceHandle,
    delegate( IntPtr handle, int deviceId )
    {
        // ...
    }
);
 
ExternalLibraryAPI.Open(
    deviceHandle,
    new ButtonPressCallback( this.OnButtonPress )
);
 
ExternalLibraryAPI.Open(
    deviceHandle,
    this.OnButtonPress
);

The correct way to avoid the problem is to hold an explicit reference to the specific delegate instance being passed to unmanaged code:

ButtonPressCallback buttonPressCallback = this.OnButtonPress;
ExternalLibraryAPI.Open(
    deviceHandle,
    buttonPressCallback
);
// hold the reference until we're sure no
// further callbacks will be made on the
// delegate, then we can release the
// reference and allow it to be GC'ed
buttonPressCallback = null;
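A variation I prefer is to hold the delegate in an instance field, so its lifetime is tied to the object that registered it.  In this sketch, the ExternalLibraryAPI stub just stands in for the unmanaged import from the post:

```csharp
delegate void ButtonPressCallback( System.IntPtr handle, int deviceId );

// stub standing in for the unmanaged import in the post
static class ExternalLibraryAPI
{
    public static void Open( System.IntPtr deviceHandle, ButtonPressCallback callback ) { }
}

class DeviceMonitor
{
    // this field keeps the delegate reachable for as long as
    // the DeviceMonitor instance is alive
    ButtonPressCallback buttonPressCallback;

    public void Start( System.IntPtr deviceHandle )
    {
        buttonPressCallback = this.OnButtonPress;
        ExternalLibraryAPI.Open( deviceHandle, buttonPressCallback );
    }

    public void Stop()
    {
        // once no further callbacks can occur, release the
        // reference so the delegate can be collected
        buttonPressCallback = null;
    }

    void OnButtonPress( System.IntPtr handle, int deviceId )
    {
        // ...
    }
}
```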

At first I thought Zach's pinning solution was correct; however, you can only pin blittable types, and delegates are not blittable, so "pinning a delegate" isn't even possible - or necessary.  The details of how delegates are marshaled across the managed/unmanaged boundary are quite interesting, as I found out from Chris Brumme's blog:

Along the same lines, managed Delegates can be marshaled to unmanaged code, where they are exposed as unmanaged function pointers.  Calls on those pointers will perform an unmanaged to managed transition; a change in calling convention; entry into the correct AppDomain; and any necessary argument marshaling.  Clearly the unmanaged function pointer must refer to a fixed address.  It would be a disaster if the GC were relocating that!  This leads many applications to create a pinning handle for the delegate.  This is completely unnecessary.  The unmanaged function pointer actually refers to a native code stub that we dynamically generate to perform the transition & marshaling.  This stub exists in fixed memory outside of the GC heap.

However, the application is responsible for somehow extending the lifetime of the delegate until no more calls will occur from unmanaged code.  The lifetime of the native code stub is directly related to the lifetime of the delegate.  Once the delegate is collected, subsequent calls via the unmanaged function pointer will crash or otherwise corrupt the process. 

Thanks again Zach, and to everyone who reads my blog!



Load-Balancing the Build Farm with CruiseControl.NET

§ April 6, 2009 02:25 by beefarino |

Our CI system isn't terribly complicated, thankfully.  It evolved from a batch file on a single machine to a farm of VMs running CruiseControl.NET and a single Subversion repository.  The triggering mechanism hasn't changed during this transition though: it's a custom beast, consisting of a post-commit hook on our repository that logs revisions into a database, and a custom sourcecontrol task that polls the database for revisions that haven't been built yet.  It works fine and naturally creates a balanced build farm: several VMs can be configured to build the same project, and the database sourcecontrol task prevents more than one VM from building the same revision.

As well as it works, it has some problems.  First, the post-commit hook relies heavily on SVN properties, which are a pain to maintain and make it impossible to simply "add a project to the build" without going through a lot of unnecessary configuration.  Moreover, the hook is starting to seriously hold up commits, sometimes by as long as 20 seconds.

Second, and more irritating, the system builds every individual commit - each and every revision committed to the repository.  That may not sound like a bad thing, except that it includes revisions that occurred before the project was added to the CI farm - all the way back to the revision that created the project or branch.  I have to manually fudge the database, adding fake build results to prevent those builds from occurring.  It's not hard, but it is a pain in the ass.  And with the team's overuse of branching I'm finding myself having to fudge more and more.

I'm really trying to move the system towards what "everyone else does," by which I mean trigger builds by polling source control for changes.  No more database, no more post-commit hook, no more SVN property configuration, just Subversion and CruiseControl.NET.  It would be easy enough to do - simply change our CC.NET project configurations to use the standard SVN source control task.  The problem is that without the database, the farm is no longer automagically load-balancing - every VM in the farm would end up building the same source, which defeats the purpose of the farm.

I figured that I could recoup the load-balancing if I had an "edge" server between Subversion and the build farm.  This server could monitor source control and when necessary trigger a build on one of the VMs in the build farm.  So instead of each farm server determining when a build should occur, there is a single edge server making that decision.

CC.NET ships with the ability to split the build across machines - that is, for a build on one machine (like the edge server) to trigger a build on another machine (like a farm server); however, there is no load-balancing logic available.  So I made some of my own...

Edge Server CC.NET Plugin

The edge server plugin operates on a very simple algorithm:

  1. Get a list of farm servers from configuration;
  2. Determine which of the farm servers are not currently building the project;
  3. Fail the build if no farm server is available for the project;
  4. Force a build of the project on the first available farm server.

If all you want is the project source code, here it is: ccnet.edgeserver.zip (607.87 kb)

Take a look at the configuration of the plugin; I think it will make the code easier to digest.

Edge Server Configuration

The edge server consists of little more than a source control trigger and a list of farm servers:

<cruisecontrol>
  <project name="MyProject">
    <triggers>
      <intervalTrigger seconds="10" />
    </triggers>
    <sourcecontrol type="svn">
      <trunkUrl>svn://sourcecontrol/Trunk</trunkUrl>
      <autoGetSource>false</autoGetSource>
    </sourcecontrol>
    <labeller type="lastChangeLabeller" prefix="MyProject_"/>
    <tasks>
      <farmBuild>
        <farmServers>
          <farmServer priority="1" uri="tcp://build-vm-1:21234/CruiseManager.rem" />
          <farmServer priority="2" uri="tcp://build-vm-2:21234/CruiseManager.rem" />
          <farmServer priority="3" uri="tcp://build-vm-3:21234/CruiseManager.rem" />
        </farmServers>
      </farmBuild>
    </tasks>
    <publishers>
      <nullTask />
    </publishers>
  </project>
</cruisecontrol>

When CC.NET is run with this configuration, it will monitor the Subversion repository for changes to the "MyProject" trunk via the sourcecontrol block; note that since autoGetSource is false, no checkout will occur.  The edge server will never have a working copy of the source.

The load-balancing is configured by the farmBuild task and its farmServers list; in this example, three farm servers are configured in the farm for "MyProject", with build-vm-1 having the highest priority for the build (meaning it will be used first when all three servers are available).  When a change is committed to the repository, the edge server will choose one of these servers based on its availability and priority, and then force it to build the project.

Farm Server Configuration

The farm server is configured just as a normal CC.NET build, except for two key differences: first, it is configured with no trigger; second, a remoteProjectLabeller is used to label the build.  Here's a sample configuration, with mundane build tasks omitted for brevity:

<cruisecontrol>
  <project name="MyProject">
    
    <triggers/>
    <sourcecontrol type="svn">
      <trunkUrl>svn://sourcecontrol/MyProject/Trunk</trunkUrl>
      <autoGetSource>true</autoGetSource>
    </sourcecontrol>
    <labeller type="remoteProjectLabeller">
      <project>MyProject</project>
      <serverUri>tcp://edgeServer:21234/CruiseManager.rem</serverUri>
    </labeller>
    <tasks>
      <!--
              ... 
      -->
    </tasks>
    <publishers>
      <!--
              ... 
      -->
    </publishers>
  </project>
</cruisecontrol> 

Details to note here are:

  • the labeller points to the edge server to obtain the build label; this is necessary because labels are generated during the build trigger, which on the farm server is always forced and won't include any source revision information;
  • the project name on the farm server matches exactly the project name on the edge server; this is a convention assumed by the plugin.

Source Code Overview

I need a FarmServer type to support the CC.NET configuration layer:

using System;
using System.Collections.Generic;
using System.Text;
using ThoughtWorks.CruiseControl.Core;
using ThoughtWorks.CruiseControl.Remote;
using Exortech.NetReflector;
using ThoughtWorks.CruiseControl.Core.Publishers;
using System.Collections;
using ThoughtWorks.CruiseControl.Core.Util;
namespace CCNET.EdgeServer
{
    [ReflectorType( "farmServer" )]
    public class FarmServer
    {
        [ReflectorProperty( "uri" )]
        public string Uri;
 
        [ReflectorProperty( "priority" )]
        public int Priority;
    }
}

No real surprises here.  Each FarmServer instance holds a URI to a CC.NET farm server and its priority in the load-balancing algorithm.

The real meat is in the FarmPublisher class:

using System;
using System.Collections.Generic;
using System.Text;
using ThoughtWorks.CruiseControl.Core;
using ThoughtWorks.CruiseControl.Remote;
using Exortech.NetReflector;
using ThoughtWorks.CruiseControl.Core.Publishers;
using System.Collections;
using ThoughtWorks.CruiseControl.Core.Util;
namespace CCNET.EdgeServer
{
    [ReflectorType( "farmBuild" )]
    public class FarmPublisher : ITask
    {
        ICruiseManagerFactory factory;
 
        [ReflectorProperty( "Name", Required = false )]
        public string EnforcerName;
 
        [ReflectorHash( "farmServers", "uri", Required = true )]
        public Hashtable FarmServers;
 
        public FarmPublisher() : this( new RemoteCruiseManagerFactory() ) { }
 
        public FarmPublisher( ICruiseManagerFactory factory )
        {
            this.factory = factory;
            this.EnforcerName = Environment.MachineName;
        }
 
        public void Run( IIntegrationResult result )
        {
            // build a list of available farm servers
            //  based off of the plugin configuration
            Dictionary<int, ICruiseManager> servers = new Dictionary<int, ICruiseManager>();
            FindAvailableFarmServers( result, servers );
 
            if( 0 == servers.Count )
            {
                Log.Info( "No servers are available for this project at this time" );
                result.Status = IntegrationStatus.Failure;
                return;
            }
 
            // sort the available servers by priority
            List<int> keys = new List<int>( servers.Keys );
            keys.Sort();
 
            // force a build on the server with the highest 
            //  priority
            ICruiseManager availableServer = servers[ keys[ 0 ] ];
            Log.Info( "forcing build on server ..." );
            availableServer.ForceBuild( result.ProjectName, EnforcerName );
        }
        ...

FarmPublisher is configured with a list of FarmServer objects via the farmServers reflector property.  The Run method implements the simple load-balancing algorithm:

  1. a list of farm servers which are available to build the project is constructed;
  2. if no server is available to build the project, the edge server reports a build failure;
  3. the list of available farm servers is sorted by priority;
  4. the project build is started on the farm server configured with the highest priority.

Determining a list of available farm servers is pretty straightforward:

void FindAvailableFarmServers( IIntegrationResult result, IDictionary<int, ICruiseManager> servers )
{
    // predicate to locate a server that isn't actively building 
    // the current project
    Predicate<ProjectStatus> predicate = delegate( ProjectStatus prj )
    {
        return IsServerAvailableToBuildProject( result, prj );
    };
 
    // check the status of each configured farm server
    foreach( FarmServer server in FarmServers.Values )
    {
        ICruiseManager manager = null;
        try
        {
            manager = ( ICruiseManager )factory.GetCruiseManager( server.Uri );
 
            // get a local copy of server's current project status snapshot
            List<ProjectStatus> projects = new List<ProjectStatus>( manager.GetProjectStatus() );
            if( null != projects.Find( predicate ) )
            {
                // add the farm server to the list of available servers, 
                //  keyed by its configured priority
                servers[ server.Priority ] = manager;
            }
        }
        catch( Exception e )
        {
            Log.Warning( e );
        }
    }
}

Available servers are saved in the servers dictionary, keyed by their configured priority.  The availability of each farm server listed in the task configuration is checked by obtaining the status of the farm server's projects, and passing them to the IsServerAvailableToBuildProject method:

bool IsServerAvailableToBuildProject( IIntegrationResult result, ProjectStatus prj )
{
    if( null == prj || null == result )
    {
        return false;
    }
    bool status = (          
        // project name must match
        StringComparer.InvariantCultureIgnoreCase.Equals( result.ProjectName, prj.Name ) &&
 
        // integrator status must be "running"
        prj.Status == ProjectIntegratorState.Running &&                
 
        // build activity must be "sleeping"
        prj.Activity.IsSleeping()
    );
    return status;
} 

which simply returns true when:

  • the farm server configuration contains the project,
  • the project is currently running, and
  • the project isn't currently building.

Download

This code is basically spike-quality at this point.  I fully expect to throw this away in favor of something better (or get my manager to splurge for TeamCity).  There's still a lot of stuff to do.  E.g., the algorithm assumes that a farm server is capable of building more than one project at a time - that is, if a farm server is busy building one project, it can still be available to build another concurrently.  My assumption is that I'll manage this with the farm server priority configuration.  I'd like to leverage the queuing features available in CC.NET; however, I see no way of querying the queue status of a farm server in the CC.NET API.  But at least I can start disabling the post-commit hook on our repository.

The project contains the code, a test/demo, and just a few basic unit tests; I'll update the download if/when the project matures.  If you use it, or have any suggestions, please let me know in the comments of this post.  Enjoy!

ccnet.edgeserver.zip (607.87 kb)