Log4Net Recommended Practices pt 2: Isolating Bugs

§ April 21, 2009 04:56 by beefarino |

I realize it's been a while since I've written about logging, but my experiences this morning compelled me to share.  The muse for this post was a very simple bug located in a rat's nest of complexity.

The code in question drives the I/O layer in a device that is filled with proprietary hardware.   There are about three dozen channels of internal communication when this thing is running, which doesn't include any of its networking.  So lots of asynchronous I/O and threading.  All supporting an awesome graphics and physical interface layer that is equally complex on its own.

Except that today, it was a paperweight.  In fact, you couldn't even use the operating system effectively.  Eventually task manager would pop up and show that something in the application was suffocating the CPU.  I was not looking forward to diagnosing this problem given the level of complexity involved.  I expected to spend the morning on it, but in all it took about 10 minutes to find and fix the problem.

I had worked with the application to know what pattern of messages to expect in the log files, but when I opened them I found some of the messages to be missing.  Specifically, the messages pertaining to just one of those three dozen communication channels I mentioned earlier.  After some initial testing, I had isolated the problem to a case where that channel was misconfigured.  I re-ran the application and grabbed the fresh logs.

Since I mirror my class structure in logger objects, isolating the relevant log entries was easy using logparser.  The list was surprisingly short, ending with this message:

initiating PSU polling thread 

I quickly found this message in my code:

void PollForReports()
{
    Log.Info( "initiating PSU polling thread" );
    while( 1 == Interlocked.Read( ref suspendInterlock ) )
    {
        continue;
    }
 
    RequestSerialNumber();
    Log.Debug( "requested PSU serial number ..." );
        
    // ...

As you can see, a "requested PSU serial number..." log message should have been written almost immediately after the last message I found in the log.  Something in the code after the "initiating PSU polling thread" message is written and before the "requested PSU serial number ..." message is locking the CPU. 

Do you see it - the improperly implemented spin lock?  It's pretty obvious when you know where to look, and logging can make it a lot easier to know where to look.



Extension Methods for Deferred Log4Net Message Formatting

§ March 9, 2009 16:07 by beefarino |

In a recent post I described some initial impressions of the MS Live Labs Simple Logging Framework (SLF).  One of the things I really like about SLF is the use of lambas as a way to defer execution of potentially slow or unchecked message formatting code.  I also railed on the idea of creating a completely new effort when log4net is an active open-source project.

So, here I am putting my money where my mouth is: I've created some extension methods for instances of ILog that accept lambdas - now if you use log4net in a C# 3.0+ environment, you can defer message formatting as easily as you can in SLF.  The code is quite simple:

using log4net;
namespace Log4NetExtensions
{
    public static class Log4NetExtensionMethods
    {
        public static void Debug( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsDebugEnabled )
            {
                log.Debug( formattingCallback() );
            }
        }
        public static void Info( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsInfoEnabled )
            {
                log.Info( formattingCallback() );
            }
        }
        public static void Warn( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsWarnEnabled )
            {
                log.Warn( formattingCallback() );
            }
        }
        public static void Error( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsErrorEnabled )
            {
                log.Error( formattingCallback() );
            }
        }
        public static void Fatal( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsFatalEnabled )
            {
                log.Fatal( formattingCallback() );
            }
        }
     }
} 

Each method of the ILog interface gets its own override in the form of an extension method.  The method accepts a Func<string> that allows you to capture log message formatting in a lamba expression, and hence defer its execution until absolutely necessary (or avoid it altogether).  Here are some quick and dirty unit tests to demonstrate the basic use and functionality:

using NUnit.Framework;
namespace Log4NetExtensions.Tests
{
    [TestFixture]
    public class Log4NetExtensionTests
    {
        log4net.ILog Log;
        log4net.Appender.MemoryAppender appender;
        [TestFixtureSetUp]
        public void FixtureSetUp()
        {
            Log = log4net.LogManager.GetLogger( System.Reflection.MethodBase.GetCurrentMethod().DeclaringType );
            appender = new log4net.Appender.MemoryAppender 
            {
                Name = "Memory Appender",
                Threshold = log4net.Core.Level.Debug
            };
            log4net.Config.BasicConfigurator.Configure( appender );                        
        }
        [SetUp]
        public void TestSetUp()
        {
            appender.Clear();            
            log4net.Core.LoggingEvent[] events = appender.GetEvents();
            Assert.That( 0 == events.Length, "failed to clear appender of log events" );
            Log.Logger.Repository.Threshold = log4net.Core.Level.Debug;
        }
        [Test]
        public void LogViaLambdaTest()
        {
            Log.Debug( () => "Hello World!" );
            log4net.Core.LoggingEvent[] events = appender.GetEvents();            
            
            Assert.That( 1 == events.Length, "failed to log via lamba" );
            Assert.That( 
                StringComparer.CurrentCulture.Equals( 
                    "Hello World!", 
                    events[ 0 ].RenderedMessage                   
                ),
                "rendered message does not match (via lambda)"
            );
        }
        [Test]
        public void LogWithLocalVariableReference()
        {
            string value = "World!";
            Log.Debug( () => "Hello " + value );
            log4net.Core.LoggingEvent[] events = appender.GetEvents();
            Assert.That( 1 == events.Length, "failed to log with local variable reference" );
            Assert.That(
                StringComparer.CurrentCulture.Equals(
                    "Hello World!",
                    events[ 0 ].RenderedMessage
                ),
                "rendered message does not match (local variable reference)"
            );
        }
        [Test]
        public void LambdaIsNotEvaluatedAtInactiveLogLevel()
        {            
            Log.Logger.Repository.Threshold = log4net.Core.Level.Error;         
            bool evaluated = false;
            Log.Debug( () => 
                { 
                    evaluated = true;
                    return "Hello World!";
                }
            );
            Assert.That( ! evaluated, "lamba was evaluated at inactive log level" );
            Log.Error( () =>
                {
                    evaluated = true;
                    return "Hello World!";
                }
            );
            Assert.That( evaluated, "lamba was not evaluated at active log level" );
        }
    }
}

Play with it, see if you like it.  I will to.  Any suggestions or comments are encouraged.  If there is enough interest I'll see about submitting the extensions to the apache project.



A Look at the Live Labs Simple Logging Framework

§ March 6, 2009 05:47 by beefarino |

Earlier this week Scott Hanselman tweeted a link to the MS Live Labs Simple Logging Framework (SLF).  I thought I'd check it out and post some comparisons to log4net for posterity.  Note this isn't a performance comparison, just a look at how the frameworks are used.

Spoiler: There are some neat ideas in here.  Check out SLF if you're new to logging, but my guess is that you'll quickly outgrow its capabilities.  If you're already logging, I don't see any reason to switch from more mature frameworks anytime soon.

A Quick Comparison

SLF is remarkably similar to log4net in its design.  SLF uses the concepts of appenders (SLF calls them "logs"), loggers (SLF calls them "loggers" too), and logging levels just like log4net.  In fact, there was virtually no change when porting my "hello world" log4net example to the SLF:

 

using System;
using LiveLabs.Logging;
namespace SimpleLoggingFrameworkExample
{
    class Program
    {
        static void Main( string[] args )
        {
            Logger Log = new Logger( typeof( Program ) );
            Log.Level = LogLevel.Debug;            
            
            Log.Debug( "Hello World!" );  
            Log.Inform( "I'm a simple log4net tutorial." );  
            Log.Warn( "... better be careful ..." );  
            Log.Error( "ruh-roh: an error occurred" );  
            Log.Fatal( "OMG we're dooooooomed!" );
            
            Console.ReadLine();
        }
    }
}

The most annoying change (and the cause of a failed build) was the rename of log4net's ILog.Info method to SLF's Logger.Inform.  Seems silly to me to change that one method name when the rest of the interface is identical....

Simple Formatting Control

SLF defaults to outputting to the process STDERR stream at a log level of Inform.  Line 10 drops the log level to Debug so we can see all of the output, which is formatted rather heinously IMO:

2009-03-06T20:49:57|   |No Thread Name|SimpleLoggingFrameworkExample.Program|Hello World!
2009-03-06T20:49:57|   |No Thread Name|SimpleLoggingFrameworkExample.Program|I'm a simple log4net tutorial.
2009-03-06T20:49:57|   |No Thread Name|SimpleLoggingFrameworkExample.Program|...better be careful ...
2009-03-06T20:49:57|   |No Thread Name|SimpleLoggingFrameworkExample.Program|ruh-roh: an error occurred
2009-03-06T20:49:57|   |No Thread Name|SimpleLoggingFrameworkExample.Program|OMG we're dooooooomed! 

Fortunately SLF provides a straightforward way to alter the message format. Every log object has a MessageFormatter property that can be set to a delegate to format messages for the log. Adding a bit of code to the example:

            Log.Level = LogLevel.Debug;                        
            Log.MessageFormatter = ( msg, log, level ) => String.Format( "{0} [{1}]: {2}", log, level, msg );
            Log.Debug( "Hello World!" );  
makes the output far more readable:
SimpleLoggingFrameworkExample.Program [Debug]: Hello World!
SimpleLoggingFrameworkExample.Program [Inform]: I'm a simple log4net tutorial.
SimpleLoggingFrameworkExample.Program [Warn]: ... better be careful ...
SimpleLoggingFrameworkExample.Program [Error]: ruh-roh: an error occurred
SimpleLoggingFrameworkExample.Program [Fatal]: OMG we're dooooooomed! 

A similar callback exists to format exception objects.  It is certainly easier to achieve this level of formatting control in SLF than it is in log4net, where you have to define custom formatter objects and layout patterns in your configuration. 

Configuration

SLF doesn't require an explicit call to initialize it with a configuration the way log4net does.  As I've pointed out SLF defaults to logging to STDERR at an Inform logging level.

If you do want some control, SLF does offer a Settings class that provides programmatic access to configure defaults and specifics for any of the loggers you've created:

Settings.Default.DefaultLogLevel = LogLevel.Debug; 

However, your options are limited.  You can set a default log level and log for all logger objects, and you can register message and exception formatters.  One glaring omission from the configuration space is the ability to configure default message and exception formatters for all loggers.  Looks like you're stuck setting them individually for now.

Language Feature Support

Another area where SLF improves on log4net is in the use of newer C# language features (log4net is based on the 2.0 version of the language).  Every log method in SLF accepts a lamba expression as an argument:

Log.Debug( () => {
        IDictionary envs = Environment.GetEnvironmentVariables();
        return "Current Environment: " + String.Join( ";", 
            (
                from string k in envs.Keys
                select ( k + "=" + envs[ k ].ToString() )
            ).ToArray() 
        );
    }
);

This is an excellent way to isolate complex logging logic, and to avoid expensive calls to obtain log message formatting arguments when no logging will actually take place; e.g.:

Log.Debug( () => 
    String.Format( "Result: [{0}]", obj.AVeryExpensiveOperation() ) ); 

To achieve the same effect in log4net, you have to explicitly wrap the code in a check of current log level:

if( Log.IsDebugEnabled )
{
    Log.DebugFormat( "Result: [{0}]", obj.AVeryExpensiveOperation() );
}

Personally, I prefer the former.  

Appender Options

This is where log4net beats SLF into a pulp.  log4net ships with a cornucopia of appender options, including the console, files, the event log, the debugger, the .NET trace subsystem, MSMQ, remoting, network endpoints, and various databases.  SLF ships with basically the equivalent of the log4net ConsoleAppender, FileAppender, and RollingFileAppender.

In addition,  SLF offers no built-in decoupling of the log formatting from your logging code the way log4net does.  I've shown in this post how easily you can override the message or exception format on a SLF logger object in your code, but consider also that there is no way to change this message format without changing your code.  E.g., if you someday decide to move from text to an XML log, you have to touch your code.  The log4net formatting configuration can be isolated to the app.config, making it touchable while leaving your code alone.

Summary

To sum up, the major differences between log4net and SLF are:

  • log4net offers far more appender options and configurability.
  • log4net has a larger user-base and more maturity.
  • SLF provides easier access to simple formatting options for messages and exceptions.
  • SLF supports the latest language features, while log4net does not.

I'll be honest - I like some of the ideas I see in the SLF project, but I really can't figure out the impetus for a new project when a mature and open project already exists.  There is this blurb from the project home page on codeplex:

Existing logging libraries are either too feature-light or too complex. That means you might end up spending as much time debugging your logging code as you do your application code. Or you may not even know that your “log” to database is even working until it’s too late. 

Yes, SLF makes a few things easier, but on the whole log4net offers far more for features for the same effort.  And I can't say I've encountered these difficulties with log4net - I don't consider log4net complex (in comparison to, say, the MS Enterprise Logging Framework), but that's probably because I've been using log4net for years. So let's say for argument's sake that log4net is overly complex and difficult to use.  The question at the front of my mind is this:

Why start over when you can openly contribute to the log4net project and make it better?

There is nothing earth-shaking in SLF that I see mandating a new project effort.  And by not contributing to log4net and starting over, you're dismissing the features and maturity it has to offer.

[insert "MS hubris!  They don't 'get' the open source thing...."  comment here]



Correlating and Grepping Logs with PowerShell

§ November 6, 2008 16:44 by beefarino |

I've spent the last few days doing a post-mortem on a colossal system failure at a client site.  We have plenty of data on the failure, in the form of log4net application logs and Windoze event logs, but the amount of data at our disposal is making interpreting and correlating those logs a gignormous task.  Some of the logs are XML, others are CSV, so making correlations between them has involved searching multiple spreadsheets and text files for information manually.  Correlating across the logs is made even more difficult as the event log export contains datetime instants that are local to the machine, whereas the log4net logs contain UTC timestamps.

So what do I do when presented with such a Gordian Knot?  I select a katana sharp enough to shred the knot into pieces.  Lately my weapon of choice for this type of problem has been powershell.  My goal is to be able to correlate events by time and find errors or warnings across all of these logs.  To that end, I'll be using some powershell Shikai to create object hierarchies out of XML and CSV data.  In addition, I'll release powershell Bankai and extend those objects with custom properties that allow both types of logs to be combined and searched as if they were a single log.

Importing Log4Net XML Log Files

My application logs use the XML formatter built-in the standard log4net distribution, which produces a partial XML document (as described here) similar to the following:

<log4net:event logger="MyApp" timestamp="2008-11-04T04:42:42.2612828-07:00" level="INFO">
    <log4net:message>validating xml digital signature on document</log4net:message>
</log4net:event>
<log4net:event logger="MyApp" timestamp="2008-11-04T04:42:43.1279382-07:00" level="INFO">
    <log4net:message>digital signature has passed validation</log4net:message>
</log4net:event>
...

Note there is no root element containing the <log4net:event /> elements, so the import script does a little fernagling to make the log appear to be a valid XML document:

# import-logxml
#
# imports a log4net application log file
#
# file: path to xml log file
#

param( $file )

# surround the log file contents with a root element
$xml = '&lt;log xmlns:log4net="uri:doesnotmatter"&gt;' + ( gc $file ) + '&lt;/log&gt;';

# load the log as XML and slurp out every log event
( [xml] $xml ).log.event |

    # add a note property indicating this log event is from an application log file
    add-member noteproperty -name logsource -value "applicationlog" -passthru |        
        
    # add a datetimeoffset property representing the instant of the log event
    add-member scriptproperty -name datetime -value {
            [DateTimeOffset]::Parse( $this.timestamp );
        } -passthru;

On line 11, contents of the log file are surrounded with a root element to make the log file valid XML.  Line 14 uses powershell's built-in [xml] datatype to translate the XML string into an object hierarchy, then retrieves every log event from the log file.  Each <log4net:event /> element from the XML log file is represented by an object containing properties corresponding to the node's attribute values and child elements.

The next few lines use powershell's add-member cmdlet to extend the log entry objects with new properties. Line 17 adds a simple property named "logsource" with a static value indicating that this object was created from an application log.  Line 20 adds a calculated property named "datetime" containing the instant of the log message as a System.DateTimeOffset structure.  Having this DateTimeOffset property will give me an appropriately-typed value against which I can compare, filter, and sort log entry timestamps.

Using the import-logxml script is straightforward:

> $log = import-logxml 'myapp.log.xml'

After running the script, the $log variable contains an array of objects representing the individual log events in the XML log file:

> $log[ 0 ]
datetime          : 11/04/2008 4:42:42 AM -07:00
logger            : MyApp
timestamp         : 2008-11-04T04:42:42.2612828-07:00
level             : INFO
thread            : 5076
message           : validating xml digital signature on document

Now powershell's filtering capabilities can be used to isolate log events; e.g., by severity level:
> $log | ? { $_.level -eq 'error' }
datetime          : 11/04/2008 4:44:16 AM -07:00
logger            : MyApp
timestamp         : 2008-11-04T04:44:16.8272804-07:00
level             : ERROR
thread            : 5076
message           : an exception was raised while processing the application configuration
...

Importing Event Log CSV Files

Importing the CSV event log data is a little more involved:

# import-eventlogcsv.ps1
#
# translates an event log export from
# CSV format to an object collection
#
# file: path to CSV export of event log
# offset: a timespan indicating the UTC offset
#     of the event log source
#

param( $file, [TimeSpan] $offset = [DateTimeOffset]::Now.Offset )

# add CSV column headers, if necessary
convert-eventlogcsv $file;

# import the CSV file
import-csv $file |

    # add a note property indicating this entry is from an event log
    add-member noteproperty -name logsource -value "eventlog" -passthru |
    
    # add a note property to store the value of the UTC offset
    add-member noteproperty -name offset -value $offset  -passthru |
    
    # add a datetimeoffset property representing the instant of the log entry
    add-member scriptproperty -name datetime -value {
            new-object DateTimeOffset
                ( [DateTime]::Parse( $this.date + " " + $this.instant ),
                  $this.offset
                );
        } -passthru |
        
    # add a property that translates the event log message type
    #    into a log4net log-level value
    add-member scriptproperty -name level -value {
            switch -re ( $this.eventtype  )
            {                
                "info" { "INFO" }
                "succ" { "INFO" }
                "fail" { "INFO" }
                "warn" { "WARN" }
                "error" { "ERROR" }
            }
        } -passthru

Event log timestamps are expressed in the time zone of the source computer, and in my particular case that source computer exists in a different time zone than my laptop.  So the import script accepts a parameter containing a UTC offset to indicate the time zone of the event log source computer (line 11).  

Importing CSV data into powershell is as simple as invoking the import-csv cmdlet (line 14); however, the import-csv cmdlet assumes the first row in the file contains column headers.  The event log exports no such header row, so one must be added.  I've wrapped up the process of adding this header row into a script (convert-eventlogcsv.ps1) that is included in the download for this post; this script is invoked on line 14.

The result of the import-csv cmdlet is an object array.  Each object in the array represents a single row of the CSV, with properties corresponding to each column in the row.  As with the application logs, several properties are added to each object using powershell's add-member cmdlet:

  • a "logsource" property indicating that the log event was read from an event log export;
  • an "offset" property containing the UTC offset passed into the script;
  • a "datetime" property that creates a DateTimeOffset structure representing the instant the log message was produced;
  • a "level" property that translates the event log entry type (Information, Warning, Error, etc. ) into strings matching the standard log4net log severity levels (DEBUG, INFO, WARN, ERROR, and FATAL).

Script use is straightforward:

> $evt = import-eventlogcsv 'eventlog.system.csv' ( [TimeSpan]::FromHours( -7 ) )

With the result similar to the log4net import:

> $evt[ 0 ]
date      : 11/04/2008
instant   : 4:42:42 AM
source    : MyApp
eventtype : Information
category  : None
eventid   : 2
user      : N/A
computer  : PPROS1
message   : application has started
logsource : eventlog
offset    : -07:00:00
datetime  : 11/04/2008 4:42:42 AM -07:00
level     : INFO

Correlating Events Across Logs

Now that the import scripts are available, correlating the logs is pretty simple.  First, we need to import our event and application logs:

> $log = import-logxml 'myapp.log.xml'
> $evt = import-eventlogcsv 'eventlog.system.csv' $log[ 0 ].datetime.offset

Note that the log4net log is imported first; this allows the time zone offest information it contains to be used to adjust the event log timestamps.  Once the two logs are imported, we can combine them using powershell's array concatenation operator:

> $data = $log + $evt

At this point, the $data variable contains the merged contents of the event log and application log.  The add-member cmdlet was used to extend these objects with custom properties during the import; now every merged log entry contains a logsource, datetime, and level property with identical types and semantics.  This means that we can now search, sort, and filter the combined logs using a single set of criteria.  For example, consider the following:

> $start = [DateTimeOffset]::Parse( "11/04/2008 4:40 AM -7:00" )
> $end = [DateTimeOffset]::Parse( "11/04/2008 4:45 AM -7:00" )
> $data | ? { $_.datetime -gt $start -and $_.datetime -lt $end }

which finds all events that occurred between 4:40 and 4:45 AM.  Or this:

> $data | ? { $_.level -eq 'error' -and $_.datetime -gt $start -and $_.datetime -lt $end }

which isolates all error-level events during the same time period.  Or this:

> $data | sort datetime

which outputs all log entries sorted by the time they were written.

Exporting the Correlated Logs

While powershell's interactive filtering capabilities are useful, you may find you want to export the newly merged logs for further analysis in a spreadsheet or other tool.  This can be easily accomplished using the export-csv cmdlet:

> $data | export-csv 'mergedlog.csv' -notypeinformation

Download the Code

The import scripts are available here: merginglogswithpowershell.zip (1.49 kb).

Enjoy!