Death by Logging Mantra #1 - Logs Consume Space

§ June 15, 2009 18:03 by beefarino |

As much as I love and lean on logging, it's not immune from problems.  In fact, it can be the source of some serious headaches.  A recent snafu at work prompted me to write about some situations where logging can bring your application performance to a screeching halt, or crash your application altogether. 

Here's what happened...

An incredibly complex system of hardware and software had been running smoothly for months; as part of an instrumentation layer we opted to change the rolling log strategy from 50 10MB text files:

...
<appender name="TextFile" type="log4net.Appender.RollingFileAppender">
    <file value="logs\log.txt" />
    <appendToFile value="true" />
    <rollingStyle value="Size" />
    <maxSizeRollBackups value="50" />
    <maximumFileSize value="10MB" />
    <layout type="log4net.Layout.XMLLayout">
      <prefix value="" />
    </layout>
</appender>
... 

to 500 1MB xml files:

...
<appender name="XmlFile" type="log4net.Appender.RollingFileAppender">
    <file value="logs\log.xml" />
    <appendToFile value="true" />
    <rollingStyle value="Size" />
    <maxSizeRollBackups value="500" />
    <maximumFileSize value="1MB" />
    <staticLogFileName value="false" />
    <countDirection value="1" />
    <layout type="log4net.Layout.XMLLayout">
      <prefix value="" />
    </layout>
</appender>
... 

As an ardent log4net user, I am aware of the performance impact of rolling a large number of files - if the CountDirection setting of the RollingFileAppender is less than 0 (which it is by default), the system will rename every log file each time the log rolls over.  This is costly, and in our product configuration that would mean up to 500 file renames on each roll. 

"Good thing I know what I'm doing...."

Several hours after firing up the new configuration a college asked me to come look at the device.  It had slowed to a wounded crawl.   I went to dig into the logs - I popped over to the log drive and started to find the most recent one.

... but there were 2000 log files, not 500.  The 2GB drive dedicated to the log was completely full.  And the application was still trying to write new log entries, which meant a slew of IO Exceptions were being continuously thrown by multiple threads.

"Um, I think I may have found the problem..."

I removed the oldest 1999 log files and the device immediately recovered.

So what happened?

The configuration XML is 100% correct.  The problem was that I accidentally deployed the software to the device with an old beta version of log4net 1.2.9; that particular version contains a bug in the RollingFileAppender code that prevents the MaxSizeRollBackups from being honored when CountDirection was >= 0.  Without the logs being limited in number, the software eventually filled up the hard disk with log entries.

Which bring me to my first death-by-logging mantra...

Logs Consume Space

It sounds silly I know, but this is the single most prevalent antipattern I see with any logging implementation.  There is a finite amount of storage space, and you need to make sure your logging activity doesn't consume more than its share.

I frequently see this when apps use FileAppender - this beast has no chains and, as I've stated elsewhere, you should never ever use it.  Ever.  Even in "little" applications it can cause massive trouble because the log accumulates over process time or application runs with no checks.  I've seen a 1KB application with 3GB of logs spanning almost a year of activity.

But don't think the problem is limited to the file-based appenders.  Remember....

  • memory is finite...
  • event logs fill up more often than you think...
  • a database can be configured to boundlessly expand as needed...
  • mail servers generally put caps on the size of an inbox...

Whichever appender strategies you choose, you should carefully consider the following:

... How much persistent storage are you allocating to the log?  Your answer should be a firm number, like "500MB", and not "the rest of the disk".  If you can, base this on the amount of information you need to have access to.  If a typical run of your application results in 10MB of logs, you can base the allocated size on the number of runs you want to persist.  If your application runs continuously - like a web site or windows service - you can plot out the logging activity over time, then gauge the allocation size from the time span of activity you want to persist.

... How will you cope when the allocated storage is used up?  Some appenders, like the RollingFileAppender, can handle this for you by freeing up space used by old log entries.  Others, like the EventLogAppender or the AdoNetAppender, blithely log without regard to the amount of space being consumed, and it's up to you to manage the size of the log in other ways.  E.g., I've seen SQL jobs dedicated to removing log records older than N hours, or truncating the log table to the N newest records.

... What happens when you log to a full repository?  Plan for success, but understand the causes of failure.  As I recently learned, our application slows down significantly when the log drive is full, so now checking the free space of the log drive is now included in our troubleshooting guide as a checklist item under "Application Performance Issues".  Test your application under limited logging resources to understand how it will behave.

The most important thing to remember is that logging, like any other subsystem of your application, needs to be planned, tested, and verified.



Log4Net Tutorials and Resources

§ June 7, 2009 03:57 by beefarino |

I've been meaning to create a landing page for all the log4net traffic on my blog.  A friend of mine recently told me how difficult it is to drill through all the tutorials, so I decided to organize access to all the log4net stuff in one place.  I'll keep this page updated as the blog grows. 

If there are any log4net resources you think I should add to this list or blog about, please add them in the comments or send them to me via the contact link.

Log4Net Tutorials

If you've never used log4net before, you should begin with the getting started and configuration topics.  You'll want to review the appender, layout, and patterns tutorials next, and use the others as you see fit.

  • Getting Started with Log4Net - never used log4net before, but need to get running quickly?
  • Basic XML Configuration - describes how to isolate log4net configuration to an XML file for easier modification.
  • Appenders - a discussion of the catalog of appenders available in the log4net distribution. Need to figure out how to log to the console? Or a text file? Or a database? This tutorial is for you.
  • Layouts and Patterns - want more control over the way your log looks?
  • Using Logger Objects - learn to use logger objects to organize your logging code into independently configurable parts.
  • Log Event Context - need some extra data in that log message? Maybe a call stack label, user identifier, or session moniker?
  • Filters - learn how to tame that log output and filter out the stuff you don't want or need.
  • Lossy Logging - learn about an advanced technique of optimizing your logging activity. Lossy logging gives you the benefit of a verbose log with the performance of a minimal logging scenario.

Recomended Practices

  • Your Code - learn what you should do in your code to make the most of logging.
  • Isolating Bugs - see how logging can be used to isolate production bugs.

Death by Logging Mantras

These mantras discuss specific scenarios where logging can cripple your application or its performance.  These are in-process and links will become available as I write the posts.

Resources



Log4Net Recommended Practices pt 2: Isolating Bugs

§ April 21, 2009 04:56 by beefarino |

I realize it's been a while since I've written about logging, but my experiences this morning compelled me to share.  The muse for this post was a very simple bug located in a rat's nest of complexity.

The code in question drives the I/O layer in a device that is filled with proprietary hardware.   There are about three dozen channels of internal communication when this thing is running, which doesn't include any of its networking.  So lots of asynchronous I/O and threading.  All supporting an awesome graphics and physical interface layer that is equally complex on its own.

Except that today, it was a paperweight.  In fact, you couldn't even use the operating system effectively.  Eventually task manager would pop up and show that something in the application was suffocating the CPU.  I was not looking forward to diagnosing this problem given the level of complexity involved.  I expected to spend the morning on it, but in all it took about 10 minutes to find and fix the problem.

I had worked with the application to know what pattern of messages to expect in the log files, but when I opened them I found some of the messages to be missing.  Specifically, the messages pertaining to just one of those three dozen communication channels I mentioned earlier.  After some initial testing, I had isolated the problem to a case where that channel was misconfigured.  I re-ran the application and grabbed the fresh logs.

Since I mirror my class structure in logger objects, isolating the relevant log entries was easy using logparser.  The list was surprisingly short, ending with this message:

initiating PSU polling thread 

I quickly found this message in my code:

void PollForReports()
{
    Log.Info( "initiating PSU polling thread" );
    while( 1 == Interlocked.Read( ref suspendInterlock ) )
    {
        continue;
    }
 
    RequestSerialNumber();
    Log.Debug( "requested PSU serial number ..." );
        
    // ...

As you can see, a "requested PSU serial number..." log message should have been written almost immediately after the last message I found in the log.  Something in the code after the "initiating PSU polling thread" message is written and before the "requested PSU serial number ..." message is locking the CPU. 

Do you see it - the improperly implemented spin lock?  It's pretty obvious when you know where to look, and logging can make it a lot easier to know where to look.



Extension Methods for Deferred Log4Net Message Formatting

§ March 9, 2009 16:07 by beefarino |

In a recent post I described some initial impressions of the MS Live Labs Simple Logging Framework (SLF).  One of the things I really like about SLF is the use of lambas as a way to defer execution of potentially slow or unchecked message formatting code.  I also railed on the idea of creating a completely new effort when log4net is an active open-source project.

So, here I am putting my money where my mouth is: I've created some extension methods for instances of ILog that accept lambdas - now if you use log4net in a C# 3.0+ environment, you can defer message formatting as easily as you can in SLF.  The code is quite simple:

using log4net;
namespace Log4NetExtensions
{
    public static class Log4NetExtensionMethods
    {
        public static void Debug( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsDebugEnabled )
            {
                log.Debug( formattingCallback() );
            }
        }
        public static void Info( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsInfoEnabled )
            {
                log.Info( formattingCallback() );
            }
        }
        public static void Warn( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsWarnEnabled )
            {
                log.Warn( formattingCallback() );
            }
        }
        public static void Error( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsErrorEnabled )
            {
                log.Error( formattingCallback() );
            }
        }
        public static void Fatal( this ILog log, Func<string> formattingCallback )
        {
            if( log.IsFatalEnabled )
            {
                log.Fatal( formattingCallback() );
            }
        }
     }
} 

Each method of the ILog interface gets its own override in the form of an extension method.  The method accepts a Func<string> that allows you to capture log message formatting in a lamba expression, and hence defer its execution until absolutely necessary (or avoid it altogether).  Here are some quick and dirty unit tests to demonstrate the basic use and functionality:

using NUnit.Framework;
namespace Log4NetExtensions.Tests
{
    [TestFixture]
    public class Log4NetExtensionTests
    {
        log4net.ILog Log;
        log4net.Appender.MemoryAppender appender;
        [TestFixtureSetUp]
        public void FixtureSetUp()
        {
            Log = log4net.LogManager.GetLogger( System.Reflection.MethodBase.GetCurrentMethod().DeclaringType );
            appender = new log4net.Appender.MemoryAppender 
            {
                Name = "Memory Appender",
                Threshold = log4net.Core.Level.Debug
            };
            log4net.Config.BasicConfigurator.Configure( appender );                        
        }
        [SetUp]
        public void TestSetUp()
        {
            appender.Clear();            
            log4net.Core.LoggingEvent[] events = appender.GetEvents();
            Assert.That( 0 == events.Length, "failed to clear appender of log events" );
            Log.Logger.Repository.Threshold = log4net.Core.Level.Debug;
        }
        [Test]
        public void LogViaLambdaTest()
        {
            Log.Debug( () => "Hello World!" );
            log4net.Core.LoggingEvent[] events = appender.GetEvents();            
            
            Assert.That( 1 == events.Length, "failed to log via lamba" );
            Assert.That( 
                StringComparer.CurrentCulture.Equals( 
                    "Hello World!", 
                    events[ 0 ].RenderedMessage                   
                ),
                "rendered message does not match (via lambda)"
            );
        }
        [Test]
        public void LogWithLocalVariableReference()
        {
            string value = "World!";
            Log.Debug( () => "Hello " + value );
            log4net.Core.LoggingEvent[] events = appender.GetEvents();
            Assert.That( 1 == events.Length, "failed to log with local variable reference" );
            Assert.That(
                StringComparer.CurrentCulture.Equals(
                    "Hello World!",
                    events[ 0 ].RenderedMessage
                ),
                "rendered message does not match (local variable reference)"
            );
        }
        [Test]
        public void LambdaIsNotEvaluatedAtInactiveLogLevel()
        {            
            Log.Logger.Repository.Threshold = log4net.Core.Level.Error;         
            bool evaluated = false;
            Log.Debug( () => 
                { 
                    evaluated = true;
                    return "Hello World!";
                }
            );
            Assert.That( ! evaluated, "lamba was evaluated at inactive log level" );
            Log.Error( () =>
                {
                    evaluated = true;
                    return "Hello World!";
                }
            );
            Assert.That( evaluated, "lamba was not evaluated at active log level" );
        }
    }
}

Play with it, see if you like it.  I will to.  Any suggestions or comments are encouraged.  If there is enough interest I'll see about submitting the extensions to the apache project.