Expressing Filter Queries as XML in SQL Server

§ January 5, 2009 17:41 by beefarino |

I spent the weekend mired in a post-mortem covering four hours of erratic system behavior at a client.  I used LogParser to normalize and slurp the 400,000+ log entries from redundant application servers into a single SQL database table for analysis, and hacked together a visualization tool based on the SIMILE Timeline project.  I started piecing things together by plotting little color-coded stories on the timeline, and I quickly got to the point where I needed to be able to build complex and dynamic queries against the log data.  For example:

  • When did server X fail over to server Y?
  • What errors were logged each time X failed over to Y?
  • What order were services stopped and started whenever server X failed over to server Y?
  • What errors were logged by service A on server X and service B on server Y between 6:15 and 6:40AM on 1-3-2009?

The filter criteria kept growing incrementally larger and more complex.  Looking at the simple schema of the data table created by LogParser:

capturing the various filter combinations in stored procedures wouldn't have been hard, but trying to cover all of the possibilities would have taken a bit of work and distracted me from the task at hand.  I'm no SQL maven, and I really wasn't sure how complicated my filtering would need to become to build the stories I wanted to see.

Thankfully, my bag of tricks contains a SQL technique that allows me to filter a data set on an arbitrarily complex set of criteria expressed as XML.  For example, instead of writing SQL to query the data directly:

select *
from logentries
where (
    eventtype = 'error' and
    hostname = 'server X'
);

I can express the same thing as a simple XML document and pass it to a stored procedure:

<filters>
    <filter>
        <eventtype>
            error
        </eventtype>
        <hostname>
            server X
        </hostname>
    </filter>
</filters>

The stored procedure is capable of processing any combination of filter vectors, and will process multiple <filter /> criteria in a single pass.  This allows for more complex filters to be defined as collections of simple <filter /> elements.  For instance, this query:

 

select *
from logentries
where (
    eventtype = 'error' and
    (
        ( hostname = 'server X' and source = 'service A' ) or
        ( hostname = 'server Y' and source = 'service B' )
    ) and
    localinstant between
        '1-3-2009 6:15' and '1-3-2009 6:45'
);

 

is expressed by this XML:

<filters>
    <filter>
        <eventtype>
            error
        </eventtype>
        <hostname>
            server X
        </hostname>
        <source>
            service A
        </source>
        <start>
            1-3-2009 6:15
        </start>
        <end>
            1-3-2009 6:45
        </end>
    </filter>
    <filter>
        <eventtype>
            error
        </eventtype>
        <hostname>
            server Y
        </hostname>
        <source>
            service B
        </source>
        <start>
            1-3-2009 6:15
        </start>
        <end>
            1-3-2009 6:45
        </end>
    </filter>
</filters>

Here's the magical stored procedure:

create procedure [dbo].[sp_SelectLogTimelineFromXMLFilter](
    @filterXml nText
)
as
-- procedure result
declare @result int;
set @result = 0;
-- XML doc handle
declare @hDoc int;
-- sproc error code
declare @err int;
-- parse XML document
exec @err = sp_xml_preparedocument @hDoc output, @filterXml;
if( 0 < @err )
begin
    raiserror(
        'Unable to prepare XML filter document',
        10,
        @err
    );
    return 1;    
end
select e.*, x.label, x.color
from logentries as e
inner join openxml(
    @hDoc,
    '/filters/filter',
    1
)
with(
    hostname varchar( 256 ) './hostname',
    useridentity varchar( 256 ) './useridentity',    
    eventtype varchar( 256 ) './eventtype',
    eventid int './eventid',
    source varchar( 256 ) './source',
    [message] varchar( 256 ) './message',
    startinstant datetime './start',
    endinstant datetime './end',
    label varchar( 256 ) './@label',
    color varchar( 32 ) './@color'
) as x on(
    ( x.hostname is null or
        e.hostname like x.hostname ) and
    ( x.useridentity is null or
        e.useridentity like x.useridentity ) and
    ( x.eventtype is null or
        e.eventtype like x.eventtype ) and
    ( x.eventid is null or
        e.eventid = x.eventid ) and
    ( x.source is null or
        e.source like x.source ) and
    ( x.category is null or
        e.category like x.category ) and
    ( x.[message] is null
        or e.[message] like x.[message] ) and
    ( x.startinstant is null or
        e.localinstant between
            x.startinstant and x.endinstant )
)
order by localinstant;
-- unload XML document
exec @err = sp_xml_removedocument @hDoc;
-- NOTE: bug in sp_xml_removedocument on SQL 2000
--    causes it to return 1 on success, not 0
--     fixed in SQL 2005
if( 1 < @err )
begin
    raiserror(
        'Unable to remove filter XML document',
        10,
        1
    );
    return 1;
end
return @result;

The procedure accepts an XML document in its @filterXml parameter.  The magic starts on line 31, where the logentries table is inner-joined against each <filter /> element in the XML document.  The with() clause starting on line 36 defines how the filter XML maps onto fields for the join operation using XPath expressions.  For instance, line 43:

 

    startdate datetime './start',

 

defines a field named startdate of type datetime mapped to the <start /> child element of the <filter /> element.

Line 47 starts the definition of the join criteria.  If the criteria value is null - or unspecified in the XML document - all results for that criteria are returned.  Otherwise, the value supplied in the filter XML is used to limit the data returned from the query.  In addition, for varchar() fields I use the LIKE operator for a bit of flexibility.

Using XML to define the filter criteria has a few immediate benefits to my spelunking efforts:

  • Building XML from HTTP request data is a breeze.  So enabling the timeline to display dynamic sets of log data took little more than adding form elements to the timeline web page.
  • A single story containing multiple time ranges and arbitrarily complex filter vectors can be captured in a single XML document.  Rather than having to make multiple queries to obtain all of the data necessary to define a timeline, the XML document is processed in a single query.
  • The XML document defining a timeline can be persisted to a file or database table.  This made event timelines easy to build, save, reload, and edit.

In short, this technique saved me from writing a bunch of one-off SQL and allowed me to focus on making sense of the data.



Resolution 2009: Be a Beacon, Bypass the Bugzappers

§ December 31, 2008 06:39 by beefarino |

Being the last day of 2008, I thought I would share my professional resolutions for 2009.  First, let me share with you what someone recently said to me during a discussion about leadership:

"Every person shines a light for others to see, and they choose whether that light is a beacon or a bugzapper."

He went on to explain that a beacon uses their experience, passion, and knowledge to to make people better.  A bugzapper uses their experience, passion, and knowledge to bring people down, to make themselves feel bigger by making others feel smaller. 

Now that we're talking the same language, my professional resolution for 2009 is to be a better beacon.  My goal is to be more successful at beaconage by trying some of the following things...

get out of my silo 

My team works fairly independently, along clearly delineated sandboxes defined my expertise.  I hate this - I'm not an expert in anything, so rather than feel useful I feel pigeon-holed, and my projects are usually so isolated that I end up working on them alone.  The work is stagnant, usually off-the-backlog, and I never feel like a "real" contributor.  So, I'm going to start finding ways to help out in other areas of the system, and spiking some projects ideas that touch areas outside of my sandbox. 

I'm not sure how I'll make this happen or how welcome my presence will be, but our team has become a skeleton crew steering a galleon, so I think I can make a few opportunities for myself ...

try more, do more, write more

I feel like I should accomplish more than I do, both professionally and personally.  I'm going to try some modest time management tactics and see how much more stuff I can get done.  E.g., I'm trying out maintaining my away-from-work projects and spike ideas using a backlog, and managing their execution with sprints in the hope that I don't pitter away my free time on endless and unnecessary polish.  

Moreover, I want to update this blog more frequently with details of these projects and spikes.  I love to write, and this blog my primary outlet for it.  

smile more, talk more

I know it sounds silly, but I'm starting to understand that to most people a 6-foot 2-inch 200+ pound barrel-chested dude with hair turning to dreds halfway down his back is scary-looking.  And I'm a thinker, so I tend to be pretty quiet unless I have questions.  The combination can come off ... unwelcoming.  So, I'm going to try to wear a smile and chime in more by default, and see what that gets me.  

cope with the bugzappers

This is, without a doubt, going to be the hardest part of my resolution.  There are (and always will be) people chomping at the bit to shit all over my efforts, and I tend to take that rather personal.  I never enjoy conflict, but I understand conflict is necessary to improve myself or others.  However, being brow-beaten, chided, degraded, or ignored frustrates me to no end and accomplishes nothing beyond making me want to kick the bugzapper in the sack.

That is my biggest obstacle at this point: coping with the bugzappers in a way that doesn't turn me into one of them.

Suggestions welcome.



dotnetkicks counter image service

§ December 7, 2008 13:56 by beefarino |

I noticed late last week that the dotnetkicks image service is failing to respond to requests containing URLs for unkicked stories.  Not sure when this behavior started, or if it's newly defined behavior, but I've submitted issue 233 through the dotnetkicks google code bug tracking system.

If you're noticing this issue as well, please star the bug (and kick this story).

Update

thanx to gavinjoyce, this issue appears to be fixed as of 2008 Dec 08, 19:30 UTC.



what my words say about me

§ December 3, 2008 10:35 by beefarino |

I stumbled on an interesting service called Typealyzer that attempts to fit a person into one of the Myers-Briggs profiles based on the writing in their blog.  Based on my blog, it reports me as INTJ - Scientist, or:

  • Introverted (vs extroverted)
  • iNtuitive (vs sensing)
  • Thinking (vs feeling)
  • Judging (vs perceiving)

From my Typealyzer result:

"The long-range thinking and individualistic type. They are especially good at looking at almost anything and figuring out a way of improving it - often with a highly creative and imaginative touch. They are intellectually curious and daring, but might be pshysically[sic] hesitant to try new things.

The Scientists enjoy theoretical work that allows them to use their strong minds and bold creativity. Since they tend to be so abstract and theoretical in their communication they often have a problem communcating their visions to other people and need to learn patience and use conrete examples. Since they are extremly good at concentrating they often have no trouble working alone."

This seems accurate with how I would assess myself; I found more detail on being INTJ here, along with details on the other 15 profiles. 

I thought it would be interesting to see what types of writing I'm drawn to.  So on a larf I ran through a short list of my favorite blogs and found the service to hit the mark rather well: not as many INTJs as I would have thought, a few ENTJs and even one ESTP, but not a single F in the bunch.  Not sure what THAT says about me ....

So, what profile are you?