There is No Homunculus in your Software

§ January 12, 2009 06:52 by beefarino |

I want to refactor some code to a strategy pattern to isolate some complex authentication procedures, rather than have hard-coded behavior that prevents unit, load, and performance testing.  I just had a very frustrating discussion about it ...  

Developers talk about their code as if it were a person all the time.  I do it, you do too.  Most of the time it's harmless, as in this little quip I said this morning:

"So when the service wants to authenticate the client, it will ask whatever authentication strategy is available."

Everyone in the conversation obviously knows that the application isn't capable of wanting or asking, and that these are metaphors to hand-wave over the technicalities of calling on an object.  No harm done; but consider the reply:

"But how would the service know that the client was really authenticated unless it does the work itself?"

Both examples show a kind of anthropomorphism ( like when your car won't start because it's upset with you for not changing its oil, or when your laptop doesn't like the couch because it won't connect to your wireless network there ).  "Wants" and "asks" describe dependencies and actions of the software - in short, the software's behavior.  "Know" attributes a conscious state to the software - that it can somehow understand the difference between doing some work and calling someone else to do the same work.  That is nothing short of a direct demotion of a functional requirement ( the service MUST authenticate all clients ) into a software implementation ( this class MUST contain all authentication code ).

There is no homunculus in the software.  There is no little dude sitting in a virtual Cartesian Theater watching the bits fly by and getting upset when things don't happen the way he thinks they should.

Software doesn't know anything.  Software does whatever it is told to do.  If the service contains the authentication code, or if it delegates to a strategy that performs the same action, the behavior is the same.  Given that, if the former is prohibitive to testing, I say do the latter and test my heart out!!
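To make the "behavior is the same" point concrete, here is a minimal sketch in Python.  None of these names come from the real code base: AuthenticationStrategy, FixedAnswerStrategy, Service, and handle_request are all hypothetical, and a real strategy would wrap whatever complex authentication procedure is being isolated.

```python
from typing import Protocol


class AuthenticationStrategy(Protocol):
    """Hypothetical interface: anything that can accept or reject credentials."""

    def authenticate(self, username: str, password: str) -> bool: ...


class FixedAnswerStrategy:
    """Test double (an assumption for this sketch): returns a canned answer."""

    def __init__(self, answer: bool) -> None:
        self.answer = answer

    def authenticate(self, username: str, password: str) -> bool:
        return self.answer


class Service:
    """Still authenticates every client; it just delegates the 'how'."""

    def __init__(self, strategy: AuthenticationStrategy) -> None:
        self._strategy = strategy

    def handle_request(self, username: str, password: str) -> str:
        # The functional requirement lives here: no request gets past
        # this check, whichever strategy was supplied.
        if not self._strategy.authenticate(username, password):
            return "denied"
        return "ok"
```

With something like this, a unit or load test can hand the service a FixedAnswerStrategy, while production wiring supplies the real procedure.  The service's observable behavior does not depend on where the authentication code lives.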




Accountability and Software Development

§ January 7, 2009 01:26 by beefarino |

Jay Fields today posted The Cost of Net Negative Producing Programmers, where he describes his view of how the software development industry fosters NNPPs via overpromotion and a lack of accountability.  A good read, with some good points, especially about the total cost of keeping NNPPs on a software team.  However, I don't agree that developers are never held to the same legal and financial accountability as, say, doctors or lawyers.  Consider the industries I've worked in during my career....

Education: producing software that exposes certain identity information of a public school student, such as their SSN, is considered a crime in some states.

Gaming: code up software for a slot machine with a hidden payout indicator (like a pixel that turns red when the machine is ready to pay out) and you win a nickel in the joint.

Medical: a programmer can be held liable in civil court if the software controlling a medical device causes an accidental death.  A friend of mine likes to relay the story of a programmer who skirted civil liability by proving that a third-party software library had mishandled a divide-by-zero error that ended up killing three people with radiation poisoning before the problem was noticed.  

And the coup de grâce: as a U.S. defense contractor, if you produce software that leaks secret information to an enemy of the state, you could be charged with treason.  If the enemy leverages the information in a way that results in the death of an American citizen, you become eligible for the death penalty.

You're probably thinking that my examples are obscure, rare, or from highly-regulated industries.  So the lack of accountability is isolated to business software...  

Think again.  Consider that some states, like Oregon, have inane "anti-hacker" laws that effectively elevate corporate IT policy into state law.  Just ask Randal Schwartz about it (that's his mug alongside this post, BTW).  In these states, someone who writes code that even attempts to access what ACME considers "sensitive data" could go to prison and pay massive fines, even when ACME chooses to place that sensitive data on a public network.  If you've ever had to wade through the corporate IT security policy for a large corporation, you understand why this is so freaking ridiculous and should scare you out of your khakis.

All that said, I do agree with Jay in that I believe the software industry overall to be a prima donna when it comes to liability and accountability.  In most industries when you produce a product that fails to meet its intended purpose, you can be held accountable.  E.g., when ACME makes a can opener that fails to open cans, they can be held accountable for the cost to the end user.  Software is the only industry I can think of where you can produce a product that is purchased by the end-user to meet a purpose, and then force the end-user to agree to a usage license that absolves you of having to meet that purpose at all.



Expressing Filter Queries as XML in SQL Server

§ January 5, 2009 17:41 by beefarino |

I spent the weekend mired in a post-mortem covering four hours of erratic system behavior at a client.  I used LogParser to normalize and slurp the 400,000+ log entries from redundant application servers into a single SQL database table for analysis, and hacked together a visualization tool based on the SIMILE Timeline project.  I started piecing things together by plotting little color-coded stories on the timeline, and I quickly got to the point where I needed to be able to build complex and dynamic queries against the log data.  For example:

  • When did server X fail over to server Y?
  • What errors were logged each time X failed over to Y?
  • What order were services stopped and started whenever server X failed over to server Y?
  • What errors were logged by service A on server X and service B on server Y between 6:15 and 6:45AM on 1-3-2009?

The filter criteria kept growing incrementally larger and more complex.  Given the simple schema of the data table created by LogParser, capturing the various filter combinations in stored procedures wouldn't have been hard, but trying to cover all of the possibilities would have taken a bit of work and distracted me from the task at hand.  I'm no SQL maven, and I really wasn't sure how complicated my filtering would need to become to build the stories I wanted to see.

Thankfully, my bag of tricks contains a SQL technique that allows me to filter a data set on an arbitrarily complex set of criteria expressed as XML.  For example, instead of writing SQL to query the data directly:

select *
from logentries
where (
    eventtype = 'error' and
    hostname = 'server X'
);

I can express the same thing as a simple XML document and pass it to a stored procedure:

<filters>
    <filter>
        <eventtype>error</eventtype>
        <hostname>server X</hostname>
    </filter>
</filters>

The stored procedure is capable of processing any combination of filter vectors, and will process multiple <filter /> criteria in a single pass.  This allows for more complex filters to be defined as collections of simple <filter /> elements.  For instance, this query:

 

select *
from logentries
where (
    eventtype = 'error' and
    (
        ( hostname = 'server X' and source = 'service A' ) or
        ( hostname = 'server Y' and source = 'service B' )
    ) and
    localinstant between
        '1-3-2009 6:15' and '1-3-2009 6:45'
);

 

is expressed by this XML:

<filters>
    <filter>
        <eventtype>error</eventtype>
        <hostname>server X</hostname>
        <source>service A</source>
        <start>1-3-2009 6:15</start>
        <end>1-3-2009 6:45</end>
    </filter>
    <filter>
        <eventtype>error</eventtype>
        <hostname>server Y</hostname>
        <source>service B</source>
        <start>1-3-2009 6:15</start>
        <end>1-3-2009 6:45</end>
    </filter>
</filters>

Here's the magical stored procedure:

create procedure [dbo].[sp_SelectLogTimelineFromXMLFilter](
    @filterXml nText
)
as
-- procedure result
declare @result int;
set @result = 0;
-- XML doc handle
declare @hDoc int;
-- sproc error code
declare @err int;
-- parse XML document
exec @err = sp_xml_preparedocument @hDoc output, @filterXml;
if( 0 < @err )
begin
    raiserror(
        'Unable to prepare XML filter document',
        10,
        @err
    );
    return 1;    
end
select e.*, x.label, x.color
from logentries as e
inner join openxml(
    @hDoc,
    '/filters/filter',
    1
)
with(
    hostname varchar( 256 ) './hostname',
    useridentity varchar( 256 ) './useridentity',    
    eventtype varchar( 256 ) './eventtype',
    eventid int './eventid',
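    source varchar( 256 ) './source',
    -- category is referenced in the join criteria below, so it must be mapped here
    category varchar( 256 ) './category',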
    [message] varchar( 256 ) './message',
    startinstant datetime './start',
    endinstant datetime './end',
    label varchar( 256 ) './@label',
    color varchar( 32 ) './@color'
) as x on(
    ( x.hostname is null or
        e.hostname like x.hostname ) and
    ( x.useridentity is null or
        e.useridentity like x.useridentity ) and
    ( x.eventtype is null or
        e.eventtype like x.eventtype ) and
    ( x.eventid is null or
        e.eventid = x.eventid ) and
    ( x.source is null or
        e.source like x.source ) and
    ( x.category is null or
        e.category like x.category ) and
    ( x.[message] is null
        or e.[message] like x.[message] ) and
    ( x.startinstant is null or
        e.localinstant between
            x.startinstant and x.endinstant )
)
order by localinstant;
-- unload XML document
exec @err = sp_xml_removedocument @hDoc;
-- NOTE: bug in sp_xml_removedocument on SQL 2000
--    causes it to return 1 on success, not 0
--     fixed in SQL 2005
if( 1 < @err )
begin
    raiserror(
        'Unable to remove filter XML document',
        10,
        1
    );
    return 1;
end
return @result;

The procedure accepts an XML document in its @filterXml parameter.  The magic starts at the select statement, where the logentries table is inner-joined against each <filter /> element in the XML document.  The with() clause that follows openxml() defines how the filter XML maps onto fields for the join operation using XPath expressions.  For instance, this line:

    startinstant datetime './start',

defines a field named startinstant of type datetime mapped to the <start /> child element of the <filter /> element.

The on() clause defines the join criteria.  If a criterion's value is null - that is, unspecified in the XML document - it places no restriction on the results.  Otherwise, the value supplied in the filter XML is used to limit the data returned from the query.  In addition, for varchar() fields I use the LIKE operator for a bit of flexibility.
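If the join criteria are hard to picture, the following Python sketch models the same idea outside the database.  It is an illustration only, not part of the stored procedure: sql_like, matches, and join are hypothetical helpers, only three of the fields are covered, and the LIKE emulation is rough (it is case-sensitive here, whereas the real behavior depends on the server's collation).

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Rough stand-in for SQL LIKE: % matches any run, _ matches one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def matches(entry: dict, flt: dict) -> bool:
    """True when every criterion the filter actually specifies matches the entry."""
    for field in ("hostname", "eventtype", "source"):
        wanted = flt.get(field)
        # A missing criterion plays the role of "x.field is null":
        # it places no restriction on the entry.
        if wanted is not None and not sql_like(entry[field], wanted):
            return False
    return True


def join(entries: list, filters: list) -> list:
    """Like the inner join: one (entry, filter) row per matching pair."""
    return [(e, f) for e in entries for f in filters if matches(e, f)]
```

Criteria inside one filter are ANDed together, while separate filters act like an OR.  Note that, as with the inner join, an entry matched by two filters shows up twice, once per filter.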

Using XML to define the filter criteria has a few immediate benefits to my spelunking efforts:

  • Building XML from HTTP request data is a breeze.  So enabling the timeline to display dynamic sets of log data took little more than adding form elements to the timeline web page.
  • A single story containing multiple time ranges and arbitrarily complex filter vectors can be captured in a single XML document.  Rather than having to make multiple queries to obtain all of the data necessary to define a timeline, the XML document is processed in a single query.
  • The XML document defining a timeline can be persisted to a file or database table.  This made event timelines easy to build, save, reload, and edit.
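As a rough idea of what "building XML from request data" can look like, here is a small Python sketch using the standard library's xml.etree.ElementTree.  It is not the code behind the timeline page: build_filter_xml and FILTER_FIELDS are hypothetical names, and the input is assumed to be a list of dictionaries, one per <filter /> element, already pulled out of the request.

```python
import xml.etree.ElementTree as ET

# Child element names the stored procedure's with() clause maps.
FILTER_FIELDS = (
    "hostname", "useridentity", "eventtype", "eventid",
    "source", "message", "start", "end",
)


def build_filter_xml(criteria_sets: list) -> str:
    """Turn a list of criteria dictionaries into a <filters> document."""
    root = ET.Element("filters")
    for criteria in criteria_sets:
        node = ET.SubElement(root, "filter")
        for name in FILTER_FIELDS:
            value = criteria.get(name)
            # Skip absent or blank form fields so the procedure sees them
            # as unspecified; unknown keys are ignored entirely.
            if value:
                ET.SubElement(node, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, build_filter_xml([{"eventtype": "error", "hostname": "server X"}]) produces a one-filter document equivalent to the first XML example above, and ElementTree escapes any special characters in the values.  The resulting string is what would be passed as the @filterXml parameter.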

In short, this technique saved me from writing a bunch of one-off SQL and allowed me to focus on making sense of the data.