Automating File Conversion in Word with PowerShell

§ September 12, 2016 13:26 by beefarino |

This post is for friend and fellow nerd Jeff Truman, who asked me today whether I had an example of converting Word documents to plain text from PowerShell.

Why yes, I do happen to have an example of doing just that thing.  Here's the code I use to convert folders of DOCs and PDFs to raw TXT files:

# fire up an instance of Word via COM (hidden by default)
$word = new-object -com 'word.application';

# WdSaveFormat value 2 is wdFormatText
$textformat = 2;

'pdf','doc' | foreach {
    $type = $_;

    # open each source document, then save it as text in the txt folder
    ls ./$type | foreach { 
        $doc = $word.documents.open( $_.fullname ); 
        $doc.saveas( 
            # swap the source folder for \txt\ and the extension for .txt
            ($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'), 
            $textformat 
        ); 
    
        $doc.close();  
        $doc = $null; 
    }
}

$word.quit();
$word = $null;

The script makes some assumptions:

  • PDFs and DOC files are in unique directories named PDF and DOC respectively;
  • the files are converted to TXT format and saved in a folder named TXT (the script won't create these folders for you; see the snippet below).
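
A minimal way to set up that layout, assuming you start in the root folder that will hold the three directories:

'pdf','doc','txt' | foreach { md $_ -force };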

In a nutshell, the script uses COM to automate the Word application.  It makes Word iteratively load the source documents and save them in text format in the new location.  The location is derived from the original document path using this convoluted little bit of code:

($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),

This line takes the full path of the source file ($_.fullname), replaces the source directory with the TXT directory, and also replaces the original file extension with a .txt extension.
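
For example, given a hypothetical source path, the two replacements rewrite it like so:

'C:\convert\doc\report.doc' -replace '\\doc\\', '\txt\' -replace '\.[^\.]+$', '.txt'
# yields: C:\convert\txt\report.txt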

Other than that, it's your basic Word automation.

Enjoy!

Creating a PowerShell Provider pt 0: Gearing Up

§ May 26, 2009 01:11 by beefarino |

Last month I gave a talk at ALT.NET Charlotte on using PowerShell as a platform for support tools.  The majority of the talk revolved around the extensible provider layer available in PowerShell, and how this enables support engineers to accomplish many things by learning a few simple commands.  Creating providers isn't hard, but there is a bit of "black art" to it.  Given the lack of adequate examples and documentation available, I thought I'd fill the void with a few posts derived from my talk.

In these posts, I will be creating a PowerShell provider around the ASP.NET Membership framework that enables the management of user accounts (for more information on ASP.NET Membership, start with this post from Scott Gu's blog).  Basically, this provider will allow you to treat your ASP.NET Membership user repository like a file system, allowing you to perform heroic feats that are simply impossible in the canned ASP.NET Web Site Management Tool.

Like what, you ask?  Consider this:

dir users: | 
   where { !( $_.isApproved ) -and $_.creationDate.Date -eq ( ( get-date ).Date ) } 

which lists all unapproved users created today.  Or this:

dir users: | where{ $_.isLockedOut } | foreach{ $_.unlockUser() } 

which unlocks all locked users.  Or this:

dir users: | 
   where { ( (get-date) - $_.lastLoginDate ).TotalDays -gt 365 } |
   remove-item 

which removes users who haven't authenticated in the past year.  These are things you simply can't do using the clumsy built-in tools.  With a simple PowerShell provider around the user store, you can accommodate these situations and many others you don't even know you need yet.

This first post is aimed at getting a project building, with the appropriate references in place, and with the debugger invoking PowerShell properly.  The next few posts will focus on implementing the proper extensibility points to have PowerShell interact with your provider.  Future posts will focus on implementing specific features of the provider, such as listing, adding, or removing users.  Eventually I will discuss packaging the provider for distribution and using the provider in freely available tools, such as PowerGUI.

Prerequisites

You'll need the latest PowerShell v2 CTP; at the time of this writing, the latest was CTP3 and it was available here.  Since we're working with a CTP, you should expect some things to change before the RTM; however, the PowerShell provider layer is unchanged from v1.0 in most respects.  The only major difference is with regards to deployment and installation, which I'll discuss at the appropriate time.

If you are currently using PowerShell v1.0, you will need to uninstall it.  Instructions are available here and here.

The .NET Framework 3.5, although not required to run PowerShell v2 CTP3, is required for some of the tools that accompany the v2 release.

Finally, I suggest you fire up PowerShell on its own at least once, to verify it installed correctly and to loosen up the script execution policy.  To save some grief, one of the first commands I execute in a new PowerShell install is this one:

set-executionPolicy RemoteSigned

This will enable the execution of locally-created script files, but require downloaded script files be signed by a trusted authority before allowing their execution.
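
To confirm the change took, ask for the current policy; it should report RemoteSigned:

get-executionPolicy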

Setting Up the Project

Create a new Class Library project in Visual Studio named "ASPNETMembership".  This project will eventually contain the PowerShell provider code.

You will need to add the following references to the project:

  • System.Web - this assembly contains the MembershipUser type we will be exposing in our PowerShell provider;
  • System.Configuration - we'll need to access the configuration subsystem to properly set up our membership provider;
  • System.Management.Automation - this contains the PowerShell type definitions we need to create our provider.  You may have to hunt for this assembly; try here: C:\Program Files\Reference Assemblies\Microsoft\WindowsPowerShell\v1.0.

Now that the necessary references are configured, it's time to configure debugging appropriately. 

Open the project properties dialog and select the Debug tab.  Under Start Action, select the "Start external program" option.  In the "Start external program" textbox, enter the fully qualified path to the PowerShell executable: c:\windows\system32\windowspowershell\v1.0\powershell.exe.

Yes, even though you're running PowerShell v2, it still lives in the v1.0 directory.
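
If you're not sure where the executable lives on your machine, PowerShell itself can tell you:

(get-command powershell.exe).definition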

In the "Command line arguments textbox, enter the following:

-noexit -command "[reflection.assembly]::loadFrom( 'ASPNETMembership.dll' ) | import-module"

The -command argument instructs PowerShell to execute the supplied command pipe string as if it were typed in at the console.  It may not be obvious what the command pipe string is doing.  In a nutshell, the first command in the pipe loads the ASPNETMembership project output dll into the AppDomain.  The second command in the pipe causes PowerShell to load any cmdlets or providers implemented in the dll.  I'll touch on the import-module command more in a future post.

The -noexit argument prevents PowerShell from exiting once the command has been run, which enables us to type in commands and interact with the console while debugging.

Test the Project Setup

Build and run the empty project.  A PowerShell console should launch.

In the console, run the following command, which lists all assemblies loaded in the currently executing AppDomain whose FullName property matches 'aspnet':

[appdomain]::currentDomain.GetAssemblies() | 
   where { $_.FullName -match 'aspnet' } | 
   select fullname

Verify that the ASPNETMembership assembly is listed in the output.

If it isn't, double-check the command-line arguments specified in the project properties -> debug tab.  Specifically, ensure that:

  • the command arguments are entered exactly as specified above, and
  • the working directory is set to the default (empty) value

That's it for now - next post we will begin implementing the MembershipUsers PowerShell provider and enable the most basic pieces necessary to list the users in the membership store.



Automation Framework pt 6: PowerShell Integration for Free

§ March 11, 2009 03:47 by beefarino |

Now that I have a fluent interface hiding a lot of the complexities of my automation framework, I wanted to focus on getting the framework integrated with PowerShell.  My desire is to leverage all the features of PowerShell, including the command pipeline and existing functions.  After folding a few methods into PowerShell, I recognized the general pattern; I came up with a way to package the framework in a PowerShell Module that automagically generates wrapper functions around the fluent interfaces.  So moving forward, as the framework expands I don't need to do anything to get deep PowerShell integration.

I started out using the fluent interfaces directly:

$context = new-object pokerroomshell.commands.framework.context;
$executor = [PokerRoomShell.Commands.Fluency.Execute]::Using( $context );
$properties = @{ 
    address = "12 Main St";
    city = "Anywheretownvilleton";
    state = 'Texahomasippi';
    zip = '75023';
};
[PokerRoomShell.Commands.Fluency.Execute]::Using( $context )
    .for( $player )
    .setProperties( $properties )
    .deposit( 500 ); 

which works, but becomes very cumbersome when I want to process multiple players or mix in existing PowerShell commands:

$players | %{
    [PokerRoomShell.Commands.Fluency.Execute]::Using( $context )
        .for( $_ )
        .setProperties( $properties )
        .deposit( 500 );
    $_; # put player back in pipeline
} | export-clixml players.xml; 

What I really want is to make the framework look more like it was designed for PowerShell.  Or perhaps a better way to say it: I want to use PowerShell to drive my system, but I don't want to do a lot of work to get there.  I started tinkering, implementing a few of the methods from the AccountCommands fluent interface to see what it would take to use the methods in a pipeline.  In order to do something like this:

$players | set-properties $properties | 
    new-deposit 500 | 
    export-clixml players.xml; 

I need these functions:

function new-deposit
{
    [CmdletBinding()]
    param(
        [Parameter(Position=0,ValueFromPipeline=$true,Mandatory=$true)]
        [PokerRoomShell.Commands.Framework.Account]
        $account,
        [Parameter(Position=1,Mandatory=$true)]
        [int]
        $amount
    )
    process
    {        
        $script:accountCommands = $executor.for( $account ).deposit( $amount );
        $script:accountCommands.Account;        
    }
}
function set-properties
{
    [CmdletBinding()]
    param(
        [Parameter(Position=0,ValueFromPipeline=$true,Mandatory=$true)]
        [PokerRoomShell.Commands.Framework.Account]
        $account,
        [Parameter(Position=1,Mandatory=$true)]
        [Hashtable]
        $properties
    )
    process
    {        
        $script:accountCommands = $executor.for( $account ).setProperties( $properties );
        $script:accountCommands.Account;        
    }
} 

Once I had a few of these functions under my belt, the pattern became evident.  Each method gets its own PowerShell wrapper function.  Each PowerShell wrapper function can be reduced to a matter of:

  • accepting an Account reference from the pipeline;
  • accepting any parameters needed by the AccountCommands method;
  • creating an AccountCommands instance around the Account reference;
  • calling the method on the AccountCommands instance;
  • returning the Account object back to the pipeline.

It was obvious that these wrappers would consist mostly of boilerplate, and that they could simply be generated if I had a little extra metadata available on the fluent command objects.  I defined three simple attributes to this end:

  • the CommandPipelineAttribute identifies objects as candidates for PowerShell integration;
  • the PipelineInputAttribute marks the property of the object that will be used as pipeline input and output;
  • the CommandBindingAttribute defines the verb-noun name of the PowerShell wrapper function.

The attributes are markers I can place in my fluent command objects to indicate how the object methods should be wrapped in PowerShell:

[CommandPipeline]
public class AccountCommands
{        
    // ...
    [PipelineInput]
    public Account Account
    {
        get;
        set;
    }
    // commands
    [CommandBinding( Verb.Find, Noun.Player )]
    public AccountCommands Lookup()
    {
        // ...
    }
    [CommandBinding( Verb.New, Noun.Player )]
    public AccountCommands Create()
    {
        // ...
    }
    [CommandBinding( Verb.New, Noun.Deposit )]
    public AccountCommands Deposit( decimal amount )
    {
        // ...
    }
    [CommandBinding( Verb.Set, Noun.Properties )]
    public AccountCommands SetProperties( Hashtable properties )
    {
        // ...
    }
    // ...
} 

With these markers, generating PowerShell wrappers is a simple matter of snooping out this metadata and filling in the blanks of a function template.  After a few minutes of hacking I had a working function to accomplish the task:

function generate-function
{
    [CmdletBinding()]
    param(
         [Parameter(Position=0)]
         [system.reflection.assembly] $assembly
    )
    process
    {
        # find all types marked with the CommandPipeline attribute
        foreach( $type in ( get-markedTypes $assembly ) )
        {
            # find all methods marked with the CommandBinding attribute
            foreach( $method in ( get-markedMethods $type ) )
            {
                # create a script block wrapping the method, then emit the
                # method descriptor into the pipeline (no 'return' here, so
                # every marked method gets processed)
                $method.ScriptBlock = create-wrapperScriptBlock $method;
                $method;
            }                     
        }
    }
}

In a nutshell, generate-function finds all public types marked with the CommandPipelineAttribute, then creates wrapper ScriptBlocks around the methods on those types marked with the CommandBindingAttribute (the details are described below).  I can use this to create the PowerShell wrapper functions dynamically, using the new-item cmdlet against the built-in PowerShell Function provider:

foreach( $script:m in ( generate-function $assemblyName ) )
{
    # only create functions that don't exist yet;
    # this will allow for command proxies if necessary 
    if( !( test-path $script:m.path ) )
    { 
       ni -path $script:m.path -value ( iex $script:m.ScriptBlock ) -name $script:m.Name;
    }
}

Now when my automation framework expands, I need to do zero work to update the PowerShell layer and still get the deep PowerShell integration I want.  Kick ass!

Example Generated Function

Here is a PowerShell session that demonstrates the function generation, and shows what the resulting function looks like:

PS >gi function:set-pin
Get-Item : Cannot find path 'Function:\set-pin' because it does not exist.
At line:1 char:3
+ gi <<<<  function:set-pin
    + CategoryInfo          : ObjectNotFound: (Function:\set-pin:String) [Get-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetItemCommand
    
PS >generate-function "pokerroomshell.commands" | 
    ? { !( test-path $_.path ) } | 
    % { ni -path $_.Path -value ( iex $_.ScriptBlock ) -name $_.Name }
    
PS >(gi function:set-pin).definition
    [CmdletBinding()]
    param( 
        [Parameter(Mandatory=$true,ValueFromPipeline=$true)]
        [PokerRoomShell.Commands.Framework.Account]
        $user,
        [Parameter(Position=0,Mandatory=$true)]
        [System.String]
        $newPin
    )
process {
        $script:ctx = $executor.for( $user ).ResetPin( $newPin );
        $user;    
}
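
With the definition registered, the generated set-pin works like any other pipeline-aware function; assuming $players holds Account objects and $executor is initialized (more on that below), a hypothetical session looks like this:

PS >$players | set-pin '1234' | export-clixml players.xml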

Gory Details

You really want to see the code?  You asked for it....

Finding types marked with the CommandPipelineAttribute is simple:

# find all types marked with the CommandPipeline attribute
function get-markedTypes( $asm )
{
    $asm.getExportedTypes() |
        ? { $_.getCustomAttributes( [pokerRoomShell.commands.framework.commandPipelineAttribute], $true ) };
}

Finding the methods on those types marked with the CommandBindingAttribute is just as easy; however, to simplify the ScriptBlock template processing, I preprocess each method and build up a little data structure with my necessities:

# find all methods marked with the CommandBinding attribute
function get-markedMethods( $type )
{
    # find the property to use as pipeline input / command output
    $pipelineInput =  $type.GetProperties() | ? { 
        $_.getCustomAttributes( [pokerRoomShell.commands.framework.pipelineInputAttribute], $true )
    } | coalesce; # coalesce: a little helper filter that yields the first non-null input
    # find methods marked with the CommandBinding attribute
    $type.GetMethods() | % { 
        $attr = $_.getCustomAttributes( [pokerRoomShell.commands.framework.commandBindingAttribute], $true ) | coalesce;
        # build a hash table of method data for the scriptblock template
        if( $attr )
        {                      
            # return a hash table of data needed to define the wrapper function
            @{
                Method = $_;
                Binding = $attr;
                Input = @{ 
                    Name = $pipelineInput.Name;
                    Type = $pipelineInput.propertyType;
                };
                Parameters = $_.GetParameters();
                Name = "" + $attr.verb + "-" + $attr.noun;
                Path = "function:" + $attr.verb + "-" + $attr.noun;                
            };
        }
    }
}

And then comes the real nut: the function that creates the scriptblock.  It looks a bit ugly - lots of escaped $'s, evaluation $()'s, and here-strings - but it works:

# create a script block wrapping the method 
function create-wrapperScriptBlock( $method )
{   
    $parameterPosition = 0
    
    # collection of parameter declarations
    $params = new-object system.collections.arraylist;
   
   # define the pipeline command input parameter
    $params.add(
@"
        [Parameter(Mandatory=`$true,ValueFromPipeline=`$true)]
        [$($method.input.type.fullName)]
        `$$($method.input.name)
"@
    ) | out-null; # eat the output of add()
   
    #add any parameters required by the method being wrapped
    $params.addRange(
        @( $method.parameters | 
            %{         
@"
        [Parameter(Position=$parameterPosition,Mandatory=`$true)]
        [$($_.ParameterType.fullName)]
        `$$($_.Name)
"@;
            ++$parameterPosition;
            } 
        ) 
    );
   
    # join the $params collection to a single string   
    $params = $params -join ",`n";        
    # define the method call arguments    
    $callArgs = ( $method.parameters | %{      
        "`$$($_.Name)";
        } 
    ) -join ", ";   
# return the wrapper script block as a string
@"
{
    [CmdletBinding()]
    param( 
        $($params.Trim())
    )
    
    process
    {        
        `$script:ctx = `$executor.for( `$$($method.input.name) ).$($method.Method.Name)( $($callArgs.trim()) );
        `$$($method.Input.Name);        
    }
}
"@;
}

There's quite a bit going on in there, but none of it is rocket science.  First, a list of function parameters is built, with the pipeline input parameter loaded first followed by any arguments required by the automation framework method.  This list is joined into a flat string and surrounded by a param() declaration.  A second list of parameters - those that will be passed to the automation framework method - is built and flattened, then wrapped in a call to the actual framework method.

The resulting scriptblock makes a few assumptions, most notably the existence of a global (or module-scoped) $executor variable that is declared like so:

$context = new-object pokerroomshell.commands.framework.context;
$executor = [PokerRoomShell.Commands.Fluency.Execute]::Using( $context );

But those little static details can be wrapped up in a module. 
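
For example, a minimal module sketch (the file and assembly names here are hypothetical) can initialize that state, generate the wrappers, and register them when the module is imported:

# PokerRoomShell.psm1 - a rough sketch, assuming the fluent assembly and the
# generate-function helpers above are available to the module
$assembly = [reflection.assembly]::loadFrom( 'pokerroomshell.commands.dll' );

# module-scoped state assumed by the generated wrapper functions
$context = new-object pokerroomshell.commands.framework.context;
$executor = [PokerRoomShell.Commands.Fluency.Execute]::Using( $context );

# generate and register each wrapper that doesn't already exist
foreach( $script:m in ( generate-function $assembly ) )
{
    if( !( test-path $script:m.Path ) )
    {
        ni -path $script:m.Path -value ( iex $script:m.ScriptBlock ) -name $script:m.Name | out-null;
    }
}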



Automation Framework pt 5: Hiding Complexity in a Fluent Interface

§ March 2, 2009 11:12 by beefarino |

In my last Automation Framework post I asked for some advice on managing state between commands.  I didn't get any public feedback on the blog, but an old colleague of mine named Jon Lester tinkered with the blogged examples and sent me some of his code.  The nut of his idea was to use extension methods on an accumulator object to separate the domain logic from the testing apparatus.  I liked his approach; it wasn't something that would have popped into my head, and it put a spotlight on a simple and elegant solution to my state-sharing problem.

Before I dive into Jon's idea, take a look at how I'm currently sharing state between my command objects:

//...
UserAccount account = new UserAccount();
CompositeCommand cmd = new CompositeCommand(
    new LoadUserAccountCommand { UserName = userName, Account = account },
    new MakeDepositWithTicketCommand { Amount = depositAmount, Account = account }
);
bool result = cmd.Execute( context );
// ... 

This example represents a single task on the system under test - specifically making a deposit into a user account.  The task effort is distributed across the LoadUserAccountCommand and MakeDepositWithTicketCommand objects, which must share a common Account object in order to accomplish the ultimate goal.

As I described previously, I like this approach okay, especially compared to some of the alternatives I've tried.  I think it's simple enough to understand, but at the very least it requires some explanation, which is an API FAIL.  And although you can make it work for value and immutable types, it takes an ugly hack.

My colleague's solution was to isolate the shared state from the commands, and expose the units of work in a fluent interface wrapping that state.  I whittled his approach down into a lean and clean interface - here's an example of the end result:

// ...
Execute.Using( context )
    .ForPlayer( account )
    .Deposit( 500m );
// ... 

Same unit of work, but a lot less code and far easier to understand IMO.  

Implementation

The root of the fluency is implemented in the Execute class, which encapsulates a command context (which I describe here):

public class Execute 
{
    IContext context;
    private Execute( IContext ctx )
    {
        context = ctx;
    }
    
    public static Execute Using( IContext context )
    {
        Execute e = new Execute( context );
        return e;
    }
    public AccountCommands ForPlayer( Account account )
    {
        return new AccountCommands( account, Process );
    }
    
    public GameCommands ForGame( Game game )
    {
        return new GameCommands( game, Process );
    }
    
    bool Process( ICommand[] commands )
    {
        foreach( var command in commands )
        {
            if( ! command.Execute( context ) )
            {
                return false;
            }
        }
        return true;
    }
}

As you can see, the only inroad to this class is the Using method, which returns a new instance of the Execute class initialized with the command context.  The various flavors of the For() method are used to capture the shared state for a set of commands.  They each return an object supporting a redundant, fluent interface of commands around the state.  For example, here is some of the AccountCommands class:

public class AccountCommands
{
    Func< ICommand[], bool > processCommands;
    
    public AccountCommands( Account account, Func< ICommand[], bool > callback )
    {
        processCommands = callback;
        this.Account = account;
        
        // load the account, falling back to creating it
        processCommands(
            new[] {
                Chain(
                    new LoadUserAccountCommand { Account = Account },
                    new CreateUserAccountCommand { Account = Account }
                )
            }
        );
    }
    
    public Account Account { get; set; }
    
    public AccountCommands Deposit( decimal amount )
    {
        processCommands(
            new ICommand[] {
                new MakeDepositWithTicketCommand {
                    Amount = amount,
                    Account = Account
                }
            }
        );
        
        return this; 
    }
    
    public AccountCommands Withdraw( decimal amount )
    {
        processCommands(
            new ICommand[] {
                new MakeWithdrawalWithTicketCommand {
                    Amount = amount,
                    Account = Account
                }
            }
        );
        
        return this;
    }
    
    public AccountCommands SetProperties( Hashtable properties )
    {
        processCommands(
            new[] {
                Compose(
                    new LoadAccountPropertiesCommand {
                        Account = Account
                    },
                    new SetAccountPropertiesCommand {
                        Properties = properties,
                        Account = Account
                    }
                )
            }
        );
        
        return this;
    }    
    
    // etc ...
    ICommand Chain( params ICommand[] commands )
    {
        return new ChainOfResponsibilityCommand(
            commands
        );
    }
    ICommand Compose( params ICommand[] commands )
    {
        return new CompositeCommand(
            commands
        );
    }
}


Items of note:

  • The constructor accepts two arguments: an Account object that represents the state shared by every member of the class, and a Func<> delegate that accepts an array of command objects and returns a boolean;
  • Each public method of the class represents a single task one can perform against an account;
  • Every public method simply composes one or more Command objects, which are passed to the processCommands callback for actual processing.

Huge Bennies

It still needs some work, but there are many things I like about this approach.  The thing I like most is that the fluent interface hides the complexities of composing command objects with shared state to perform system tasks.  I get all the benefits of the command pattern with minimal hassle.

With a bit of refactoring, I can easily reuse the *Commands objects from this fluent interface to do things besides execute the task.  E.g., perhaps I want to build up a system scenario and persist it, something like this:

var commands = new List< ICommand >();
BuildUp.Into( commands )
    .ForEachAccount
    .Deposit( 500m )
    .SetImage( PlayerImages.Face, testBitmap )
    .SetProperties(
        new Hashtable {
            { "Address", "12 Main St." },
            { "City", "Anywheretownvilleton" },
            { "State", "Texhomasippi" },
            { "Zip", "75023" }
        }
    );
// now the commands variable contains the defined 
// command structure and can be re-used against new
// players in the future, persisted to disk, etc.

Another big benefit of this approach is that it gives me a stable binding point for PowerShell - and by that I mean that I can deeply integrate this automation framework with PowerShell, leveraging all of the built-in freebies with virtually no effort.  But that is another post...