ConvertFrom-PDF PowerShell Cmdlet

§ July 8, 2009 06:14 by beefarino |

I hate PDFs. 

And now I need to search through several hundred of them, ranging from 30 to 300 pages in length, for cross-references and personnel names which ... um ... well, let's just say they no longer apply.  Sure, Reader has a search feature built in, and so does Explorer, but that's so 1980s.  And I sure don't want to do each one manually...

I poked around the 'net for a few minutes to find a way to read PDFs in PowerShell, but no donut.  So I rolled my own cmdlet around the iTextSharp library and Zollor's PDF to Text converter project.

There isn't much to the cmdlet code, given that all of the hard work of extracting the PDF text is done in the PDFParser class of the converter project:

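using System;
using System.IO;
using System.Management.Automation;

namespace PowerShell.PDF
{
    [Cmdlet( VerbsData.ConvertFrom, "PDF" )]
    public class ConvertFromPDF : Cmdlet
    {
        // path to the PDF file; accepted by name or from the pipeline
        [Parameter( ValueFromPipeline = true, Mandatory = true )]
        public string PDFFile { get; set; }
        
        protected override void ProcessRecord()
        {
            // PDFParser comes from the PDF to Text converter project
            var parser = new PDFParser();

            // dispose the input file as well, so the PDF isn't left locked
            using( var input = File.OpenRead( PDFFile ) )
            using( Stream s = new MemoryStream() )
            {
                if( ! parser.ExtractText( input, s ) )
                {
                    // non-terminating error: later pipeline items still run
                    WriteError( 
                        new ErrorRecord(
                            new ApplicationException( "failed to extract text from pdf" ),
                            "PDFTextExtractionFailed",
                            ErrorCategory.ReadError,
                            PDFFile
                        )    
                    );
                    return;
                }

                // rewind the extracted text and write it to the pipeline
                s.Position = 0;
                using( StreamReader reader = new StreamReader( s ) )
                {
                    WriteObject( reader.ReadToEnd() );
                }
            }
        }
    }
}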

The code accepts a file path as input; it runs the conversion on the PDF data and writes the text content of the file to the pipeline.  Not pretty, but done.

Usage

Here is the simple case of transforming a single file:

> convertfrom-pdf -pdf my.pdf

or

> "my.pdf" | convertfrom-pdf 

More complex processing can be accomplished using PowerShell's built-in features; e.g., to convert an entire directory of PDFs to text files:

> dir *.pdf | %{ $_ | convertfrom-pdf | out-file "$_.txt" } 

More relevant to my current situation would be something along these lines:

> dir *.pdf | ?{ ( $_ | convertfrom-pdf ) -match "ex-employee name" } 
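
To report which of several names turn up in which file, something along these lines might do.  It's only a sketch: Select-MatchingName and the entries in $names are made up for illustration, and it assumes the convertfrom-pdf cmdlet above is loaded.

```powershell
# Hypothetical helper: returns the entries of $Names that occur in $Text.
# Names are escaped so that characters like '.' are matched literally.
function Select-MatchingName( [string] $Text, [string[]] $Names )
{
    $Names | Where-Object { $Text -match [regex]::Escape( $_ ) }
}

# Hypothetical list of names to look for.
$names = 'Alice Example', 'Bob Sample'

Get-ChildItem *.pdf | ForEach-Object {
    $file = $_
    $text = $file.FullName | ConvertFrom-PDF

    # emit one object per (file, name) hit
    Select-MatchingName $text $names | ForEach-Object {
        New-Object PSObject -Property @{ File = $file.Name; Name = $_ }
    }
}
```

Each hit comes out as an object with File and Name properties, so the results can be piped on to sort-object, group-object, or export-csv.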

Download the source: PowerShell.PDF.zip (1.10 mb) 

Enjoy!




MSDN Southern Fried Road Show Charlotte

§ June 2, 2009 17:14 by beefarino |

Many, many thanks to Brian Hitney, Chad Brooks, and Glen Gordon for letting me give a shotgun PowerShell presentation at the MSDN RoadShow in Charlotte today.  I appreciate the opportunity, guys!

As promised, here is a quick summary and some links to PowerShell tutorials and developer blogs...

The Big Three Commands

Remember, you don't need to remember all 200+ commands available in PowerShell, because PowerShell provides built-in cheat sheets!

Get-Command

Example 1: list all available commands

Get-Command

Example 2: list available commands pertaining to processes

Get-Command *process*

Get-Help

Example 1: general PowerShell help

Get-Help

Example 2: list help topics matching a term

Get-Help *object*

Example 3: get “short” help for a specific command

Get-Help Get-Process

OR

Get-Process -?

Example 4: get “full” help (details, examples, etc.) for a specific command

Get-Help Get-Process -Full

Get-Member

Example 1: list members of a local variable

$variable | Get-Member

OR

Get-Member -InputObject $variable

Example 2: list members of a pipeline result

Get-Process | Get-Member
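
Put together, the three commands are enough to get from "I wonder if there's a command for that" to a working pipeline.  Here is one possible walk-through using processes as the example; the choice of WorkingSet64 as the property to sort on is mine, picked from the Get-Member output.

```powershell
# 1. discover: which commands deal with processes?
Get-Command *process*

# 2. learn: how is Get-Process meant to be used?
Get-Help Get-Process -Examples

# 3. explore: which properties do the objects it emits have?
Get-Process | Get-Member -MemberType Property

# 4. use what was found: the five processes using the most memory
Get-Process |
    Sort-Object WorkingSet64 -Descending |
    Select-Object -First 5 ProcessName, WorkingSet64
```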

Links

Download PowerShell v2 CTP3

Community Resources

PowerShellCommunity.org - aggregator of scripts, blogs, and resources.  A great place to start searching for powershell packages.
PowerScripting podcast – weekly podcast of all things powershell.  Generally targeted at sys admins and DBAs, but often they discuss targeted technologies in powershell, such as sharepoint, sql server, exchange, VMWare, etc.
Nivot Ink - Oisin Grehan's blog, containing an insane amount of detail concerning v2 CTP features, including creating modules, eventing, and remoting.
huddledmasses.org - the blog of Joel Bennett, author of the PowerBoots WPF extensions for PowerShell and a massive online presence in PowerShell development.
PowerShell Team Blog – the horse’s mouth, so to speak.
PowerGUI - an awesome MMC-style GUI PowerShell host, very flexible and endlessly useful.

Open-Source Projects

PowerShell Community Extensions – targeted at powershell v1, the functionality offered here is largely absorbed in v2; however, this codebase is still a great resource for learning more about how powershell works.
SQL Server PowerShell Extensions – powershell support for managing SQL 2000 and 2005; note that SQL 2008 has powershell integration out of the box.
PolyMon – an open-source system monitoring solution with powershell support.

Module, Training, & Tool Vendors

Quest AD – active directory support for powershell.
PowerGUI.org – Extensible and generic MMC-type UI front-end for powershell.
SAPIEN – makers of PrimalScript scripting IDE, designed specifically for sys admins; training courses in powershell, scripting, etc.
Quest – authors of free PowerGUI (http://www.powergui.org), a UI front-end for powershell, as well as the ActiveRoles management shell for AD and many, many other management products.
SoftwareFX – vendor of .NET and Java instrumentation controls, as well as PowerGadgets, a tool that enables you to create vista side-bar gadgets based on PowerShell scripts - it's BIG FUN!



Creating a PowerShell Provider pt 0: Gearing Up

§ May 26, 2009 01:11 by beefarino |

Last month I gave a talk at ALT.NET Charlotte on using PowerShell as a platform for support tools.  The majority of the talk revolved around the extensible provider layer available in PowerShell, and how this enables support engineers to accomplish many things by learning a few simple commands.  Creating providers isn't hard, but there is a bit of "black art" to it.  Based on the lack of adequate examples and documentation available, I thought I'd fill the void with a few posts derived from my talk.

In these posts, I will be creating a PowerShell provider around the ASP.NET Membership framework that enables the management of user accounts (for more information on ASP.NET Membership, start with this post from Scott Gu's blog).  Basically, this provider will allow you to treat your ASP.NET Membership user repository like a file system, allowing you to perform heroic feats that are simply impossible in the canned ASP.NET Web Site Management Tool. 

Like what, you ask?  Consider this:

dir users: | 
   where { !( $_.isApproved ) -and $_.creationDate.Date -eq ( ( get-date ).Date ) } 

which lists all unapproved users created today.  Or this:

dir users: | where{ $_.isLockedOut } | foreach{ $_.unlockUser() } 

which unlocks all locked users.  Or this:

dir users: | 
   where { ( (get-date) - $_.lastLoginDate ).TotalDays -gt 365 } |
   remove-item 

which removes users who haven't authenticated in the past year.  These are things you simply can't do using the clumsy built-in tools.  With a simple PowerShell provider around the user store, you can accommodate these situations and many others you don't even know you need yet.
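
Until the provider exists, the filters themselves can be tried out against stand-in data.  The sketch below fakes three users as plain objects carrying the same property names as MembershipUser; the user names and dates are invented, and the $users array merely takes the place of "dir users:".

```powershell
$now = Get-Date

# Stand-in for "dir users:" -- made-up users with MembershipUser-style properties.
$users = @(
    New-Object PSObject -Property @{
        UserName = 'alice'; IsApproved = $false; IsLockedOut = $false
        CreationDate = $now; LastLoginDate = $now }
    New-Object PSObject -Property @{
        UserName = 'bob'; IsApproved = $true; IsLockedOut = $true
        CreationDate = $now.AddDays( -30 ); LastLoginDate = $now.AddDays( -1 ) }
    New-Object PSObject -Property @{
        UserName = 'carol'; IsApproved = $true; IsLockedOut = $false
        CreationDate = $now.AddDays( -500 ); LastLoginDate = $now.AddDays( -400 ) }
)

# unapproved users created today
$unapprovedToday = $users |
    Where-Object { !( $_.IsApproved ) -and $_.CreationDate.Date -eq ( Get-Date ).Date }

# locked-out users (the real objects would offer UnlockUser(); these fakes don't)
$locked = $users | Where-Object { $_.IsLockedOut }

# users who haven't logged in for over a year
$stale = $users |
    Where-Object { ( ( Get-Date ) - $_.LastLoginDate ).TotalDays -gt 365 }

$unapprovedToday, $locked, $stale | ForEach-Object { $_.UserName }
```

The where clauses are the same ones shown above; only the source of the objects differs, which is the whole appeal of putting a provider in front of the user store.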

This first post is aimed at getting a project building, with the appropriate references in place, and with the debugger invoking powershell properly.  The next few posts will focus on implementing the proper extensibility points to have PowerShell interact with your provider.  Future posts will focus on implementing specific features of the provider, such as listing, adding, or removing users.  Eventually I will discuss packaging the provider for distribution and using the provider in freely available tools, such as PowerGUI.

Prerequisites

You'll need the latest PowerShell v2 CTP; at the time of this writing, the latest was CTP3 and it was available here.  Since we're working with a CTP, you should expect some things to change before the RTM; however, the PowerShell provider layer is unchanged from v1.0 in most respects.  The only major difference is with regards to deployment and installation, which I'll discuss at the appropriate time.

If you are currently using PowerShell v1.0, you will need to uninstall it.  Instructions are available here and here.

The .NET Framework 3.5, although not required to run PowerShell v2 CTP3, is required for some of the tools that accompany the v2 release.

Finally, I suggest you fire up PowerShell on its own at least once, to verify it installed correctly and to loosen up the script execution policy.  To save some grief, one of the first commands I execute in a new PowerShell install is this one:

set-executionPolicy RemoteSigned

This will enable the execution of locally-created script files, but require downloaded script files be signed by a trusted authority before allowing their execution.

Setting Up the Project

Create a new Class Library project in Visual Studio named "ASPNETMembership".  This project will eventually contain the PowerShell provider code.

You will need to add the following references to the project:

  • System.Web - this assembly contains the MembershipUser type we will be exposing in our PowerShell provider;
  • System.Configuration - we'll need to access the configuration subsystem to properly set up our membership provider;
  • System.Management.Automation - this contains the PowerShell type definitions we need to create our provider.  You may have to hunt for this assembly; try here: C:\Program Files\Reference Assemblies\Microsoft\WindowsPowerShell\v1.0.

Now that the necessary references are configured, it's time to configure debugging appropriately. 

Open the project properties dialog and select the Debug tab.  Under Start Action, select the "Start external program" option.  In the "Start external program" textbox, enter the fully qualified path to the PowerShell executable: c:\windows\system32\windowspowershell\v1.0\powershell.exe.

Yes, even though you're running PowerShell v2, it still lives in the v1.0 directory.

In the "Command line arguments" textbox, enter the following:

-noexit -command "[reflection.assembly]::loadFrom( 'ASPNETMembership.dll' ) | import-module"

The -command argument instructs PowerShell to execute the supplied command pipe string as if it were typed in at the console.  It may not be obvious what the command pipe string is doing.  In a nutshell, the first command in the pipe loads the ASPNETMembership project output dll into the AppDomain.  The second command in the pipe causes PowerShell to load any cmdlets or providers implemented in the dll.  I'll touch on the import-module command more in a future post.

The -noexit argument prevents PowerShell from exiting once the command has been run, which enables us to type in commands and interact with the console while debugging.
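
If the one-liner doesn't seem to do anything, the same two steps can be run by hand in the console to see which one is at fault.  A rough sketch, assuming the project output is named ASPNETMembership.dll and sits in the current directory:

```powershell
# Assumed name and location of the project output; adjust to match your build.
$dll = 'ASPNETMembership.dll'

if( Test-Path $dll )
{
    # step 1: load the assembly into the AppDomain
    $assembly = [reflection.assembly]::loadFrom( ( Resolve-Path $dll ).ProviderPath )

    # step 2: have PowerShell import any cmdlets or providers it contains
    $assembly | Import-Module

    # show whatever modules the session now has loaded
    Get-Module | Select-Object Name
}
else
{
    Write-Warning "$dll not found in $( Get-Location )"
}
```

Resolving the full path first sidesteps any difference between PowerShell's current location and the process working directory.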

Test the Project Setup

Build and run the empty project.  A PowerShell console should launch.

In the console, run the following command, which lists all assemblies loaded in the currently executing AppDomain whose FullName property matches 'aspnet':

[appdomain]::currentDomain.GetAssemblies() | 
   where { $_.FullName -match 'aspnet' } | 
   select fullname

Verify that the ASPNETMembership assembly is listed in the output of this command.

If it isn't, double-check the command-line arguments specified in the project properties -> debug tab.  Specifically, ensure that:

  • the command arguments are entered exactly as specified above, and
  • the working directory is set to the default (empty) value

That's it for now - next post we will begin implementing the MembershipUsers PowerShell provider and enable the most basic pieces necessary to list the users in the membership store.



Upcoming PowerShell Presentation

§ April 3, 2009 01:03 by beefarino |

Last night's ALT.NET Charlotte meeting was a success.  A tight turnout of big brains.  Many thanks to Brady Gaster, Brian Hitney, and Matt LeFevre for getting some air under the wings.

After sifting through some topic ideas, it looks like I'll be giving one of the first sessions at next month's meeting.  The topic will be Pragmatic PowerShell.  I'm still considering a few approaches to the talk, and I'm not sure how much I'll want to cover, but I'm thinking it would be best to create something interactive and have the audience participate.

If anyone has any suggestions or resources you think I should know about, please comment on this post!