This post is for friend and fellow nerd Jeff Truman, who asked me today:

Why yes, I do happen to have an example of doing just that thing.  Here's the code I use to convert folders of docs and pdfs to raw txt files:

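# Start Word through COM; 2 is Word's wdFormatText (plain text) save format.
$word = new-object -com 'word.application';
$textformat = 2;

'pdf','doc' | foreach {
    $type = $_;
    ls ./$type | foreach {
        # Open the source file, then save it as text under the txt folder.
        $doc = $word.documents.open( $_.fullname );
        $doc.saveas(
            ($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),
            $textformat );
        $doc.close();
        $doc = $null;
    }
}

# Shut Word down once every file has been converted.
$word.quit();
$word = $null;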

The script makes some assumptions:

  • PDF and DOC files live in separate directories named PDF and DOC, respectively;
  • the converted files are saved in TXT format to a folder named TXT that sits next to them.
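
If you'd rather check that layout up front than find out halfway through a run, here's a minimal sketch. The function name Initialize-ConversionLayout and its -Base parameter are my own inventions for illustration, not part of the script above. It creates the TXT folder when it's missing and reports whether both source folders are present.

```powershell
# Hypothetical helper: prepares and checks the folder layout the script assumes.
function Initialize-ConversionLayout {
    param([string] $Base = '.')

    # Create the output folder if it does not exist yet.
    $txt = Join-Path $Base 'txt'
    if (-not (Test-Path -LiteralPath $txt -PathType Container)) {
        New-Item -ItemType Directory -Path $txt | Out-Null
    }

    # The layout is usable only if both source folders are present.
    $ok = $true
    foreach ($name in 'pdf', 'doc') {
        $dir = Join-Path $Base $name
        if (-not (Test-Path -LiteralPath $dir -PathType Container)) {
            Write-Warning "Missing source folder: $name"
            $ok = $false
        }
    }
    $ok
}
```

Call it from the folder that holds PDF and DOC, and only kick off the conversion when it returns $true.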

In a nutshell, the script uses COM to automate the Word application.  It makes Word load each source document in turn and save it in text format in the new location.  The location is derived from the original document path using this slightly convoluted bit of code:

($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),

This line takes the full path of the source file ($_.fullname), replaces the source directory with the TXT directory, and also replaces the original file extension with a .txt extension.
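
To see the two replacements in action without firing up Word, here's a small sketch that wraps the same expression in a function. Get-TxtPath and the sample path are made up for illustration.

```powershell
# Hypothetical helper wrapping the same two -replace steps the script uses.
function Get-TxtPath {
    param([string] $FullName, [string] $Type)

    # Step 1: swap the source folder (\pdf\ or \doc\) for \txt\.
    # Step 2: swap the final extension (last dot to end of string) for .txt.
    $FullName -replace "\\$Type\\", '\txt\' -replace '\.[^\.]+$', '.txt'
}

Get-TxtPath -FullName 'C:\docs\pdf\report.v2.pdf' -Type 'pdf'
# C:\docs\txt\report.v2.txt
```

Note that only the last extension is swapped, so a dotted name like report.v2.pdf keeps its v2.  Also, -replace is case-insensitive, so a folder named PDF matches just as well as pdf.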

Other than that, it's your basic Word automation.