Automating File Conversion in Word with Powershell

§ September 12, 2016 13:26 by beefarino |

This post is for friend and fellow nerd Jeff Truman, who asked me today:

Why yes, I do happen to have an example of doing just that thing.  Here's the code I use to convert folders of docs and pdfs to raw txt files:

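# start Word through COM; 2 is the wdFormatText value of WdSaveFormat
$word = new-object -com 'word.application';
$textformat = 2;

'pdf','doc' | foreach {
    $type = $_;
    # walk every file in the ./pdf or ./doc folder
    ls ./$type | foreach {
        $doc = $word.documents.open( $_.fullname );
        # save as plain text under \txt\ with a .txt extension
        $doc.saveas(
            ($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),
            $textformat
        );

        $doc.close();
        $doc = $null;
    }
}

# shut Word down so no hidden winword.exe process lingers
$word.quit();
$word = $null;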

The script makes some assumptions:

  • PDF and DOC files live in separate directories named PDF and DOC, respectively;
  • the converted files are saved as plain text in a folder named TXT that sits alongside those directories.
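
The script never creates the TXT folder itself, so it is probably safest to make sure it exists before starting Word. A minimal sketch, assuming the current location is the folder that holds the PDF and DOC directories:

```powershell
# create ./txt next to ./pdf and ./doc; -force makes this a no-op if it already exists
new-item -itemtype directory -path ./txt -force | out-null;

# confirm the folder is there before kicking off the conversion
$txtReady = test-path ./txt;
$txtReady;
```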

In a nutshell, the script uses COM to automate the Word application.  It makes Word iteratively load the source documents and save them in a text format in the new location.  The location is derived from the original document path using this little convoluted bit of code:

($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),

This line takes the full path of the source file ($_.fullname), replaces the source directory with the TXT directory, and also replaces the original file extension with a .txt extension.
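
To see the two replacements in isolation, here is a small sketch that runs the same expression against a plain string instead of a file. The path C:\work\pdf\report.v2.pdf is made up for illustration; no file needs to exist.

```powershell
# hypothetical source path; only the string is used, nothing is read from disk
$type = 'pdf';
$source = 'C:\work\pdf\report.v2.pdf';

# first -replace swaps the \pdf\ folder for \txt\,
# second -replace swaps everything after the last dot for .txt
$target = $source -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt';

$target;   # C:\work\txt\report.v2.txt
```

Note that -replace is case-insensitive, so a folder named PDF matches just as well as pdf, and only the final extension is touched even when the file name contains other dots.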

Other than that, it's your basic Word automation.

Enjoy!

 



Dropbox PowerShell Provider

§ July 16, 2016 00:47 by beefarino |

Life’s nuts, but I managed to create another PowerShell Provider I desperately need.  This one mounts your Dropbox account as a PowerShell drive.  Its functionality is limited, but it’s still wicked awesome.

The module is available in the PowerShell Gallery, and the source code is available on GitHub.  The provider is built on top of P2F so the code is minimal.

The available feature set is really driven by my immediate needs.  In short, I found myself needing to filter large collections of files from across the corporate Dropbox hive and push their contents to Azure blob storage.  So at present, this first release supports basic navigation, as well as the get/set-content cmdlets.

Usage is pretty straightforward if you’ve used PowerShell modules and providers.  The example below mounts a Dropbox account and lists the contents of the root path:

import-module dropbox;
new-psdrive -name dp -psprovider dropbox -root '';
# powershell will open a window here to allow you 
# to authenticate with dropbox
cd dp:
dir;

Getting the content of files from Dropbox is a simple matter of using the get-content cmdlet.  The provider transfers all files as raw byte arrays, so you need to take special care when saving the files locally:

$bytes = get-content dp:/data/file.txt;
[io.file]::writeAllBytes( "c:\data\file.txt", $bytes);
get-item c:\data\file.txt;
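
Why the special care?  The sketch below uses a hand-made byte array as a stand-in for what the provider returns (no Dropbox account involved) and writes it two ways into a scratch folder under the temp path.  The file names are made up for illustration.

```powershell
# stand-in for the bytes get-content would return from the dp: drive
$bytes = [byte[]]( 72, 101, 108, 108, 111 );   # ASCII for "Hello"

# scratch folder so the example does not touch real data
$dir = join-path ([io.path]::GetTempPath()) ([guid]::NewGuid().ToString());
new-item -itemtype directory -path $dir | out-null;
$raw = join-path $dir 'raw.txt';
$text = join-path $dir 'text.txt';

# writes the five bytes exactly as they are
[io.file]::WriteAllBytes( $raw, $bytes );

# writes each byte as a number on its own line, which is not the original file
$bytes | set-content $text;

(get-item $raw).length;   # 5
get-content $text;        # 72, 101, 108, 108, 111 on separate lines
```

Piping the bytes through the text-oriented cmdlets turns them into their string representations, which is why writeAllBytes is the safer route here.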

Eventually I’d like this module to support full item operations on the Dropbox hive.  But, you know me, I’ll get to it when I actually need it.

Enjoy!



Disappearing Drives in WMF 5.0 Preview

§ April 21, 2015 17:04 by beefarino |

Just a quick note about something I uncovered in the WMF 5.0 preview.  This one’s been frustrating me for a few weeks as I’ve prepped demos for the PowerShell Summit.

If you create a new PowerShell drive in your session that has a single-letter name, the drive will be forcibly removed unless it’s backed by the FileSystemProvider.

To see this in action, run the following script in the PowerShell 5 preview:

new-psdrive z -psp filesystem -root 'c:\'
new-psdrive y -psp registry -root 'hkcu:\'
sleep -second 5
get-psdrive

 

You’ll notice that the FileSystemProvider z: drive sticks around, whereas the RegistryProvider y: drive disappears.
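
If you want to check whether a longer drive name sidesteps the problem, a variation of the script along these lines might help.  This is only a sketch: the drive name reg is made up for illustration, and since the behavior above is described only for single-letter names, whether reg survives is something to verify on your own build rather than assume.

```powershell
# same two drives as above, plus a registry drive with a multi-letter name
new-psdrive z -psp filesystem -root 'c:\' | out-null
new-psdrive y -psp registry -root 'hkcu:\' | out-null
new-psdrive reg -psp registry -root 'hkcu:\' | out-null
sleep -second 5

# report which of the three drives are still mounted
$survivors = 'z','y','reg' | foreach {
    new-object psobject -property @{
        Name = $_
        Present = [bool]( get-psdrive $_ -erroraction silentlycontinue )
    }
}
$survivors | format-table Name, Present
```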

Now that I’ve figured out what’s happening, I feel better about pushing the new version of Simplex to the gallery!



Come Speak at PowerShell Summit 2015!

§ September 2, 2014 16:58 by beefarino |

As you may know, my home town of Charlotte NC will play host to the PowerShell Summit 2015 NA next April.  This event will supplant the Charlotte PowerShell Users Group’s regularly scheduled PowerShell Saturday event, and it is a rare opportunity that you will not want to miss!

The summit is currently accepting submissions for talks, and I’d like to suggest that you submit one.  You do not need to be an expert, you do not need to be an experienced speaker - you simply need a shell story to tell, and I know you have one you’re dying to share.

The benefits of speaking are many, including free attendance to the event.  You'll get to rub elbows with members of the PowerShell team and interact with industry experts and community leaders from across the globe.

So please, consider sharing your story by submitting a talk.  The deadline for submissions (15 Sept) is approaching rapidly, and I would hate for you to miss out on this singular opportunity.