Automating File Conversion in Word with PowerShell

§ September 12, 2016 13:26 by beefarino |

This post is for friend and fellow nerd Jeff Truman, who asked me today:

Why yes, I do happen to have an example of doing just that thing.  Here's the code I use to convert folders of docs and pdfs to raw txt files:

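# start Word through COM automation
$word = new-object -com 'word.application';
# 2 is wdFormatText (plain text) in Word's WdSaveFormat enumeration
$textformat = 2;

# process the ./pdf folder, then the ./doc folder
'pdf','doc' | foreach {
    $type = $_;
    ls ./$type | foreach {
        $doc = $word.documents.open( $_.fullname );
        # e.g. ...\pdf\a.pdf is saved as ...\txt\a.txt
        $doc.saveas(
            ($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),
            $textformat
        );

        $doc.close();
        $doc = $null;
    }
}

# shut Word down when the batch is finished
$word.quit();
$word = $null;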

The script makes some assumptions:

  • the PDF and DOC files live in separate directories named PDF and DOC, both directly under the current directory;
  • the files are converted to TXT format and saved in a sibling folder named TXT, which the script does not create for you.
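
If you want to check those assumptions before letting Word loose, something along these lines should do it. This is only a sketch: the function name Initialize-ConversionFolders and its -Root parameter are made up for illustration and are not part of the original script.

```powershell
# Hypothetical helper (not in the original script): prepares the folder
# layout the conversion script expects under a root directory.
function Initialize-ConversionFolders {
    param( [string] $Root = '.' )

    # collect the names of any source folders that are missing
    $missing = 'pdf','doc' | where {
        -not (test-path (join-path $Root $_) -pathtype container)
    }

    # create the txt output folder; -force keeps this quiet if it already exists
    new-item -itemtype directory -path (join-path $Root 'txt') -force | out-null

    # return the missing source folder names (nothing when all is well)
    $missing
}

# check the current directory and warn about any missing source folders
Initialize-ConversionFolders | foreach { write-warning "source folder '$_' not found" }
```

Running it from the directory that holds the pdf and doc folders creates txt if needed and prints a warning for each source folder it cannot find.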

In a nutshell, the script uses COM to automate the Word application.  It makes Word load each source document in turn and save it in a text format in the new location.  The location is derived from the original document path using this slightly convoluted bit of code:

($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),

This line takes the full path of the source file ($_.fullname), replaces the source directory with the TXT directory, and also replaces the original file extension with a .txt extension.
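
To see what that expression does without involving Word at all, you can run it against a plain string. The sketch below wraps it in a function named Get-TxtPath, which is a name I made up for this illustration, and the sample path is hypothetical too.

```powershell
# Hypothetical helper wrapping the same -replace chain used in the script.
function Get-TxtPath {
    param( [string] $Path, [string] $Type )

    # first -replace: swap the \pdf\ or \doc\ folder segment for \txt\
    # second -replace: swap the final extension for .txt
    $Path -replace "\\$Type\\", '\txt\' -replace '\.[^\.]+$', '.txt'
}

# a made-up sample path; nothing here touches the file system
Get-TxtPath -Path 'C:\work\pdf\report.v2.pdf' -Type 'pdf'
# → C:\work\txt\report.v2.txt
```

Note that only the last extension is replaced, so report.v2.pdf becomes report.v2.txt.  Also keep in mind that -replace swaps every match, so a path containing \pdf\ more than once would have each occurrence rewritten.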

Other than that, it's your basic Word automation.

Enjoy!



Dropbox PowerShell Provider

§ July 16, 2016 00:47 by beefarino |

Life’s nuts, but I managed to create another PowerShell Provider I desperately need.  This one mounts your Dropbox account as a PowerShell drive.  Its functionality is limited, but it’s still wicked awesome.

The module is available in the gallery, and the source code is available on GitHub.  The provider is built on top of P2F so the code is minimal.

The available feature set is really driven by my immediate needs.  In short, I found myself needing to filter large collections of files from across the corporate Dropbox hive and push their contents to Azure blob storage.  So at present, this first release supports basic navigation, as well as the get/set-content cmdlets.

Usage is pretty straightforward if you’ve used PowerShell modules and providers.  The example below mounts a Dropbox account and lists the contents of the root path:

import-module dropbox;
new-psdrive -name dp -psprovider dropbox -root '';
# powershell will open a window here to allow you 
# to authenticate with dropbox
cd dp:
dir;

Getting the content of files from Dropbox is a simple matter of using the get-content cmdlet.  The provider transfers all files as raw byte arrays, so you need to take special care when saving the files locally:

$bytes = get-content dp:/data/file.txt;
[io.file]::writeAllBytes( "c:\data\file.txt", $bytes);
get-item c:\data\file.txt;
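
The reason for going through writeAllBytes is that the bytes land on disk verbatim.  Here is a small sketch you can run without a Dropbox account.  The byte array is a stand-in for what get-content would return from the dp: drive, and I'm assuming the file holds UTF-8 text; the temp file name is made up.

```powershell
# Stand-in for: $bytes = get-content dp:/data/file.txt
# (assumption: the remote file holds UTF-8 text)
[byte[]] $bytes = [text.encoding]::UTF8.GetBytes( 'hello, dropbox' )

# write the bytes verbatim to a temporary file
$path = join-path ([io.path]::GetTempPath()) 'dropbox-sample.txt'
[io.file]::writeAllBytes( $path, $bytes )

# the file on disk has exactly as many bytes as the source array
(get-item $path).length -eq $bytes.length

# decode to a string only when you actually need text in the session
$text = [text.encoding]::UTF8.GetString( $bytes )
$text
```

If the file is not text, skip the decoding step and just keep the byte array.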

Eventually I’d like this module to support full item operations on the Dropbox hive.  But, you know me, I’ll get to it when I actually need it.

Enjoy!



Disappearing Drives in WMF 5.0 Preview

§ April 21, 2015 17:04 by beefarino |

Just a quick note about something I uncovered in the WMF 5.0 preview.  This one’s been frustrating me for a few weeks as I’ve prepped demos for the PowerShell Summit.

If you create a new PowerShell drive in your session that has a single-letter name, the drive will be forcibly removed unless it’s backed by the FileSystemProvider.

To see this in action, run the following script in the PowerShell 5 preview:

new-psdrive z -psp filesystem -root 'c:\'
new-psdrive y -psp registry -root 'hkcu:\'
sleep -second 5
get-psdrive

You’ll notice that the FileSystemProvider z: drive sticks around, whereas the RegistryProvider y: drive disappears.
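
If your session has a lot of drives, a filter makes the difference easier to spot.  This is just a convenience sketch; Get-SingleLetterDrive is a name I made up, and it uses nothing beyond get-psdrive.

```powershell
# Hypothetical helper: lists only the drives with single-letter names,
# along with the provider that backs each one.
function Get-SingleLetterDrive {
    get-psdrive | where { $_.name.length -eq 1 } | select name, provider
}

# run this once right after creating the drives and again after the sleep
Get-SingleLetterDrive
```

Comparing the two listings shows which single-letter drives were removed in between.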

Now that I’ve figured out what’s happening, I feel better about pushing the new version of Simplex to the gallery!



A Dirt Simple PowerShell Hosting Example

§ October 15, 2014 13:40 by beefarino |

Earlier today I responded to a shout for help from twitter friend and fellow PowerShell hacker Tim Meers:

After sending Tim some code showing how to host simple PowerShell scripts in your application, I’ve received several requests from others for the same.  I figured it was worth a quick blog post.

The annotated code is below.  This host accepts an item path as input, and uses PowerShell to fetch and format the item as text.

using System;
using System.Linq;

// you'll need to add a reference to System.Management.Automation
using System.Management.Automation;
using System.Management.Automation.Runspaces;

namespace HostSample
{
    class Program
    {
        // to run this application, specify a path to an item
        //  e.g.:
        //      c:\
        //      env:/computername
        //      hkcu:/software
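        static void Main(string[] args)
        {
            // bail out early if no item path was supplied
            if (args.Length == 0)
            {
                Console.Error.WriteLine("please specify a path to an item");
                return;
            }
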
            // create a RUNSPACE - this is used to maintain state between pipelines.
            //  e.g., variables, function definitions, etc
            using (var runspace = RunspaceFactory.CreateRunspace())
            {
                // open the runspace before you use it
                runspace.Open();

                // set the default runspace for the process;
                //  this is necessary for some features and cmdlets to work properly
                Runspace.DefaultRunspace = runspace;
                
                // create a POWERSHELL pipeline
                using (var powershell = PowerShell.Create())
                {
                    // assign the runspace to the pipeline
                    powershell.Runspace = runspace;

                    // build up the pipeline
                    powershell.AddCommand("get-item")
                        .AddParameter("path", args[0])
                        .AddCommand("format-table")
                        .AddCommand( "out-string");

                    // execute the pipeline
                    var results = powershell.Invoke();

                    // output the results
                    Console.WriteLine( results.FirstOrDefault() );
                }
            }
        }
    }
}
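
For comparison, here is roughly what the host's pipeline looks like when typed into a PowerShell session.  It's a sketch rather than part of the host: $path stands in for args[0], and I picked env:/PATH only because the environment drive is available by default.

```powershell
# stand-in for args[0] in the C# host
$path = 'env:/PATH'

# the same three commands the host chains together with AddCommand
$results = get-item -path $path | format-table | out-string

# out-string yields a single string, which the host prints to the console
$results
```

Each AddCommand call in the host corresponds to one stage of this pipeline, and AddParameter attaches -path to the get-item stage.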

If you have any questions, please ask them in the comments.

Enjoy!