Automating File Conversion in Word with Powershell

§ September 12, 2016 13:26 by beefarino |

This post is for friend and fellow nerd Jeff Truman, who asked me today:

Why yes, I do happen to have an example of doing just that thing.  Here's the code I use to convert folders of docs and pdfs to raw txt files:

$word = new-object -com 'word.application';
$textformat = 2;

'pdf','doc' | foreach {
    $type = $_;
    ls ./$type | foreach { 
        $doc = $word.documents.open( $_.fullname ); 
        $doc.saveas( 
            ($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'), 
            $textformat 
        ); 
    
        $doc.close();  
        $doc = $null; 
    }
}

$word.quit();
$word = $null;

The script makes some assumptions:

  • PDF and DOC files live in separate directories named PDF and DOC, respectively, under the directory the script is run from;
  • the files are converted to TXT format and saved in a sibling folder named TXT.

In a nutshell, the script uses COM to automate the Word application.  It makes Word load each source document in turn and save it in a text format in the new location.  The new location is derived from the original document path using this slightly convoluted bit of code:

($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),

This line takes the full path of the source file ($_.fullname), replaces the source directory with the TXT directory, and also replaces the original file extension with a .txt extension.
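
To see the two replacements in isolation, here's a minimal sketch that runs them one at a time against a made-up path.  The path and the intermediate variable names ($source, $step1, $target) are hypothetical and only there for illustration; no Word required.

```powershell
# hypothetical sample path, not from the original script
$type   = 'pdf'
$source = 'C:\docs\pdf\report.final.pdf'

# step 1: swap the \pdf\ folder segment for \txt\
# (the pattern is a regex, so each literal backslash is doubled)
$step1 = $source -replace "\\$type\\", '\txt\'

# step 2: swap everything from the last dot to the end of the string for .txt
$target = $step1 -replace '\.[^\.]+$', '.txt'

$target
```

Note that only the final extension changes, so a dot in the middle of the file name survives the trip.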

Other than that, it's your basic Word automation.

Enjoy!

 



Formatting Individual PowerShell Outputs as Errors

§ July 2, 2013 14:13 by beefarino |

I came across a strange PowerShell feature while working on a StudioShell bug.  You’ve gotta see this one for yourself…

Open a shell and run the following one-liner:

get-childitem ~ | foreach-object {
    if( $_ -match '^t' ) { 
        $_ | add-member -mem noteproperty -name writeerrorstream -value $true 
    } 
    $_
}

Take a look at the output.  Assuming you have a file or folder in your home directory that starts with the letter ‘t,’ you should see them in red (or whatever color you’ve configured your error output to be) co-mingling with all those non-t-starting file system thingies! 

WriteErrorStream seems to be a magic property PowerShell uses to format an object as if it were an error.  Works in the console host and the ISE.  Seriously, try it.

Note that I say it formats the object as if it were an error.  That is, while the property affects the coloring of that individual output record, the output is unaffected by your $ErrorActionPreference setting and doesn’t manipulate the $error magic variable.  It just colors those items.
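
If you want to check the $error part yourself, here's a minimal sketch.  The object and the variable names ($before, $item, $after) are made up for illustration, and the red coloring depends on your host and PowerShell version behaving as described above.

```powershell
# remember how many errors the session has recorded so far
$before = $error.Count

# hypothetical object, tagged the same way as in the snippet above
$item = [pscustomobject]@{ Name = 'tagged' }
$item | Add-Member -MemberType NoteProperty -Name WriteErrorStream -Value $true

# emit the tagged object; a host that honors the property colors it like an error
$item

# tagging and emitting the object records nothing in $error
$after = $error.Count
"errors before: $before, after: $after"
```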

While I’m sure this isn’t behavior that one should rely upon, it does seem like a quick & dirty way to mark outputs in the shell.  Here’s a function I just added to my profile:

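function format-highlight
{
    [cmdletbinding()]
    param( 
        # predicate evaluated against each pipeline item via $_
        [parameter(Mandatory=$true, position=0)]
        [scriptblock] 
        $filter,
        
        # named $inputObject because $input is a reserved automatic variable
        [parameter(ValueFromPipeline=$true)]
        $inputObject 
    )
    
    process
    {
        $inputObject | foreach { 
            # tag matching items so the host renders them like errors
            if( &$filter ) { 
                $_ | add-member -mem noteproperty -name writeerrorstream -value $true 
            } 
            # always pass the item along, tagged or not
            $_  
        }
    }
}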

That makes using this feature a snap:

ls | format-highlight { $_.length -gt 1mb }


ScriptCS or PowerShell? part 1

§ May 15, 2013 13:02 by beefarino |

TL;DR: Use both: https://github.com/beefarino/ScriptCS-PowerShell-Module

A few days ago I posted a teaser of a project I hammered together that allows you to run ScriptCS inside of your PowerShell session.  This morning I pushed the cleaned-up version to GitHub: https://github.com/beefarino/ScriptCS-PowerShell-Module.

I’ve been watching ScriptCS with much interest – I’m a huge fan of scripting and REPL interaction and having another environment to leverage makes me happy.  What makes me sad is when I read crap like this on twitter:

ScriptCS is awesome!  <expletive deleted> PowerShell!

Or this:

PowerShell is amazing and ScriptCS is teh suck!

Why anyone would shun one or the other is beyond me.  They each have a wealth of possibility to offer.  My first instinct with new technology is to figure out how to make it work with other things I already know.  This helps me understand the guts of the new thing and gives me a frame of reference to move forward.  So that’s what I’m doing, and the potential is tantalizing.

Phase One

Phase one was pushed to GitHub this morning.  You can now run ScriptCS code inside of a PowerShell session:

PS > ipmo scriptcs
PS > invoke-scriptcs '"Hello PowerShell!"'
Hello PowerShell!

You can even pipe data to ScriptCS and consume it:

PS > 0..9 | invoke-scriptcs '(int)pscmdlet.Input[0] + 100'
100
101
102
103
104
105
106
107
108
109

And put data from ScriptCS onto the pipe:

PS > 0..9 | `
    invoke-scriptcs '(int)pscmdlet.Input[0] + 100' | `
   %{ "Output: $_" }
Output: 100
Output: 101
Output: 102
Output: 103
Output: 104
Output: 105
Output: 106
Output: 107
Output: 108
Output: 109

There’s still lots to do, but the possibilities are pretty amazing.  Ever wished you could use LINQ from PowerShell?  Now you can:

PS > invoke-scriptcs -input (0..9) -script 'from i in pscmdlet.Input where (int)i > 5 select i'
6
7
8
9
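
For comparison, here's a rough sketch of the same filter done by calling LINQ's static Enumerable.Where method directly from PowerShell, without the module.  This is not how invoke-scriptcs works under the hood; it's just an illustration of how much ceremony the query syntax saves you.  The variable names are made up, and it assumes PowerShell 3 or later.

```powershell
# LINQ's extension methods have to be called as plain static methods in PowerShell
$numbers = [int[]](0..9)

# the predicate has to be cast to the delegate type that Where expects
$isOverFive = [Func[int,bool]]{ param($i) $i -gt 5 }

# @() forces the lazy LINQ sequence to be enumerated into an array
$overFive = @( [System.Linq.Enumerable]::Where($numbers, $isOverFive) )

$overFive
```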

Phase Two

Phase two is pretty much the opposite of what I’ve done so far: a Script Pack for ScriptCS that allows you to run arbitrary PowerShell script in ScriptCS.  This is working now; the plan is to clean it up over the next week and make it public.  Stay tuned…



CodeOwls Skunkworks at the AZ PoSh User Group

§ April 30, 2013 09:17 by beefarino |

Tomorrow evening I’ll be giving an online presentation to the Arizona PowerShell User Group.  I’ve heard great things about this group and am really looking forward to the talk – if you want to join the fun check out the details here: http://www.azposh.com/2013/04/maymeeting/.

The topic is “Strange Things with PowerShell”.  Well, that’s the title – the topic is really a set of skunkworks projects I’ve been hammering out since the beginning of the year.  I’ll introduce them briefly in this post, but if you want to learn more you’ll have to tune in tomorrow evening or wait for the release.

The first project I’ll be covering is the Entity Shell (ES).  ES is a PowerShell module that “knows” all about Microsoft’s ORM named Entity Framework.  Basically, once you have an Entity Framework data context defined, you can use PowerShell to manage your entities without doing any extra work.  Here’s a working code snippet to whet your appetite:

# pull in the entity shell
import-module entityshell;

# pull in my entity data context
new-psdrive -name ent -root '' -psp entityprovider `
    -context [SuperAwesomeWebsite.Models.Context]

# create a set of 100 new user accounts for testing
0..99 | new-item -path ent:/users -username { "User$_" } -password { "Password$_" }

# commit the new entities to persistent storage
complete-unitOfWork

# server-side filter for users without a password
$u = dir ent:/users -filter "it.Password IS NULL" 

# generate a report of these users
$u | convertto-csv | out-file "badusers.csv"

# remove the offending users
$u | remove-item

# commit these removals to persistent storage
complete-unitOfWork

The other project I’ll be demoing is Polaris; this project came about as a way to see how simple I could make the process of extending Windows Explorer.  Shell namespace extensions are hard; Polaris makes it dirt simple, and all you need is a little PowerShell to turn Windows Explorer into a rich dashboard of the stuff you want to see.

More on these projects in the coming weeks; in the meantime, if you want to see the goods, you’ll have to tune in (http://www.azposh.com/2013/04/maymeeting/).