September 12, 2016 13:26 by beefarino
This post is for friend and fellow nerd Jeff Truman, who asked me today:
Why yes, I do happen to have an example of doing just that thing. Here's the code I use to convert folders of docs and pdfs to raw txt files:
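# Start Word through COM automation.
$word = new-object -com 'word.application';
# 2 is Word's wdFormatText value, i.e. plain text.
$textformat = 2;
'pdf','doc' | foreach {
    $type = $_;
    ls ./$type | foreach {
        $doc = $word.documents.open( $_.fullname );
        # Save under the TXT folder with a .txt extension.
        $doc.saveas(
            ($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),
            $textformat
        );
        $doc.close();
        $doc = $null;
    }
}
# Shut Word down once every file has been converted.
$word.quit();
$word = $null;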
The script makes some assumptions:
- PDFs and DOC files are in separate directories named PDF and DOC, respectively;
- the files are converted to TXT format and saved in a folder named TXT.
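Those assumptions can be checked before Word is started. The following is only a minimal sketch: Initialize-ConversionLayout and its Root parameter are hypothetical names that do not appear in the original script, and it assumes the three folders sit side by side and that Word will not create the TXT folder on its own.

```powershell
# Hypothetical pre-flight helper; not part of the original script.
function Initialize-ConversionLayout {
    param(
        [string] $Root = '.'   # assumed parent folder holding pdf, doc and txt
    )
    # Warn about any source folder the conversion loop expects but cannot find.
    foreach ($name in 'pdf', 'doc') {
        $source = Join-Path $Root $name
        if (-not (Test-Path -Path $source -PathType Container)) {
            Write-Warning "Missing source folder: $source"
        }
    }
    # Create the output folder; with -Force this does nothing harmful if it already exists.
    New-Item -ItemType Directory -Path (Join-Path $Root 'txt') -Force | Out-Null
}
```

Calling Initialize-ConversionLayout from the folder that contains PDF and DOC, before running the conversion, would leave a TXT folder ready for the saved files.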
In a nutshell, the script uses COM to automate the Word application. It makes Word load each source document in turn and save it in a text format in the new location. The location is derived from the original document path using this slightly convoluted bit of code:
($_.fullname -replace "\\$type\\", '\txt\' -replace '\.[^\.]+$', '.txt'),
This line takes the full path of the source file ($_.fullname), replaces the source directory with the TXT directory, and also replaces the original file extension with a .txt extension.
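To see the two replacements in isolation, here is a small sketch that runs the same expression on plain strings, with no Word involved. Get-TextPath and the sample paths are hypothetical, made up purely for illustration.

```powershell
# Hypothetical helper that isolates the path derivation from the script.
function Get-TextPath {
    param(
        [string] $FullName,  # full path of the source document
        [string] $Type       # source folder name, 'pdf' or 'doc'
    )
    # -replace takes a regular expression, so each literal backslash in the
    # pattern is written as \\ ; the chained operators apply left to right.
    $FullName -replace "\\$Type\\", '\txt\' -replace '\.[^\.]+$', '.txt'
}

Get-TextPath -FullName 'C:\work\pdf\report.pdf' -Type 'pdf'   # C:\work\txt\report.txt
Get-TextPath -FullName 'C:\work\DOC\notes.docx' -Type 'doc'   # C:\work\txt\notes.txt
```

Because -replace ignores case by default, a folder named DOC matches the lowercase 'doc' in the pattern, and only the final extension is swapped, whatever its length.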
Other than that, it's your basic Word automation.
Enjoy!