© Ugur Akinci
PDF (Portable Document Format) is great since it allows you to read a technical document on any machine, regardless of the operating system.
But if you are doing any structured authoring, single-sourcing, or DITA conversion, PDF is a poor source format, since it is next to impossible to tag the text embedded in a PDF document. You’d rather have plain text as your source material.
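To illustrate why plain text is the easier starting point, here is a minimal Python sketch that wraps plain-text paragraphs in a bare-bones DITA topic. The function name `text_to_dita_topic`, the topic id, and the sample text are hypothetical, and the sketch assumes that blank lines separate paragraphs. A real conversion would need richer tagging than `<p>` elements.

```python
import xml.etree.ElementTree as ET


def text_to_dita_topic(text, topic_id, title):
    """Wrap blank-line-separated paragraphs in a minimal DITA <topic>."""
    topic = ET.Element("topic", id=topic_id)
    ET.SubElement(topic, "title").text = title
    body = ET.SubElement(topic, "body")
    # Assumption: a blank line marks a paragraph boundary.
    for block in text.split("\n\n"):
        paragraph = " ".join(block.split())  # unwrap hard line breaks
        if paragraph:
            ET.SubElement(body, "p").text = paragraph
    return ET.tostring(topic, encoding="unicode")


# Hypothetical sample text, standing in for text extracted from a PDF.
sample = "PDF is easy to read.\n\nPlain text is easy\nto tag."
print(text_to_dita_topic(sample, "pdf_vs_text", "PDF vs. plain text"))
```

Nothing comparable is possible while the text is still locked inside a PDF, which is why the extraction step comes first.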
Adobe Acrobat Professional
One way to go back from PDF to plain text is, of course, to save your PDF document with Acrobat as an MS Word file. But for that, you’d need to have Adobe Acrobat Professional. The free Adobe Reader cannot do that.
But even with the Pro edition, the results are far from perfect. You’d need to do some reformatting and editing of the end result, depending on the complexity of your PDF document. If, for example, you’ve used tables extensively in the PDF file for formatting purposes, the result will be unsatisfactory.
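Some of that cleanup can be scripted once the converted text is saved as plain text. The following is a rough Python sketch, not a complete solution: the function name `tidy_extracted_text` and the sample string are made up for illustration, and the rules assume that blank lines separate paragraphs. Note that the hyphen rule would also rejoin a genuine compound such as "single-sourcing" if it happened to break across lines, so the output still needs a human read-through.

```python
import re


def tidy_extracted_text(raw):
    """Clean up common artifacts in text converted from a PDF."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Turn single line breaks inside a paragraph into spaces,
    # but keep blank lines, which separate paragraphs.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs left over from table-style layout.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


# Hypothetical sample of messy converted text.
raw = "Struc-\ntured authoring needs\nclean   text.\n\nNext paragraph."
print(tidy_extracted_text(raw))
```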
Here are three alternatives for extracting plain text from a PDF:
A-PDF Text Extractor
“Users simply seek out the PDF in question by navigating a standard file tree. Next they just click the Extract button. This will ask users for a destination file and not much else. PDFs are converted into .txt files within a matter of seconds.”
PDF Plain Text Extractor
This application promises to preserve the original PDF layout and to support Type0, Type1, Type3, TrueType, and CIDFont fonts.
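If you prefer a scripted route, layout-preserving extraction can also be approximated from the command line. The sketch below uses `pdftotext`, a separate open-source tool from the Poppler/Xpdf family, not one of the applications described here. It assumes `pdftotext` is installed and on your PATH. The helper names and the file name `manual.pdf` are placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, keep_layout=True):
    """Return the pdftotext command line and the destination .txt path."""
    pdf = Path(pdf_path)
    destination = pdf.with_suffix(".txt")
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout.
        command.append("-layout")
    command += [str(pdf), str(destination)]
    return command, destination


def extract_text(pdf_path, keep_layout=True):
    """Run pdftotext if it is installed; return the .txt path written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    command, destination = build_pdftotext_command(pdf_path, keep_layout)
    subprocess.run(command, check=True)
    return destination


# "manual.pdf" is a placeholder file name; this only prints the command.
command, destination = build_pdftotext_command("manual.pdf")
print(" ".join(command))
```

Calling `extract_text("manual.pdf")` would run the conversion and write `manual.txt` next to the source file. How well the layout survives depends on the PDF, so tables used for formatting may still need manual repair.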
PDF Converter
“PDF Converter is a 6-in-1 PDF utility to convert PDF to Word, Excel, PowerPoint, EPUB, HTML and Text on your computer. Also quickly and efficiently convert PDF to formats compatible with iPhone, iPad, iTouch, iBooks, Sony Reader that you can facilitate eBook reading on the go.”