© Ugur Akinci
PDF (Portable Document Format) is great since it allows you to read a technical document on any machine, regardless of the operating system.
But if you are doing any structured authoring, single-sourcing, or DITA conversion, PDF is a poor source format, since it is next to impossible to tag the text embedded in a PDF document. You’d rather have plain text as your source material.
Adobe Acrobat Professional
One way to go back from PDF to plain text is, of course, to save your PDF document from Acrobat as an MS Word file. But for that, you need Adobe Acrobat Professional. Plain Adobe Reader cannot do it.
But even with the Pro edition, the results are far from perfect. You’ll need to do some reformatting and editing of the end result, depending on the complexity of your PDF document. If, for example, you’ve used tables extensively in the PDF file for formatting purposes, the result will be unsatisfactory.
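Some of that reformatting can be scripted once the text is out of the PDF. What follows is only a minimal sketch in Python, assuming the exported text has hard line breaks inside paragraphs, words hyphenated across line ends, and blank lines between paragraphs. The function name `tidy_extracted_text` is hypothetical and not part of any of the tools mentioned here.

```python
import re


def tidy_extracted_text(raw: str) -> str:
    """Clean up plain text exported from a PDF (hypothetical helper).

    Assumes paragraphs are separated by one or more blank lines.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split across a line break: "docu-\nment" -> "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Split into paragraphs wherever there is at least one blank line.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for paragraph in paragraphs:
        # Join wrapped lines and collapse runs of whitespace to one space.
        joined = " ".join(paragraph.split())
        if joined:
            cleaned.append(joined)

    # Emit one paragraph per block, separated by a single blank line.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "Struc-\ntured  authoring\nis useful.\n\n\nSecond   para.\n"
    print(tidy_extracted_text(sample))
```

Note that the hyphen rule also merges genuine compounds that happen to break at a line end (for example "single-" followed by "sourcing"), so the output still needs a human read-through. Tables are not handled at all.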
Here are three other alternatives for extracting plain text from PDF:
A-PDF Text Extractor
“Users simply seek out the PDF in question by navigating a standard file tree. Next they just click the Extract button. This will ask users for a destination file and not much else. PDFs are converted into .txt files within a matter of seconds.”
PDF Plain Text Extractor
This application promises to preserve the original PDF layout while supporting Type0, Type1, Type3, TrueType, and CIDFont fonts.
“PDF Converter is a 6-in-1 PDF utility to convert PDF to Word, Excel, PowerPoint, EPUB, HTML and Text on your computer. Also quickly and efficiently convert PDF to formats compatible with iPhone, iPad, iTouch, iBooks, Sony Reader that you can facilitate eBook reading on the go.”