Converting PDF to Word [Archive] - Glock Talk


eisman
10-13-2004, 18:03
As always I'm turning to the guys who know best.

I had a program that would convert PDFs to Word files. It's gone. I need another. Anyone have a viable download?

NetNinja
10-13-2004, 18:57
Google me this, Google me that.

http://www.scansoft.com/pdfconverter/

NRA_guy
10-13-2004, 21:15
I have ScanSoft. It will not convert a pdf file to Word if the pdf file was saved as an image.

And if someone simply scanned a document, it's probably saved as an image. You never know.

The person who scans it into Adobe can run "paper capture" in Adobe, and save the captured document. Then the text is saved as a text layer and ScanSoft will convert it to a Word document. It does pretty good. But the paper capture sometimes confuses "1" with "l" or "0" with "O" and shifts fonts on you.
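The "1"/"l" and "0"/"O" mix-ups described above can sometimes be cleaned up after the fact. What follows is only a rough sketch in Python, not anything ScanSoft or Adobe does: the function names are made up for illustration, and the rule (trust the characters in a token that cannot be confused, then fix the ones that can) is an assumption that will guess wrong on some tokens, such as part numbers or chemical formulas.

```python
import re

# Look-alike characters, mapped in each direction.
TO_DIGIT = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})
TO_LETTER = str.maketrans({"1": "l", "0": "O"})  # case of "O" is a guess

AMBIGUOUS_DIGITS = set("10")
AMBIGUOUS_LETTERS = set("lIOo")


def fix_token(token: str) -> str:
    """Repair one alphanumeric token using its unambiguous characters."""
    sure_digits = sum(c.isdigit() and c not in AMBIGUOUS_DIGITS for c in token)
    sure_letters = sum(c.isalpha() and c not in AMBIGUOUS_LETTERS for c in token)
    if sure_digits and not sure_letters:
        return token.translate(TO_DIGIT)   # token looks numeric
    if sure_letters and not sure_digits:
        return token.translate(TO_LETTER)  # token looks like a word
    return token  # mixed or fully ambiguous: leave it alone


def fix_ocr_text(text: str) -> str:
    """Apply fix_token to every run of letters and digits in the text."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: fix_token(m.group()), text)
```

For example, fix_ocr_text("he1lo") gives "hello", while a genuinely mixed token like "PDF2Word" is left untouched. A proofread is still needed afterwards.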

But to me ScanSoft is no better than simply blocking text and copying it to the clipboard in Adobe and pasting it into Word. ScanSoft may retain some formatting. I can't remember.

You cannot easily tell if the pdf file was stored as an image or as a text layer.

Adobe says to open the document in Adobe and go File--->Document Properties--->Font

If you see fonts listed, it is captured as text and can be blocked and copied. This also means that it can be converted using ScanSoft.
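The same font check can be roughed out in a script. This is a crude sketch in Python, not Adobe's method: it only looks for the /Font key in the raw bytes of the file, and the function names are invented for the example. It assumes the PDF's dictionaries are stored uncompressed; newer files that pack their objects into compressed streams can hide the key, so a "no" from this check is not conclusive.

```python
def declares_fonts(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF data mentions a font anywhere.

    Matches /Font as well as related keys such as /FontDescriptor,
    all of which suggest the file carries real text rather than
    only page images.
    """
    return b"/Font" in pdf_bytes


def file_declares_fonts(path: str) -> bool:
    """Read a PDF from disk and run the same check on its bytes."""
    with open(path, "rb") as handle:
        return declares_fonts(handle.read())
```

Calling file_declares_fonts on a scanned, image-only file should normally come back False, which matches an empty font list in Document Properties.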

I run the full retail boxed version of Adobe, not the free reader. Not sure how the free reader works.

PS: I hate Adobe. They give you Word--->Adobe conversion, but not Adobe--->Word conversion. I have heard that Word 2003 gives you Adobe--->Word conversion, and their Word 2003 implies this, but some who have it tell me that's not true.

NRA_guy

Warrior2k3
10-13-2004, 21:19
I just use the text selection tool on the PDF file, Select all text, Control C to copy then Control V on the blank word doc to paste. Some formatting will be lost but all the text will be there.

Anon1
10-14-2004, 07:12
Originally posted by NRA_guy
...PS: I hate Adobe. They give you Word--->Adobe conversion, but not Adobe--->Word conversion. ...NRA_guy


Your hatred is misdirected!! Hate Microsoft instead.

The reason that Adobe cannot readily import a Word file is because *Microsoft* (MS) keeps their file formats a closed secret. Whereas other companies open up the internal workings of their file formats so that others can create compatible programs, Microsoft does not.

There are other open-source applications like OpenOffice.org which tries to be compatible with MS file formats but that part of the program code has had to be kind of hacked together because of the problems with reverse-engineering the closed MS formats.

NRA_guy
10-14-2004, 16:49
Originally posted by Anon1
Your hatred is misdirected!! Hate Microsoft instead.

The reason that Adobe cannot readily import a Word file is because *Microsoft* (MS) keeps their file formats a closed secret. Whereas other companies open up the internal workings of their file formats so that others can create compatible programs, Microsoft does not.

There are other open-source applications like OpenOffice.org which tries to be compatible with MS file formats but that part of the program code has had to be kind of hacked together because of the problems with reverse-engineering the closed MS formats.

Oh, I hate Microsoft equally. Maybe more.

I'm no computer expert but as I understand, he was asking about converting a pdf (Adobe) file to a doc (Word) file---not about importing a Word file into Adobe.

Actually, Adobe installs icons on the Word toolbar that enable one to readily create a pdf file from a Word file. Eliminating the icons is most folks' concern because we hardly ever do that.

But we frequently need to go the opposite way.

I needed to convert pdf files to doc files often enough that I bought the $600 ScanSoft OmniPage 14 Office that claims "Turn PDF files into editable documents while retaining their layout."

Well, yes and no.

On some pdf files you get “ScanSoft PDF Converter cannot process this file because the first page does not contain a text layer”

The web link for more information gives the following explanation:

Problem:

An error will occur when converting a PDF file that does not contain a text layer on the first page. The error message states “ScanSoft PDF Converter cannot process this file because the first page does not contain a text layer.”

Cause:

When you open a PDF, whose first page has no text layer, it is assumed the whole document is an image-only PDF file.

Then it balks.

NRA_guy

Toyman
10-15-2004, 07:06
Never used it, but here it is:

http://www.verypdf.com/pdf2word/index.html

PDF2Word (PDF to Word) software exports the text, images, and other contents from a PDF document into a Word document so you can reuse your PDF content. It preserves text, layout, and bitmap images in the generated Word document.

NRA_guy
10-16-2004, 10:54
Originally posted by Toyman
Never used it, but here it is:

http://www.verypdf.com/pdf2word/index.html

Yeah . . . but. As I read it, it does pretty much the same thing that ScanSoft does, granted at a cheaper price. (ScanSoft balks if the first page contains no text.)

Problem, as I said before, is that many pdf files are images---not text, even though they look like text and print like text---they are only graphic images of the page.

When any of these pdf to doc conversion programs try to convert pdf to doc, they don't perform an OCR interpretation of text in a graphic image.

Some pdf files contain the text as a layer, and that is the only kind of pdf file that these conversion programs can convert.

It all depends upon how the pdf file was generated. You cannot tell simply by opening the file in Adobe.

By the way, as I understand, pdf (portable document format) is not an Adobe exclusive and there are a number of different pdf formats. Not all pdf files are the same.

I sometimes resort to printing out pdf files and scanning them from the hardcopy with OCR (optical character recognition) software in order to avoid re-typing the text. A pain, but not as much as re-typing.

Of course, OCR tries to interpret each letter. Sometimes it gets them wrong.

I would like to see somebody write a conversion program that can do OCR conversion from text graphic images without going through the "print hardcopy-->scan-->read with OCR" process.
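For what it's worth, the "no hardcopy" route can be sketched in a few lines of Python: turn each pdf page into an image in memory, then hand the image to an OCR engine. This is only an outline under assumptions. It assumes the third-party pdf2image package (which needs the Poppler tools) and pytesseract package (which needs the Tesseract engine) are installed; pdf_to_text and the two helper names are made up for the example.

```python
from typing import Any, Callable, Iterable


def _render_with_pdf2image(path: str) -> Iterable[Any]:
    # Assumption: pdf2image and Poppler are installed.
    from pdf2image import convert_from_path
    return convert_from_path(path, dpi=300)  # one image per page


def _read_with_tesseract(image: Any) -> str:
    # Assumption: pytesseract and the Tesseract engine are installed.
    import pytesseract
    return pytesseract.image_to_string(image)


def pdf_to_text(
    path: str,
    render: Callable[[str], Iterable[Any]] = _render_with_pdf2image,
    read: Callable[[Any], str] = _read_with_tesseract,
) -> str:
    """Render every page to an image, OCR it, and join the page texts."""
    pages = [read(image) for image in render(path)]
    return "\n\n".join(pages)
```

The render and read steps are passed in as parameters so a different renderer or OCR engine can be swapped in. The output is plain text, so formatting is lost and the usual OCR misreads still apply.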

Again, I'm no expert by any means.

NRA_guy