A page scanned in older versions of Acrobat, or one created from a photo or drawing, is only an image of a page, and you can’t manipulate its content by extracting images or modifying the text. However, Acrobat can convert the image of the document into actual text or add a text layer to the document by using optical character recognition (OCR).
CAUTION
When the OCR process is complete, review the captured document to make sure that Acrobat interpreted the content correctly. OCR can easily misread look-alike characters, such as the letter I and the number 1.
Basic Conversion
To capture the content of an image document, follow these steps:
- Choose Document > OCR Text Recognition > Recognize Text Using OCR. The Recognize Text dialog box opens. Specify whether you want to capture the current page, an entire document, or specified pages in a multipage document.
- Click the Edit button to open the Recognize Text – Settings dialog box. Choose one of three options in the PDF Output Style pop-up menu:
  - Searchable Image compresses the foreground and places the searchable text behind the image. Note that compressing affects the image quality.
  - Searchable Image (Exact) keeps the foreground of the page intact and places the searchable text behind the image.
  - ClearScan rebuilds the page, converting the content into text, fonts, and graphics.
- If you selected either Searchable Image or ClearScan, choose one of four options from the Downsample Images pop-up menu, ranging from 600 dpi down to 72 dpi. (Downsampling reduces file size but can also leave images too coarse to use.) Click OK to return to the Recognize Text dialog box.
- Click OK to start the capture process. Be patient. Depending on the size and complexity of the document, the process can take a minute or two. When the process is complete, the dialog box closes and the results of the conversion are shown in the document.
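To get a feel for what the Downsample Images setting trades away, the arithmetic can be sketched in Python. The page size (US Letter, 8.5 x 11 inches) and the two intermediate resolutions (300 and 150 dpi) are assumptions for illustration; the steps above state only that the menu ranges from 600 dpi down to 72 dpi.

```python
# Rough pixel arithmetic for a scanned page at different resolutions.
# Assumptions: US Letter page; 300 and 150 dpi are illustrative midpoints.

PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0
RESOLUTIONS_DPI = [600, 300, 150, 72]


def pixel_dimensions(dpi, width_in=PAGE_WIDTH_IN, height_in=PAGE_HEIGHT_IN):
    """Return (width, height) in pixels for a page sampled at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_count(dpi):
    """Return the total number of pixels on the page at `dpi`."""
    width, height = pixel_dimensions(dpi)
    return width * height


if __name__ == "__main__":
    baseline = pixel_count(RESOLUTIONS_DPI[0])
    for dpi in RESOLUTIONS_DPI:
        width, height = pixel_dimensions(dpi)
        share = pixel_count(dpi) / baseline
        print(f"{dpi:>3} dpi: {width} x {height} px "
              f"({share:.1%} of the 600 dpi pixel count)")
```

Halving the resolution quarters the pixel count. At 72 dpi, a Letter page holds roughly 1.4 percent of the pixels in a 600 dpi scan, which is why small type and fine line art can become unusable.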
The point of OCR is to produce searchable text in your document. OCR isn’t foolproof, so expect some errors, including ones that Acrobat itself doesn’t recognize as errors. (See the next section for details on handling suspect content.)
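Because some misreads go unflagged, one way to spot-check a converted document is to export its text and scan for tokens that mix letters with look-alike digits. The Python sketch below illustrates the idea; the function name, the look-alike table, and the sample sentence are hypothetical, and the heuristic is deliberately crude. It will flag legitimate items such as part numbers, and it cannot catch a lone 1 that should have been an I.

```python
import re

# Digits that OCR commonly confuses with letters (illustrative, not exhaustive).
LOOKALIKES = {"1": "I or l", "0": "O", "5": "S"}

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def find_mixed_tokens(text):
    """Return tokens that contain both a letter and a look-alike digit."""
    flagged = []
    for token in TOKEN_RE.findall(text):
        has_letter = any(ch.isalpha() for ch in token)
        has_lookalike = any(ch in LOOKALIKES for ch in token)
        if has_letter and has_lookalike:
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    # Hypothetical OCR output containing two misreads.
    sample = "The 1nvoice total for 2009 is l5 dollars."
    for token in find_mixed_tokens(sample):
        print(token)
```

Pure numbers such as 2009 are left alone, so the list stays short enough to review by hand.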
Rounding Up the Suspects
Converting a bitmap of letters and numbers into actual letters and numbers may result in items that can’t be identified definitively, known as suspects. Here’s how to fix the problem:
- Select Document > OCR Text Recognition > Find First OCR Suspect to open a dialog box in which Acrobat identifies suspect characters for you to confirm.
- Work through the suspects using several options:
  - Select the text in the Suspect field and type the correct letters.
  - Click Not Text when the suspect isn’t a word at all.
  - Click Find Next to go to the next suspect.
  - Click Accept and Find to confirm the interpretation and go to the next suspect.
- Click Close to end the process.
Depending on the characteristics of the document’s text, you may have to modify some conversion results, such as the font or character spacing, by using the TouchUp Text tool.
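The review of suspects can be pictured as a simple loop that applies one decision to each item. The Python sketch below is only a conceptual model, not Acrobat’s implementation or API: the `Decision` class, the action names, and the treatment of Not Text (dropping the item from the text layer) are assumptions made for illustration, and typing a correction is folded into a single step.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Decision:
    action: str                 # "accept", "correct", "not_text", or "skip"
    text: Optional[str] = None  # replacement text, used only by "correct"


def review(guesses: List[str],
           decisions: List[Decision]) -> Tuple[List[str], List[str]]:
    """Apply one decision per suspect.

    Returns (confirmed text, guesses that are still unresolved).
    """
    confirmed, unresolved = [], []
    for guess, decision in zip(guesses, decisions):
        if decision.action == "accept":      # models Accept and Find
            confirmed.append(guess)
        elif decision.action == "correct":   # models typing in the Suspect field
            if decision.text is None:
                raise ValueError("a correction needs replacement text")
            confirmed.append(decision.text)
        elif decision.action == "not_text":  # models Not Text (assumed: no text kept)
            continue
        elif decision.action == "skip":      # models Find Next without confirming
            unresolved.append(guess)
        else:
            raise ValueError(f"unknown action: {decision.action}")
    return confirmed, unresolved


if __name__ == "__main__":
    # Hypothetical suspects and reviewer decisions.
    guesses = ["1nvoice", "total", "~~~", "rn"]
    decisions = [
        Decision("correct", "Invoice"),
        Decision("accept"),
        Decision("not_text"),
        Decision("skip"),
    ]
    print(review(guesses, decisions))
```

In this model, anything skipped with Find Next stays unresolved, which is a reminder that moving past a suspect is not the same as confirming it.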
Do You Have to Convert a Page?
The answer is: It depends. Why are you scanning the page into Acrobat in the first place? Do you need a visual image of a document to put into storage, or to use as part of your customer service information package? For either of these purposes, you probably don’t have to convert the content. Here are some reasons you’d need to convert content from an image PDF to text and images:
- You need to be able to search the text, as within a document collection.
- You want to make the content available to people who use a screen reader or other assistive device.
- You want to repurpose the content for different output, such as a web page or a text document.
- You want to reuse or change the content by moving paragraphs, making corrections, or extracting tables.
TIP
If you’re scanning a document in Acrobat 9, creating searchable text is a default part of the scanning process.