If you need to convert an image to a searchable PDF and add basic document information (title, author, subject and so on), this article will help. The software I will use is VeryDOC Raster to Text OCR Converter Command Line, which can recognize the text in many types of image files. For more information, please see the software homepage. In the following part, I will show you how to use this software.
Step 1. Download Raster to Text OCR Converter Command Line
- This is a command line application, so to make uploading and downloading easier, we have compressed it into a zip file.
- Once the download finishes, extract it to a folder. You can then check the files it contains and call the executable file from the Command Prompt (MS-DOS window) on Windows.
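Extracting the zip by hand in Windows Explorer is enough. If you prefer to script the setup, here is a minimal sketch using Python's standard `zipfile` module. It is not part of the VeryDOC package. The helper name `extract_tool` and the archive/folder paths are hypothetical. The only name taken from this article is the executable `pdf2txtocr.exe`, which the sketch searches for because it may sit in a subfolder of the archive.

```python
import zipfile
from pathlib import Path


def extract_tool(zip_path, dest_dir):
    """Extract the downloaded zip (hypothetical path) and locate pdf2txtocr.exe.

    Returns the path to the executable, or None if it is not in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # The executable may be inside a subfolder, so search recursively.
    matches = sorted(dest.rglob("pdf2txtocr.exe"))
    return matches[0] if matches else None
```

For example, `extract_tool(r"C:\Downloads\ocr_cmd.zip", r"C:\tools\ocr")` would unpack the archive and return the executable's full path, which you can then call from the Command Prompt.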
Step 2. Convert a raster image to a searchable PDF and add basic information
- When you use this software, please refer to the usage and examples in readme.txt.
- Here is the general usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
- When converting a raster image to a searchable PDF, pass the image file as the input and a PDF file as the output, as in the following command line templates.
pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -producer VeryDOC C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 3 -creator LA C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -subject "This is about conversion" C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 3 -title VeryDOC C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -author ME C:\in.tif C:\out.pdf
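If you want to run these templates from a script (for example, as part of a larger batch job), the sketch below is one way to do it with Python's `subprocess` module. It is a hypothetical wrapper, not something shipped with the software. It assumes `pdf2txtocr.exe` is on your PATH or given by its full path. It also assumes, without confirmation from the article, that the tool signals failure through a non-zero exit code. The helper names `build_command` and `run_conversion` are illustrative. Passing the arguments as a list lets `subprocess` handle quoting, so a value with spaces, such as the `-subject` text, stays one argument.

```python
import subprocess


def build_command(exe, ocrmode, input_path, output_path, extra_options=()):
    """Assemble one pdf2txtocr.exe call as a list of arguments.

    extra_options holds the option tokens from a template, e.g.
    ["-subject", "This is about conversion"].
    """
    return [str(exe), "-ocrmode", str(ocrmode), *map(str, extra_options),
            str(input_path), str(output_path)]


def run_conversion(args):
    """Run the converter and raise CalledProcessError on a non-zero exit code.

    Assumption: the tool reports failures through its exit code.
    """
    return subprocess.run(args, check=True, capture_output=True, text=True)
```

For instance, `run_conversion(build_command("pdf2txtocr.exe", 4, r"C:\in.tif", r"C:\out.pdf", ["-subject", "This is about conversion"]))` runs the same command as the fourth template.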
With the command line templates above, we can convert an image file to a searchable PDF and add basic information such as the title, keywords, subject, author and others. Here are the parameters for your reference.
-producer <string> : Set the 'producer' field of the PDF file
-creator <string> : Set the 'creator' field of the PDF file
-subject <string> : Set the 'subject' field of the PDF file
-title <string> : Set the 'title' field of the PDF file
-author <string> : Set the 'author' field of the PDF file
-keywords <string> : Set the 'keywords' field of the PDF file
-ocrmode <int> : Set the OCR mode
-ocrmode 0: output to text file
-ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
-ocrmode 2: output to plain text based PDF file
-ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
-ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
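The parameter reference above can be encoded so that typos in option names or mode numbers are caught before the tool runs. Below is a minimal Python sketch. The function name `conversion_args` is hypothetical, and so is the convenience of deriving the output name: mode 0 gets a `.txt` file and modes 1 to 4 get a `.pdf`, following the mode descriptions above. That derivation is not a feature of the tool. The templates only ever set one metadata option per command. Whether several options (for example `-title` and `-author`) can be combined in one call is an assumption to verify against readme.txt.

```python
from pathlib import PureWindowsPath

# Option names and -ocrmode values as listed in the parameter reference.
METADATA_OPTIONS = ("producer", "creator", "subject", "title", "author", "keywords")
OCR_MODES = {
    0: "output to text file",
    1: "OCR PDF pages and insert new text layer under original PDF pages",
    2: "output to plain text based PDF file",
    3: "output to OCRed PDF file (BW) with hidden text layer",
    4: "output to OCRed PDF file (Color) with hidden text layer",
}


def conversion_args(ocrmode, input_path, output_path=None, **metadata):
    """Build the argument list (without the executable) for one conversion.

    If output_path is omitted, it is derived from input_path: .txt for
    mode 0, .pdf for modes 1-4 (a hypothetical convenience).
    """
    if ocrmode not in OCR_MODES:
        raise ValueError(f"unknown -ocrmode value: {ocrmode}")
    args = ["-ocrmode", str(ocrmode)]
    for name, value in metadata.items():
        if name not in METADATA_OPTIONS:
            raise ValueError(f"unsupported metadata option: -{name}")
        args += [f"-{name}", str(value)]
    if output_path is None:
        source = PureWindowsPath(input_path)
        derived = source.with_suffix(".txt" if ocrmode == 0 else ".pdf")
        # Mode 1 takes a PDF as input, so the derived name could clash with it.
        if derived == source:
            raise ValueError("derived output would overwrite the input; pass output_path")
        output_path = derived
    return args + [str(input_path), str(output_path)]
```

For example, `conversion_args(4, r"C:\scans\page1.tif", title="VeryDOC")` yields `-ocrmode 4 -title VeryDOC C:\scans\page1.tif C:\scans\page1.pdf` as separate arguments, ready to be prefixed with the path to `pdf2txtocr.exe`.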
The input image can be in any of the following raster formats: scanned JPEG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM, TIFF and so on. This software can also deskew and rotate raster images before converting them to PDF, and it can set a password to protect the output PDF.
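When you have a whole folder of scans, a small script can pick out the files in the formats listed above before calling the converter on each one. The sketch below is a hypothetical helper (`find_images`), not part of the software. Its extension list is an assumption mapped from the named formats. Since the list ends with "and so on", the tool may accept other formats too, and you can add their extensions to the set. Matching is case-insensitive because scanners often write names like `PAGE1.TIF`.

```python
from pathlib import Path

# Extensions assumed for the formats named above (JPEG, PNG, BMP, GIF, PCX,
# TGA, PBM, PNM, PPM, TIFF); extend the set if your files use others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".pcx",
                    ".tga", ".pbm", ".pnm", ".ppm", ".tif", ".tiff"}


def find_images(folder):
    """Return the image files directly inside folder, sorted by path."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
```

Each returned path can then be passed as the input file of one `pdf2txtocr.exe` call, with a matching `.pdf` name as the output.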
The software has more functions than can be listed here; please check the website for details. If you have any questions while using it, please contact us.