VeryDOC Raster to Text OCR Converter Command Line converts raster images to text. The supported image file formats are TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG. The software can also convert PDF files, including image-only (scanned) PDF files, to text from the command line. Below, I will show you how to use this software.
Step 1. Download Raster to Text OCR Converter
- When the download finishes, you will have a zip file. Extract it to a folder, then call the executable from an MS-DOS command prompt window.
- This is a Windows application; for now it cannot be used on Mac or Linux systems.
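As a minimal sketch of this setup step, the Python snippet below looks for the executable in the folder where the zip was extracted and checks whether the current system is Windows. The layout inside the zip is not described in this article, so the search is recursive; `find_converter` and `is_windows` are illustrative helper names, not part of the VeryDOC software.

```python
import sys
from pathlib import Path

# Executable name taken from the usage line in this article.
EXE_NAME = "pdf2txtocr.exe"


def is_windows(platform: str = sys.platform) -> bool:
    """Return True when the given platform string denotes Windows."""
    return platform.startswith("win")


def find_converter(extract_dir: str) -> Path:
    """Return the path of pdf2txtocr.exe found under the extracted folder.

    The search is recursive because the zip layout is an assumption here.
    """
    matches = sorted(Path(extract_dir).rglob(EXE_NAME))
    if not matches:
        raise FileNotFoundError(f"{EXE_NAME} not found under {extract_dir}")
    return matches[0]
```

A script could call `is_windows()` first and stop early on Mac or Linux, then pass the result of `find_converter(...)` to whatever launches the conversion.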
Step 2. Convert raster to text by command line.
- When you use this software, please refer to the usage and examples.
- Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
- When converting raster to text, please refer to the following command line templates.
pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
pdf2txtocr.exe C:\in.pcx C:\out.txt
pdf2txtocr.exe C:\in.tga C:\out.txt
pdf2txtocr.exe C:\in.pnm C:\out.txt
pdf2txtocr.exe C:\in.mng C:\out.txt
When converting raster image files to text, simply give the full paths of the input file and the output file; no other parameters are needed. To convert image files to text in batch, use wildcard characters, as in the following command line template.
pdf2txtocr.exe C:\*.tif C:\*.txt
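If you prefer to drive the batch from a script rather than rely on wildcards, the same templates can be generated one file at a time. The following Python sketch is only an illustration: the helper names are hypothetical, the `.txt` output is placed next to each input by assumption, and the tool's exit-code behaviour is not documented in this article, so the return code is printed rather than treated as an error.

```python
import subprocess
from pathlib import Path


def build_command(exe: str, src: Path, dst: Path) -> list[str]:
    """Argument list equivalent to: pdf2txtocr.exe <input> <output>."""
    return [exe, str(src), str(dst)]


def batch_commands(exe: str, folder: str, pattern: str = "*.tif") -> list[list[str]]:
    """Build one command per matching image, writing <name>.txt beside it."""
    return [
        build_command(exe, src, src.with_suffix(".txt"))
        for src in sorted(Path(folder).glob(pattern))
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run the commands in order (needs Windows and the converter installed)."""
    for cmd in commands:
        result = subprocess.run(cmd)
        # The meaning of the exit code is an assumption; it is only reported.
        print(cmd[1], "-> exit code", result.returncode)
```

For example, `run_all(batch_commands(r"C:\tools\pdf2txtocr.exe", "C:\\"))` would issue one `pdf2txtocr.exe C:\name.tif C:\name.txt` call per TIFF file; the `C:\tools` path is a placeholder.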
To improve the OCR recognition rate, you can convert the image to PDF first, because when converting raster to PDF you can adjust the image threshold and rotate the image by a given number of degrees.
pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.tif C:\out.pdf
Now let us check related parameters:
-rotate <int> : add this parameter when you need to rotate pages before OCR.
-threshold <int> : add this parameter when you need to adjust the lightness threshold used to convert the image to B&W.
-ocr : this parameter enables the OCR function when converting image files or scanned PDF files to text or searchable PDF files.
-ocrmode <int> : set OCR mode
-ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
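The two PDF command lines above can be assembled from these parameters programmatically. The Python sketch below only builds the argument list and does not run the tool; the function name is hypothetical, and the option order simply mirrors the examples in this article, since the article does not say whether order matters.

```python
from typing import Optional


def build_ocr_pdf_command(
    exe: str,
    src: str,
    dst: str,
    ocrmode: int,
    threshold: Optional[int] = None,
    rotate: Optional[int] = None,
) -> list[str]:
    """Build: exe -ocrmode N [-threshold T] [-rotate R] -ocr <input> <output>."""
    cmd = [exe, "-ocrmode", str(ocrmode)]
    if threshold is not None:
        # Lightness threshold used when converting the image to B&W.
        cmd += ["-threshold", str(threshold)]
    if rotate is not None:
        # Rotate pages before OCR.
        cmd += ["-rotate", str(rotate)]
    # -ocr enables the OCR function; input and output paths come last.
    cmd += ["-ocr", src, dst]
    return cmd
```

Calling `build_ocr_pdf_command("pdf2txtocr.exe", r"C:\in.tif", r"C:\out.pdf", ocrmode=4, rotate=90)` yields the same arguments as the second example command, and the list can be passed to `subprocess.run` on a Windows machine with the converter installed.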
With this function, you can extract text from raster image files. You can also convert raster images to searchable PDF files. When the output file is a PDF file, there are many parameters available for you to choose from. If you need to check more functions and parameters of this software, please visit its homepage. If you have any questions while using it, please contact us as soon as possible. Now let us check the conversion result in the following snapshot.
This is from the output text file.