In this article, I will show you how to extract text from raster image files and save it to a text file. The extraction can be done in batch using OCR technology. When saving the text, you can also add page numbers to the output file. You do not need to open the input image files, because the conversion runs from the MS-DOS command prompt in Windows.
This can be done with VeryDOC Raster to Text OCR Converter Command Line, a professional tool for converting raster image files to text. The following sections show how to use it.
Step 1. Download Raster to Text OCR Converter Command Line
- The current version of this software is v2.0. If you only need straightforward conversion, please download the server version, which lets you use the software across the whole server.
- When the download finishes, you will have a zip file. Extract it to a folder, and then you can call the software from the MS-DOS command prompt.
Step 2. Extract text from a raster image file and save it as a text document
- When you use this software, please refer to the usage and examples.
- Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
- Although the usage string names the input <PDF-file>, the command line templates below pass a raster image file in that position.
pdf2txtocr.exe C:\in.tif C:\out.txt
This command line extracts text from a TIFF raster image file. Even if the TIFF contains many pages, the extraction is fast and accurate.
pdf2txtocr.exe C:\in.jpg C:\out.txt
Like the command line above, this one extracts text from a JPG raster image file.
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
To extract text from another raster image format, simply change the input file in the command line.
The raster image can be any kind of scanned file. If you can scan the image in black and white, the extraction results will be much better.
During extraction, we often meet raster image files that are skewed or dirty. These factors affect the conversion quality. To fix such images, you can preprocess them with this software. The following parameters are for your reference.
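Since the same template applies to every format, batch conversion of a folder could be scripted. The following is a minimal Python sketch, not part of the software itself: the function names, the folder paths, and the choice to name each output file after its input are all assumptions made for illustration. It relies only on the `pdf2txtocr.exe <input> <output>` form shown above and assumes the executable can be found on the PATH.

```python
import subprocess
from pathlib import Path

# Extensions taken from the command line templates in this article.
RASTER_EXTENSIONS = {".tif", ".jpg", ".bmp", ".png"}


def build_commands(input_dir, output_dir, exe="pdf2txtocr.exe"):
    """Return one [exe, input, output] argument list per raster image."""
    commands = []
    for image in sorted(Path(input_dir).iterdir()):
        if image.is_file() and image.suffix.lower() in RASTER_EXTENSIONS:
            # Keep the full input name (e.g. in.tif.txt) so that in.tif and
            # in.jpg do not overwrite each other's output.
            text_file = Path(output_dir) / (image.name + ".txt")
            commands.append([exe, str(image), str(text_file)])
    return commands


def run_batch(input_dir, output_dir, exe="pdf2txtocr.exe"):
    """Run the converter once per image; stop at the first failing call."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for command in build_commands(input_dir, output_dir, exe):
        subprocess.run(command, check=True)
```

For example, `run_batch(r"C:\scans", r"C:\text")` would call the converter once for each supported image in the hypothetical folder `C:\scans`. Building the argument lists separately from running them makes it easy to print and review the commands first.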
-bitcount <int> : sets the color depth used when rendering a PDF page to image data; it can be 1, 8, or 24, and the default is 8-bit
-rotate <int> : rotates pages before OCR
-threshold <int> : adjusts the lightness threshold used to convert the image to black and white
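To show how these parameters fit into the usage string, here is a small Python sketch that assembles a command line with the options placed before the input and output files. The helper name `build_ocr_command` is hypothetical, and the values 90 and 128 in the usage note are only illustrative; this article does not state the accepted ranges for -rotate or -threshold, so the sketch validates only -bitcount.

```python
def build_ocr_command(image, text_file, bitcount=None, rotate=None,
                      threshold=None, exe="pdf2txtocr.exe"):
    """Build an argument list: exe [options] <input> <output>."""
    # The article lists 1, 8, and 24 as the valid color depths.
    if bitcount is not None and bitcount not in (1, 8, 24):
        raise ValueError("bitcount must be 1, 8, or 24")
    command = [exe]
    options = (("-bitcount", bitcount),
               ("-rotate", rotate),
               ("-threshold", threshold))
    for flag, value in options:
        # Options left as None are omitted, so the tool uses its defaults.
        if value is not None:
            command += [flag, str(int(value))]
    command += [str(image), str(text_file)]
    return command
```

For instance, `build_ocr_command(r"C:\in.tif", r"C:\out.txt", bitcount=1, rotate=90, threshold=128)` yields the arguments for `pdf2txtocr.exe -bitcount 1 -rotate 90 -threshold 128 C:\in.tif C:\out.txt`, which can then be passed to `subprocess.run`.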
If you want to know more about the functions of this software, please visit its homepage. If you have any questions while using it, please contact us as soon as possible.