Scanning paper documents to images makes them easy to upload and transfer. The problem is that it is quite hard to extract text from a scanned file, and therefore hard to get information out of it. If the scan is a single page, we can simply retype its words as text. With thousands of pages, however, retyping is no longer practical. In this article, I will show you how to convert scans to text with OCR technology.
The software I use is VeryDOC Raster to Text OCR Converter Command Line. With it we can convert scanned files in English, French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish and Swedish to text. In the following part, I will show you how to use this software.
Step 1. Download Raster to Text OCR Converter Command Line
- On the website, there are two licenses: a server version and a developer version. If you only use the software on a single computer, laptop or server and do not use it for development, simply choose the server version.
- When the download finishes, you will have a zip file. Extract it to a folder, and then you can call the executable file from an MS-DOS command prompt window.
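The extraction step can also be scripted. The following is a minimal Python sketch, not part of the VeryDOC package: the function name extract_and_find_exe is hypothetical, the archive path is whatever name your download has, and the executable name pdf2txtocr.exe is taken from the usage line in Step 2.

```python
import zipfile
from pathlib import Path


def extract_and_find_exe(zip_path, target_dir, exe_name="pdf2txtocr.exe"):
    """Extract the downloaded archive and return the path of the executable.

    Returns None when no file named exe_name is found after extraction.
    """
    target_dir = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    # The executable may sit in a subfolder of the archive, so search recursively.
    matches = sorted(target_dir.rglob(exe_name))
    return matches[0] if matches else None
```

Calling extract_and_find_exe with the path of the downloaded zip file and a target folder gives you the full path to use when calling the executable in Step 2.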
Step 2. Convert scan to text.
- When using this software, please refer to the usage and examples.
- Here is the usage for your reference: Usage: pdf2txtocr.exe [options] <PDF-file> <Text-file>
- Here are some examples for your reference. You can scan the document to any of the following formats: TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG.
- When converting a TIFF file in a language other than English, please refer to the following command line template.
pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
Add the -lang parameter followed by the code of the language used in the scan. This software supports more than 50 OCR languages, such as French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish and Swedish, but you need to download the corresponding language package from the website first. Use the right language code, taken from the language package list later in this article. The basic examples without a language parameter look like this:
pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
When converting these scanned files to text, simply enter the full path of the scan file followed by the full path of the output text file. In this way, you can convert a scanned file to text directly.
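For thousands of pages, typing one command per file is not realistic, so the same command can be generated for every scan in a folder. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, pdf2txtocr.exe is assumed to be reachable from the current folder or the PATH, and treating a non-zero exit code as a failure is my assumption, not something the software documents here.

```python
import subprocess
from pathlib import Path

# Extensions for the input formats listed in this article.
SCAN_EXTENSIONS = {".tif", ".tiff", ".jpg", ".png", ".bmp", ".gif",
                   ".pcx", ".tga", ".jp2", ".pnm", ".mng"}


def build_command(scan_path, text_path, exe="pdf2txtocr.exe"):
    """Return the argument list for: pdf2txtocr.exe <scan file> <text file>."""
    return [exe, str(scan_path), str(text_path)]


def plan_batch(input_dir, output_dir, exe="pdf2txtocr.exe"):
    """Pair every scan in input_dir with a .txt file of the same name in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    commands = []
    for scan in sorted(input_dir.iterdir()):
        if scan.is_file() and scan.suffix.lower() in SCAN_EXTENSIONS:
            text_file = output_dir / (scan.stem + ".txt")
            # Full paths, as the article recommends.
            commands.append(build_command(scan.resolve(), text_file.resolve(), exe))
    return commands


def run_batch(commands):
    """Run each command in turn and return the scans whose conversion failed.

    Assumption: the tool signals failure with a non-zero exit code.
    """
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1])
    return failed
```

Calling run_batch(plan_batch(r"C:\scans", r"C:\text")) would then convert a whole folder, where both folder names are only placeholders. Note that two scans with the same base name, such as in.tif and in.jpg, would be written to the same text file in this sketch.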
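Language packages (the code used with -lang matches the package file name, for example deu for German):
- Bulgarian: bul.zip
- Catalan: cat.zip
- Czech: ces.zip
- German: deu.zip
- Greek: ell.zip
- English: eng.zip
- Finnish: fin.zip
- French: fra.zip
- Hungarian: hun.zip
- Indonesian: ind.zip
- Italian: ita.zip
- Latvian: lav.zip
- Lithuanian: lit.zip
- Dutch: nld.zip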
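To avoid mistyping a language code, the list above can be kept in a small lookup table. The following is a minimal Python sketch: the function names are hypothetical, and it assumes, as the German example (deu) suggests, that the code passed to -lang is the package file name without ".zip".

```python
# Package names copied from the language list in this article.
LANGUAGE_PACKAGES = {
    "Bulgarian": "bul.zip", "Catalan": "cat.zip", "Czech": "ces.zip",
    "German": "deu.zip", "Greek": "ell.zip", "English": "eng.zip",
    "Finnish": "fin.zip", "French": "fra.zip", "Hungarian": "hun.zip",
    "Indonesian": "ind.zip", "Italian": "ita.zip", "Latvian": "lav.zip",
    "Lithuanian": "lit.zip", "Dutch": "nld.zip",
}


def language_code(language):
    """Return the -lang code for a language name, e.g. 'German' -> 'deu'.

    Assumption: the code is the package file name without '.zip'.
    """
    package = LANGUAGE_PACKAGES[language]
    return package[:-len(".zip")]


def build_ocr_command(scan_path, text_path, language=None, exe="pdf2txtocr.exe"):
    """Build the command line, adding -lang only when a language is given."""
    cmd = [exe]
    if language is not None:
        cmd += ["-lang", language_code(language)]
    cmd += [scan_path, text_path]
    return cmd
```

With this, build_ocr_command(r"C:\in.tif", r"C:\out.txt", "German") produces the same arguments as the template pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt, and an unknown language name raises a KeyError instead of silently running with a wrong code.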
So this software will be a really helpful assistant when you need to extract text from scanned files. The software has more parameters than I can list here. If you have any questions while using it, please contact us as soon as possible.