Raster to Text OCR Command Line

How to convert a raster image to a searchable PDF and add basic information?

    If you need to convert an image to a searchable PDF and add basic document information, this article should help. The software I will use is VeryDOC Raster to Text OCR Converter Command Line, which can recognize text in many types of image files. For more information, please see the software homepage. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

  • This is command line software, so to make uploading and downloading easier, we have compressed it into a zip file.
  • Once the download finishes, extract it to a folder. You can then check the files it contains and call the executable file from an MS-DOS window.

Step 2. Convert raster image to searchable PDF and add basic information.

  • When you use this software, please refer to the usage and examples in readme.txt.
  • Here is the usage for your reference:  pdf2txtocr.exe [options] <PDF-file> <Text-file>
  • When converting raster image to searchable PDF, please refer to the following command line templates.
  • pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
    pdf2txtocr.exe -ocrmode 4 -producer VeryDOC C:\in.tif C:\out.pdf
    pdf2txtocr.exe -ocrmode 3 -creator LA C:\in.tif C:\out.pdf
    pdf2txtocr.exe -ocrmode 4 -subject "This is about conversion" C:\in.tif C:\out.pdf
    pdf2txtocr.exe -ocrmode 3 -title VeryDOC C:\in.tif C:\out.pdf
    pdf2txtocr.exe -ocrmode 4 -author ME C:\in.tif C:\out.pdf

With the command line templates above, we can convert an image file to a searchable PDF and add basic information such as title, keywords, subject, author and others. Here are the parameters for your reference.

-producer <string>  : Set 'producer' to PDF file
-creator <string>   : Set 'creator' to PDF file
-subject <string>   : Set 'subject' to PDF file
-title <string>     : Set 'title' to PDF file
-author <string>    : Set 'author' to PDF file
-keywords <string>  : Set 'keywords' to PDF file

-ocrmode <int>      : set OCR mode
    -ocrmode 0: output to text file
    -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
    -ocrmode 2: output to plain text based PDF file
    -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
    -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
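
When several metadata fields have to be set at once, it can help to assemble the command from a script. The following is a minimal Python sketch, not part of the software: the helper name build_metadata_command is hypothetical, it only uses the flags listed above, and it assumes that several metadata flags can be combined in one call (the templates above set one field at a time). It builds the argument list and does not run the converter.

```python
from typing import Dict, List

# Metadata flags taken from the parameter list in this article.
METADATA_FLAGS = ("producer", "creator", "subject", "title", "author", "keywords")


def build_metadata_command(image_path: str, pdf_path: str, ocrmode: int,
                           metadata: Dict[str, str],
                           exe: str = "pdf2txtocr.exe") -> List[str]:
    """Build an argument list that writes a PDF and sets its metadata fields.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    # Per the mode list, mode 0 outputs a text file, so only 1-4 give a PDF.
    if ocrmode not in (1, 2, 3, 4):
        raise ValueError("ocrmode must be 1, 2, 3 or 4 to produce a PDF")
    unknown = set(metadata) - set(METADATA_FLAGS)
    if unknown:
        raise ValueError(f"unsupported metadata fields: {sorted(unknown)}")

    command = [exe, "-ocrmode", str(ocrmode)]
    # Emit the flags in a fixed order so the result is reproducible.
    for field in METADATA_FLAGS:
        if field in metadata:
            command += [f"-{field}", metadata[field]]
    return command + [image_path, pdf_path]
```

The returned list can be handed to Python's subprocess.run, which passes a value containing spaces (such as a subject line) as a single argument without manual quoting.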

The input image can be in any of the following raster formats: scanned JPEG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM, TIFF files and so on. With this software, you can also deskew and rotate raster images before converting them to PDF. When converting them to PDF, you can also set a password to protect the PDF.

There are too many functions to list here. Please check the website for more. If you have any questions while using the software, please contact us as soon as possible.

Raster to Text OCR Command Line

Convert scan to text through OCR technology

   Scanning paper documents to images makes them easy to upload and transfer. But there is one problem: it is quite hard to extract text from a scan file, so it is hard to get information out of it. If the scan file has only one page, we can retype the words from it as text. However, if there are thousands of pages, that becomes impractical. In this article, I will show you how to convert scans to text through OCR technology.

  The software I use is VeryDOC Raster to Text OCR Converter Command Line. With it we can convert scan files in English, French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish and Swedish to text. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

  • On the website, there are two licenses: server version and developer version. If you only use this software on a single computer, laptop or server and do not use it for development, simply choose the server version.
  • When the download finishes, there will be a zip file. Please extract it to a folder; you can then call the executable file from an MS-DOS window.

Step 2. Convert scan to text.

  • When you use this software, please refer to the usage and examples.
  • Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
  • Here are some examples for your reference. You can scan your documents to any of the following formats: TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG.
  • pdf2txtocr.exe C:\in.tif C:\out.txt
    pdf2txtocr.exe C:\in.jpg C:\out.txt
    pdf2txtocr.exe C:\in.bmp C:\out.txt
    pdf2txtocr.exe C:\in.png C:\out.txt
    When converting these scan files to text, simply give the full path of the scan file followed by the full path of the output text file. This way, you can convert a scan file to text directly.

  • When converting a TIFF file in a language other than English, please refer to the following command line template.
    pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
    Please add the -lang parameter followed by the corresponding language code. This software supports more than 50 OCR languages such as French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish, Swedish, etc., but you need to download the corresponding language package from the website. Please use the right language code, for example:

  • Bulgarian    bul.zip
    Catalan      cat.zip
    Czech        ces.zip
    German       deu.zip
    Greek        ell.zip
    English      eng.zip
    Finnish      fin.zip
    French       fra.zip
    Hungarian    hun.zip
    Indonesian   ind.zip
    Italian      ita.zip
    Latvian      lav.zip
    Lithuanian   lit.zip
    Dutch        nld.zip
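
For thousands of pages, typing one command per file is not practical. Here is a minimal Python sketch of how the commands above could be generated for a list of scan files, optionally with a language. It is an illustration under stated assumptions: the function name batch_text_commands is hypothetical, the file extension spellings are my guess for the formats named above, and the language codes are assumed to match the package file names (the article only confirms deu for German). It builds the commands and does not run them.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List, Optional

# Language names and codes copied from the package list in this article.
LANGUAGE_CODES = {
    "Bulgarian": "bul", "Catalan": "cat", "Czech": "ces", "German": "deu",
    "Greek": "ell", "English": "eng", "Finnish": "fin", "French": "fra",
    "Hungarian": "hun", "Indonesian": "ind", "Italian": "ita",
    "Latvian": "lav", "Lithuanian": "lit", "Dutch": "nld",
}

# Extensions for the formats named in the article (spellings are assumed).
SCAN_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp",
                   ".gif", ".pcx", ".tga", ".jp2", ".pnm", ".mng"}


def batch_text_commands(image_paths: Iterable[str],
                        language: Optional[str] = None,
                        exe: str = "pdf2txtocr.exe") -> List[List[str]]:
    """Return one argument list per scan file, writing <name>.txt beside it."""
    options: List[str] = []
    if language is not None:
        if language not in LANGUAGE_CODES:
            raise ValueError(f"no language code known for {language!r}")
        options = ["-lang", LANGUAGE_CODES[language]]

    commands = []
    for raw in image_paths:
        path = PureWindowsPath(raw)
        # Skip anything that is not one of the listed scan formats.
        if path.suffix.lower() not in SCAN_EXTENSIONS:
            continue
        commands.append([exe, *options, str(path), str(path.with_suffix(".txt"))])
    return commands
```

Each inner list can be passed to Python's subprocess.run one at a time, so a failed page does not stop the rest of the batch unless you want it to.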

So this software will be a helpful assistant when you need to extract text from scan files. The software has more parameters than I can list here. If you have any questions while using it, please contact us as soon as possible.

PDF to Vector Converter

The pdf2vector application was unable to start correctly (0xc0000005). Click OK to close the application. On Windows Server 2008.

I am having problems running pdf2vector on 2008 server

I downloaded the evaluation version from the web page:

https://www.verydoc.com/pdf2vec_cmd.zip

When I unzip it and run it on Windows Server 2008 R2 64-bit, I get the following error:

VeryPDF PDF2Vector Converter has stopped working
The application was unable to start correctly (0xc0000005). Click OK to close the application.


Best regards,
Customer
----------------------------
Please turn off DEP for the "pdf2vec.exe" application and try again. The following steps show how to turn off DEP for it on your system:

1. Click "Start"
2. Select "Control Panel"
3. Select "System"
4. Click the "Advanced" tab
5. In the "Performance" region select "Settings"
6. Click the "Data Execution Prevention" tab in the dialog box that opens
7. Select "Turn on DEP for all programs and services except those I select"
8. Click "Add"
9. The Open dialog box appears. Browse to and select the "pdf2vec.exe" application on your computer
10. Click "Open"
11. Click "Apply"
12. Click "OK"
13. Reboot

OK, you should now have no problem running "pdf2vec.exe". Please give it a try.

VeryDOC

Raster to Text OCR Command Line

How to extract text from a raster image and save it as text?

   In this article, I will show you how to extract text from a raster image file and save it in a text file. The extraction can be done in batch using advanced OCR technology. When saving to a text file, you can also add page numbers to it. During the operation, you do not need to open the input raster image file, as the conversion is done from an MS-DOS window by command line.

   This method relies on VeryDOC Raster to Text OCR Converter Command Line, a professional tool for converting raster image files to text. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

  • The current version of this software is v2.0. If you use this software just for simple conversion, please download the server version, which allows you to use this software across the whole server.
  • When the download finishes, there will be a zip file. Extract it to a folder; you can then call this software from an MS-DOS window.

Step 2. Extract text from raster image file and save it as text document.

  • When you use this software, please refer to the usage and examples.
  • Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
  • When you extract text from a raster image file, please refer to the following command line templates.
  • pdf2txtocr.exe C:\in.tif C:\out.txt
    With this command line, we can extract text from a TIFF raster image file. Even if the TIFF contains many pages, the extraction is fast and accurate.
    pdf2txtocr.exe C:\in.jpg C:\out.txt
    Like the command line above, this one extracts text from a JPG raster image file.
    pdf2txtocr.exe C:\in.bmp C:\out.txt
    pdf2txtocr.exe C:\in.png C:\out.txt
    To extract text from another raster image format, simply change the input image file.

    The raster image can be any kind of scan file. If you can scan the image in black and white, the extraction result will be much better.

When doing extraction, we often meet raster image files that are skewed or dirty. Those factors affect the conversion result. To fix the image, you can preprocess it with this software. The following parameters are for your reference.
-bitcount <int>     : sets the color depth used when rendering a PDF page to image data; it can be set to 1, 8 or 24, and the default is 8-bit
-rotate <int>       : rotates pages before OCR
-threshold <int>    : adjusts the lightness threshold used to convert the image to B&W
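
To keep the preprocessing settings consistent across many files, the options can be assembled and checked in a script before the converter is called. The Python sketch below is only an illustration: the names preprocessing_options and extraction_command are hypothetical, the only validation is the -bitcount rule stated above (1, 8 or 24), and no range is assumed for -rotate or -threshold because the article does not give one.

```python
from typing import List, Optional

# Color depths the article lists for -bitcount.
ALLOWED_BITCOUNTS = (1, 8, 24)


def preprocessing_options(bitcount: Optional[int] = None,
                          rotate: Optional[int] = None,
                          threshold: Optional[int] = None) -> List[str]:
    """Turn the chosen preprocessing settings into command line options."""
    options: List[str] = []
    if bitcount is not None:
        if bitcount not in ALLOWED_BITCOUNTS:
            raise ValueError("bitcount must be 1, 8 or 24")
        options += ["-bitcount", str(bitcount)]
    if rotate is not None:
        options += ["-rotate", str(rotate)]
    if threshold is not None:
        options += ["-threshold", str(threshold)]
    return options


def extraction_command(image_path: str, text_path: str, options: List[str],
                       exe: str = "pdf2txtocr.exe") -> List[str]:
    """Place the options between the executable and the input/output paths."""
    return [exe, *options, image_path, text_path]
```

For example, preprocessing_options(bitcount=1, threshold=200) gives the options for a black and white conversion at threshold 200, and extraction_command adds the input and output paths around them.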

If you want to know more about the functions of this software, please visit its homepage. If you have any questions while using it, please contact us as soon as possible.

Raster to Text OCR Command Line

How to convert a raster image to a searchable PDF file?

    Sometimes we need to convert an image to PDF, but when the conversion finishes and we check the PDF file, we find that it differs from other PDF files: its text cannot be copied and pasted. To solve this problem, VeryDOC introduces a way of converting a raster image to a searchable PDF file. The software I use here is VeryDOC Raster to Text OCR Converter Command Line, which can also be used to convert images to text. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

  • Once the download finishes, there will be a zip file. Please extract it to a folder; you can then call the executable file from an MS-DOS window.
  • The package includes some help documents and a bat file with which you can check the conversion result at once.

Step 2. Convert raster image to searchable PDF file.

  • When you use this software, please read usage and parameter list carefully.
  • Here is the usage for your reference.  Usage:   pdf2txtocr.exe [options] <PDF-file> <Text-file>
  • When converting raster image to searchable PDF, please refer to the following command line templates.
  • pdf2txtocr.exe -ocrmode 1 -threshold 200 -ocr C:\in.tif C:\out.pdf
    pdf2txtocr.exe -ocrmode 2 -rotate 90 -ocr C:\in.jpg C:\out.pdf
    pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.png C:\out.pdf
    pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.bmp C:\out.pdf
    pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.gif C:\out.pdf
    pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.tga C:\out.pdf

This software provides 5 OCR modes; please check the related parameters. Please do not use -ocrmode 0 here, because that mode outputs a text file from the input image rather than a PDF.
-ocrmode <int>      : set OCR mode
    -ocrmode 0: output to text file
    -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
    -ocrmode 2: output to plain text based PDF file
    -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
    -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer

Before producing the PDF, you can also adjust the input image in advance. For example, you can rotate the input image, adjust the image resolution and so on. Here are some parameters for your reference:

-bitcount <int>     : set color depth when rendering a PDF page to image data; it can be set to 1, 8 or 24, default is 8-bit
-rotate <int>       : rotate pages before OCR
-threshold <int>    : lightness threshold used to convert the image to B&W
-ocr                : enable OCR function for scanned PDF file
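
The mode rule above (use 1 to 4, never 0, for a searchable PDF) is easy to get wrong when commands are typed by hand. The following Python sketch shows one way to enforce it before calling the converter. It is an illustration only: the name searchable_pdf_command is hypothetical, the mode descriptions are copied from the list above, and the code builds the argument list without running anything.

```python
from typing import List, Optional

# OCR modes as described in the article.
OCR_MODES = {
    0: "output to text file",
    1: "OCR PDF pages and insert new text layer under original PDF pages",
    2: "output to plain text based PDF file",
    3: "output to OCRed PDF file (BW) with hidden text layer",
    4: "output to OCRed PDF file (Color) with hidden text layer",
}


def searchable_pdf_command(image_path: str, pdf_path: str, ocrmode: int,
                           rotate: Optional[int] = None,
                           threshold: Optional[int] = None,
                           exe: str = "pdf2txtocr.exe") -> List[str]:
    """Build a command in the same shape as the templates in the article."""
    if ocrmode not in OCR_MODES:
        raise ValueError(f"unknown ocrmode {ocrmode}")
    if ocrmode == 0:
        # Mode 0 writes plain text, so it cannot give a searchable PDF.
        raise ValueError("ocrmode 0 outputs a text file, not a PDF")

    command = [exe, "-ocrmode", str(ocrmode)]
    if rotate is not None:
        command += ["-rotate", str(rotate)]
    if threshold is not None:
        command += ["-threshold", str(threshold)]
    # The templates place -ocr just before the input and output paths.
    command.append("-ocr")
    return command + [image_path, pdf_path]
```

Calling searchable_pdf_command(r"C:\in.png", r"C:\out.pdf", 3, threshold=200) reproduces the arguments of the third template above.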

With this software, you can convert most raster image files such as TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG to searchable PDF files. If you have any questions while using it, please contact us as soon as possible.
