Raster to Text OCR Command Line

How to convert raster images to text from the command line?

VeryDOC Raster to Text OCR Converter Command Line converts raster images to text. It accepts the following image file formats: TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG. It can also convert PDF files, including image-only (scanned) PDF files, to text from the command line. The rest of this article shows how to use the software.

Step 1. Download Raster to Text OCR Converter

  • When the download finishes, you will have a zip file. Extract it to a folder; you can then call the executable file from an MS-DOS command prompt window.
  • This is a Windows application; for now it cannot be used on Mac or Linux systems.

Step 2. Convert raster to text by command line.

  • When you use this software, please refer to its usage and examples.
  • Here is the usage for your reference:  pdf2txtocr.exe [options] <PDF-file> <Text-file>
  • To convert raster images to text, use the following command line templates.
  • pdf2txtocr.exe C:\in.tif C:\out.txt
    pdf2txtocr.exe C:\in.jpg C:\out.txt
    pdf2txtocr.exe C:\in.bmp C:\out.txt
    pdf2txtocr.exe C:\in.png C:\out.txt
    pdf2txtocr.exe C:\in.pcx C:\out.txt
    pdf2txtocr.exe C:\in.tga C:\out.txt
    pdf2txtocr.exe C:\in.pnm C:\out.txt
    pdf2txtocr.exe C:\in.mng C:\out.txt
    When converting raster image files to text, simply give the full paths of the input file and the output file; no other parameters are needed. To convert image files to text in batch, use a wildcard character, as in the following command line template.
    pdf2txtocr.exe C:\*.tif C:\*.txt
    To improve the OCR recognition rate, you can convert the image to PDF first, because when converting a raster image to PDF you can adjust the image threshold and rotate the image by a given number of degrees.
    pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
    pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.tif C:\out.pdf
    Now let us check the related parameters:
    -rotate <int>       : rotate pages before OCR
    -threshold <int>    : adjust the lightness threshold used to convert the image to B&W
    -ocr                : enable the OCR function when converting an image file or scanned PDF file to text or to a searchable PDF file
    -ocrmode <int>      : set OCR mode
        -ocrmode 4      : output to OCRed PDF file (Color) with hidden text layer
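
The command lines above can also be assembled from a script. The following Python sketch is one possible way to do it. It assumes pdf2txtocr.exe is in the current folder or on the PATH, and the helper name build_ocr_command is made up for this illustration; it is not part of the VeryDOC software. Only the options described above are used.

```python
import subprocess


def build_ocr_command(in_file, out_file, ocrmode=None, threshold=None,
                      rotate=None, exe="pdf2txtocr.exe"):
    """Return the argument list for one pdf2txtocr.exe call.

    Hypothetical helper: only the options listed in this article are emitted.
    """
    args = [exe]
    if ocrmode is not None:
        args += ["-ocrmode", str(ocrmode)]
    if threshold is not None:
        args += ["-threshold", str(threshold)]
    if rotate is not None:
        args += ["-rotate", str(rotate)]
    # In the article's examples, -ocr accompanies the OCR-related options.
    if ocrmode is not None or threshold is not None or rotate is not None:
        args.append("-ocr")
    args += [in_file, out_file]
    return args


if __name__ == "__main__":
    # Print the command line instead of running it, so the sketch is safe to try.
    command = build_ocr_command(r"C:\in.tif", r"C:\out.pdf", ocrmode=4, rotate=90)
    print(subprocess.list2cmdline(command))
```

To actually run the conversion on Windows, the returned list can be passed to subprocess.run.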

With this function, you can extract text from raster image files, and you can also convert raster images to searchable PDF files. When the output file is a PDF file, there are many parameters available for you to choose from. For more functions and parameters of this software, please visit its homepage. If you have any questions while using it, please contact us as soon as possible. Now let us check the conversion result in the following snapshots.

input tiff file
                       This is from input tiff file.

output text from PDF
   This is from output text file.

PDF Margin Crop

How to crop PDF margins from the command line?

The last article covered how to crop PDF files with the GUI version of VeryDOC PDF Margin Crop. This article shows how to crop PDF files with its command line version. The software can be used either through the GUI or from the command line, and the command line version can be called from Visual Basic, C/C++, Delphi, ASP, PHP, C#, .NET, etc. More information is available on the software homepage.

Step 1. Install PDF Margin Crop

  • This software is a Windows application, and the GUI version and the command line version are bundled together. When the download finishes, you will have an exe file. Install the software by double-clicking the exe file and following the installation prompts. When the installation finishes, there will be an icon on the desktop, and the command line executable file can be found in the installation folder.
  • Alternatively, click Start and go to the installation folder, where you can find the command line shortcut icon.

Step 2. Crop PDF through command line operation.

  • When you use this software, please refer to its usage and command line parameters.
  • Usage:      pdfmc [options] <pdf-file> [<out-pdf>]
  • To crop PDF files, use the following command line templates.
    pdfmc.exe C:\in.pdf C:\out.pdf
    If you do not add any parameters, the software crops the PDF automatically according to its content.
    pdfmc.exe C:\in\*.pdf C:\out\*.pdf
    With a wildcard character, we can crop PDF files in batch.
    pdfmc.exe -linewidth 8 C:\in.pdf C:\out.pdf
    This command line crops the PDF and removes black borders whose width is less than 8.
    pdfmc.exe -linewidth 8 -specklesize 20 C:\in.pdf C:\out.pdf
    This command line crops the PDF and also removes speckles whose size is less than 20.
    pdfmc.exe -linewidth 0 -specklesize 0 C:\in.pdf C:\out.pdf
    With the value 0, we can crop the PDF and remove all the lines and speckles.
    for %F in (D:\test\*.pdf) do "pdfmc.exe" "%F" "%~dpnF-out.pdf"
    for /r D:\test %F in (*.pdf) do "pdfmc.exe" "%F" "%~dpnF-out.pdf"
    With the two command lines above, we can crop PDF files in batch from the command prompt. To use them in a bat file, double each percent sign (for example %%F and %%~dpnF).
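
The same batch naming rule can be sketched in Python. This is only an illustration, under the assumption that pdfmc.exe is on the PATH; the function name crop_commands is made up here. Like %~dpnF-out.pdf in the loops above, it places X-out.pdf next to each input X.pdf. Skipping files that already end in -out.pdf is a choice made for this sketch, so that a second run does not crop earlier results again.

```python
from pathlib import PureWindowsPath


def crop_commands(pdf_paths, exe="pdfmc.exe"):
    """Return one pdfmc argument list per input PDF path.

    Hypothetical helper: the output name is the input name plus "-out",
    in the same folder, as in the for loops shown above.
    """
    commands = []
    for name in pdf_paths:
        pdf = PureWindowsPath(name)
        if pdf.stem.endswith("-out"):
            # Already a cropped result from an earlier run (sketch-only rule).
            continue
        out_pdf = pdf.with_name(pdf.stem + "-out.pdf")
        commands.append([exe, str(pdf), str(out_pdf)])
    return commands


if __name__ == "__main__":
    # Print the planned commands instead of running them.
    for command in crop_commands([r"D:\test\a.pdf", r"D:\test\sub\b.pdf"]):
        print(command)
```

On Windows, each returned list can be passed to subprocess.run, and the input list can come from pathlib.Path(folder).rglob("*.pdf").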

Now let us check related parameters:

  -skip                : don't overwrite an output file if it already exists
  -margin <string>     : Set page margins to output PDF file
        -margin 10            : Set margin to 10 pt to left
        -margin 10x10         : Set margin to 10 pt to left,top
        -margin 10x10x10      : Set margin to 10 pt to left,top,right
        -margin 10x10x10x10   : Set margin to 10 pt to left,top,right,bottom
        -margin 10pt          : Set margin to 10 pt to left
        -margin 10x10pt       : Set margin to 10 pt to left,top
        -margin 10x10x10pt    : Set margin to 10 pt to left,top,right
        -margin 10x10x10x10pt : Set margin to 10 pt to left,top,right,bottom
        -margin 10mm          : Set margin to 10 mm to left
        -margin 10x10mm       : Set margin to 10 mm to left,top
        -margin 10x10x10mm    : Set margin to 10 mm to left,top,right
        -margin 10x10x10x10mm : Set margin to 10 mm to left,top,right,bottom
        -margin 10in          : Set margin to 10 inch to left
        -margin 10x10in       : Set margin to 10 inch to left,top
        -margin 10x10x10in    : Set margin to 10 inch to left,top,right
        -margin 10x10x10x10in : Set margin to 10 inch to left,top,right,bottom
  -tempdir <string>    : set a folder to store temporary files
  -linewidth <int>     : remove black borders whose width is less than this value, default is 8
  -specklesize <int>   : remove speckles whose size is less than this value, default is 20
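
The -margin values listed above follow one pattern: one to four sizes joined by "x" in the order left, top, right, bottom, followed by an optional unit (pt, mm or in). The Python sketch below builds such a string. The function name margin_value is made up for this illustration, and it accepts whole numbers only, because the list above shows no other kind of value.

```python
def margin_value(*sizes, unit=""):
    """Build the string that follows -margin.

    Hypothetical helper. sizes are given in the order left, top, right,
    bottom (one to four whole numbers); unit is "", "pt", "mm" or "in".
    """
    if not 1 <= len(sizes) <= 4:
        raise ValueError("give between one and four sizes")
    if unit not in ("", "pt", "mm", "in"):
        raise ValueError("unit must be '', 'pt', 'mm' or 'in'")
    for size in sizes:
        if not isinstance(size, int):
            raise TypeError("sizes must be whole numbers")
    return "x".join(str(size) for size in sizes) + unit


if __name__ == "__main__":
    # Argument list for cropping with a 10 mm margin on all four sides.
    print(["pdfmc.exe", "-margin", margin_value(10, 10, 10, 10, unit="mm"),
           r"C:\in.pdf", r"C:\out.pdf"])
```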

With this software, we can crop PDF files easily. If you have any questions while using it, please contact us as soon as possible.

PDF Margin Crop

How to crop PDF margins through the software interface?

Some PDF files have large margins on all four sides, and printing them wastes paper. In this article, I will show you one method of cropping PDF files to cut the margins. The software I use is VeryDOC PDF Margin Crop, which can be used either from the command line or through its GUI. In this article, I will use the GUI version.

Step 1. Install PDF Margin Crop

  • Download PDF Margin Crop. This is a Windows application; when the download finishes, you will have an exe file. Install the software by double-clicking the exe file and following the installation prompts until the shortcut icon shows up on the desktop.
  • To launch the software, click the icon. The following snapshot shows the software interface.

software interface of PDF Margin Crop

Step 2. Crop PDF for cutting margins.

  • Drag the PDF files that need cropping from their folder onto the software interface. The added PDF files are then shown in the file list with details such as file path, file size and date added.
  • Then click the Setting button to configure the cropping. You will see menu options like those in the following snapshot.

pdf-margin-setting-menu

  • There are three tabs here; the snapshot above shows the Basic Options tab. Here you can set the saving mode: always ask for the output file path, save to the original directories with corresponding filenames, or save to a specified directory. In the Basic Options tab you can also set the margin cropping size; there are four modes to choose from. By clicking the Browse button, you can choose the temporary folder to save output files.
  • When you finish the settings, click the OK button to go back to the main interface.
  • If you need to set other options for the output PDF file, go to the other tabs: the View Options tab and the Password Setting tab.
  • In the View Options tab, you can choose to view the PDF file after cropping. In the Password Setting tab, you can set a password to protect the output PDF file from being printed, opened and so on.

Now let us check the cropping effect from the following snapshot.

input PDF and output PDF

As the snapshot above shows, the margins have been cut, so this software can really help when you need to crop PDF margins. It can also be used from the command line with the same functions; the next article will cover the command line usage. If you have any questions while using it, please contact us as soon as possible.

DOC to Any Converter

How to convert PowerPoint to TIFF and set the compression mode from the command line?

In this article, I will show you how to convert PowerPoint files to TIFF and set the compression mode from the command line. The software I use is VeryDOC DOC to Any Converter, which can convert Office and OpenOffice files to TIFF, PDF and other image file formats in batch through the command line. The following part shows how to use this software.

Step 1. Download DOC to Any Converter

  • When the download finishes, you will have a zip file. Extract it to a folder; you can then call the executable file from an MS-DOS command prompt window.
  • When you use this Windows application, please follow the rules, usage and examples in readme.txt.

Step 2. Convert PowerPoint to Tiff and set compression modes.

  • Here is the usage for your reference: DOC2Any [options] <in-file> [<out-file>]
  • To convert PowerPoint to TIFF and set the compression mode, use the following command line templates.

    doc2any.exe -useprinter -compression 88880 "C:\in.ppt" C:\out.tif
    This command line converts PowerPoint to TIFF and compresses the TIFF file as 204x98 G4 ClassF TIFF.
    doc2any.exe -useprinter -compression 88881 "C:\in.pptx" C:\out.tif
    This command line converts a PPTX file to TIFF using the virtual printer, with the 204x196 G4 ClassF TIFF compression method.
    doc2any.exe -useprinter -compression 88883 "C:\in.ppt" C:\out.tif
    This command line compresses the output TIFF file as 204x196 G3 ClassF TIFF.
    doc2any.exe -multipagetif -compression 88880 "C:\in.ppt" C:\out.tif
    This command line converts PowerPoint to a multipage TIFF file and compresses it as 204x98 G4 ClassF TIFF.
    doc2any.exe -multipagetif -bitcount 1 -xres 300 -yres 300 "C:\in.pptx" C:\out.tif
    This command line converts PowerPoint to a multipage TIFF and sets the bit count to 1 and the resolution to 300 in both the X and Y directions.

    Now let us check the parameters related to the conversion:
    -compression <int>       : Set compression for TIFF image
        -compression 1     : NONE compression
        -compression 2     : CCITT modified Huffman RLE
        -compression 3     : CCITT Group 3 fax encoding (1d)
        -compression 4     : CCITT Group 4 fax encoding
        -compression 5     : LZW compression
        -compression 6     : OJPEG compression
        -compression 7     : JPEG DCT compression
        -compression 32773 : PACKBITS compression
        -compression 32809 : THUNDERSCAN compression
        -compression 88880 : 204x98  G4 ClassF TIFF
        -compression 88881 : 204x196 G4 ClassF TIFF
        -compression 88882 : 204x98  G3 ClassF TIFF
        -compression 88883 : 204x196 G3 ClassF TIFF
        -compression 88884 : CCITT Group 3 fax encoding (2d)
    -rotate <int>            : Rotate pages, 90, 180, 270
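
The numeric -compression codes are easy to mistype, so a script can look them up by name. The Python sketch below is one way to do this. The labels in the dictionary and the function name build_tiff_command are made up for this illustration; the numbers and the options -useprinter, -multipagetif and -compression come from the examples and the list above. It assumes doc2any.exe is on the PATH.

```python
# Labels on the left are made up for this sketch; the numbers come from the
# -compression list above.
TIFF_COMPRESSION = {
    "none": 1,
    "ccitt_rle": 2,
    "g3_1d": 3,
    "g4": 4,
    "lzw": 5,
    "ojpeg": 6,
    "jpeg": 7,
    "packbits": 32773,
    "thunderscan": 32809,
    "classf_g4_204x98": 88880,
    "classf_g4_204x196": 88881,
    "classf_g3_204x98": 88882,
    "classf_g3_204x196": 88883,
    "g3_2d": 88884,
}


def build_tiff_command(in_file, out_file, compression, use_printer=False,
                       multipage=False, exe="doc2any.exe"):
    """Return the argument list for one doc2any.exe TIFF conversion."""
    if compression not in TIFF_COMPRESSION:
        raise ValueError("unknown compression label: " + repr(compression))
    args = [exe]
    if use_printer:
        args.append("-useprinter")
    if multipage:
        args.append("-multipagetif")
    args += ["-compression", str(TIFF_COMPRESSION[compression])]
    args += [in_file, out_file]
    return args


if __name__ == "__main__":
    # Print the argument list instead of running the converter.
    print(build_tiff_command(r"C:\in.pptx", r"C:\out.tif",
                             "classf_g4_204x196", use_printer=True))
```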

You can use the same method to convert other Office and OpenOffice files to TIFF and compress them with the compression methods above. When converting Word files to TIFF, you do not need to have the Microsoft Word application installed. When converting other Office files to TIFF, however, please make sure the corresponding Office application is installed.

If you have any questions while using it, please contact us as soon as possible.

     

DOC to Any Converter

How to split a long HTML file and then convert it to PDF?

When converting HTML files to other file formats, you may meet some very long ones that cannot be converted completely. In this article, I will show you one method of converting a long HTML file to PDF at a specified page height. I use the software VeryDOC DOC to Any Converter, which can also convert HTML files to Postscript, PS, EPS, SVG, SWF, XPS, PCL, HPGL, HTML, MHTML, RTF, Text, XML, JPG, TIFF, EMF, WMF, BMP, GIF, PNG, TGA, PCX and other formats.

Step 1. Download DOC to Any Converter 

  • This software is a Windows application and does not work on Mac or Linux. If you need it to work on Mac or Linux, please contact us and we can make a version for you.
  • When the download finishes, unzip it to a folder; you can then call the executable file from an MS-DOS command prompt window.

Step 2. Split HTML by height and convert HTML to PDF 

  • When you run a conversion, please refer to the usage and examples.
  • Usage:   DOC2Any [options] <in-file> [<out-file>]
  • To split HTML by height and convert it to PDF, use the following command line templates and parameters.
  • doc2any.exe -height 200 C:\in.html C:\out.pdf
    This command line converts HTML to PDF and sets the page height of the PDF to 200.
    doc2any.exe -emfheight 150 C:\in.html C:\out.svg
    This command line splits the HTML file at a height of 150 and then converts it to SVG.
    doc2any.exe -pageheight 300 C:\in.html C:\out.tiff
    doc2any.exe -pageh 400 C:\in.html C:\out.jpg
    doc2any.exe -ph 100 C:\in.html C:\out.pdf
    The software can also convert an HTML file to image file formats, splitting it at the specified height.
    Now let us check related parameters:
    -width <int>             : Set page width to PDF file
    -height <int>            : Set page height to PDF file
    -emfheight <int>         : Split a long HTML file by height
    -pageheight <int>        : Split a long HTML page by page height, same as -emfheight
    -pageh <int>             : same as -pageheight
    -ph <int>                : same as -pageheight
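
When the same long HTML file has to go to several output formats at one split height, the command lines can be generated instead of typed. The Python sketch below is one possible approach. The function name split_commands is made up for this illustration, and it assumes doc2any.exe is on the PATH and that each output file should sit next to the input file with a different extension. It uses -pageheight, which the list above describes as equivalent to -emfheight, -pageh and -ph. The unit of the height value is not stated in this article, so the number is passed through unchanged.

```python
from pathlib import PureWindowsPath


def split_commands(html_file, page_height, extensions, exe="doc2any.exe"):
    """Return one doc2any argument list per requested output extension.

    Hypothetical helper: each output file gets the input file's name with
    the extension replaced.
    """
    if page_height <= 0:
        raise ValueError("page_height must be positive")
    source = PureWindowsPath(html_file)
    commands = []
    for ext in extensions:
        # Accept both "pdf" and ".pdf".
        target = source.with_suffix("." + ext.lstrip("."))
        commands.append([exe, "-pageheight", str(page_height),
                         str(source), str(target)])
    return commands


if __name__ == "__main__":
    # Print the planned commands instead of running them.
    for command in split_commands(r"C:\in.html", 300, ["pdf", "tiff", "jpg"]):
        print(command)
```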

So with this software, you can split HTML at any place. This function is useful when converting a long HTML file to PDF, especially when the HTML file contains pictures, tables or similar content. If you do not specify a height, the software cuts the HTML at an arbitrary place; by specifying the height yourself, you can make sure the pictures and tables in it stay intact.

The software can also convert HTML to PDF with specified margins around the page. Here are the parameters for your reference:
-margin <string>         : Set page margin to PDF file
    -margin 10           : Set margin to 10pt to left
    -margin 10x10        : Set margin to 10pt to left,top
    -margin 10x10x10     : Set margin to 10pt to left,top,right
    -margin 10x10x10x10  : Set margin to 10pt to left,top,right,bottom

For more functions and parameters, please check readme.txt. If you have any questions while using the software, please contact us as soon as possible.
