VeryDOC Knowledge Base

HTML to PDF Converter, HtmlShell (HTMLConverter method) has different behavior on different systems

2013/07/06

Hello,

We are evaluating your HtmlShell component (HTMLConverter method), and in our tests we found that its behavior differs from one operating system to another.
We have a Windows Vista 64-bit machine on which the component works perfectly.
We have a Windows Vista 32-bit machine on which the component did not work.

We have two machines with Windows 7 32-bit, and the component works on only one of them.

We have a machine with Windows 8 on which the component does not work.

We need to know whether this type of behavior has been reported to you before, because it seems that something is missing on the systems where the component does not work.

I await your reply.

Sincerely,

-----------------------------------------------

Original text:

Olá,
Estamos analisando o seu componente HtmlShell (método HTMLConverter) e nos testes verificamos que existe comportamento diferente de um sistema operacional para outro.
Temos um Windows Vista 64 em que o componente funciona perfeitamente.
Temos um Windows Vista 32 em que o componente não funcionou.

Temos 2 máquinas com Windows 7 32 bits e o componente funciona em apenas uma delas.

Temos uma máquina com Windows 8 em que o componente não funciona.

Precisamos saber se vocês já tiveram relatado este tipo de comportamento pois nos parece que está faltando algo nesses sistemas em que o componente não funciona.

Aguardo,

Atenciosamente,

-----------------------------------------------

Yes, HtmlShell (HTMLConverter method) behaves differently on different systems, because its output is affected by the screen resolution and by the installed IE version.

If you want the same behavior on all systems, we suggest you download and try the following products from our website:

docPrint Pro v6.0,
http://www.verypdf.com/app/document-converter/try-and-buy.html
http://www.verypdf.com/artprint/docprint_pro_setup.exe

VeryDOC HTMLPrint to Any Converter,
https://www.verydoc.com/htmlprint-to-any.html
https://www.verydoc.com/htmlprint2any_cmd.zip

Both products can convert HTML files to PDF files. Because they use printing technology to print HTML files to PDF, you will get the same behavior on all systems.

Please download the trial versions of the products above from our website, and feel free to let us know if you encounter any problem.

Remark:

The htmltools.exe application first renders the HTML page to a Windows Metafile (EMF) and then converts the EMF to a PDF file. The appearance of the EMF file may change with the screen resolution (for example 1024x768, 800x600 or 1600x900), so different screen resolutions produce different EMF files.

docPrint Pro v6.0 and “HTMLPrint to Any Converter” use the printing function to create the PDF file, the same as when you print the HTML file from IE manually, so the result is not affected by the screen resolution.

htmltools.exe is very fast for simple HTML files. It does not require a virtual printer, and it is a portable, standalone product. However, if your HTML file contains complicated content such as SVG, Flash or Java applets, docPrint Pro v6.0 and “HTMLPrint to Any Converter” will work better for you.

See Also:

HTML to PDF conversion, which software is better for you?
http://www.verypdf.com/wordpress/201205/html-to-pdf-conversion-which-software-is-better-for-you-27560.html

How to Convert a HTML file or Web Pages to PDF file via Command Line?
http://www.verypdf.com/pdfcamp/convert-html-to-pdf.html

How to convert an Office document (DOC, DOCX, XLS, XLSX, PPT, PPTX, etc.) to PDF file via Command Line?
http://www.verypdf.com/document/convert-office-to-pdf.htm

VeryPDF

Keywords: Compare htmltools and docPrint, Metafile, EMF

Raster to Text OCR Command Line

How to convert raster image to searchable PDF and add basic information?

2013/07/05

When you need to convert an image to a searchable PDF and add basic document information, this article will be helpful. The software I will use is VeryDOC Raster to Text OCR Converter Command Line, which can recognize text in many types of image files. For more information, please check the software homepage. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

This is command line software, so to make uploading and downloading easier, we have compressed it into a zip file.
Once the download finishes, please extract it to a folder. You can then check its contents and call the executable file from an MS-DOS window.

Step 2. Convert raster image to searchable PDF and add basic information.

When you use this software, please refer to the usage and examples in readme.txt.
Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
When converting a raster image to a searchable PDF, please refer to the following command line templates.

pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -producer VeryDOC C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 3 -creator LA C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -subject "This is about conversion" C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 3 -title VeryDOC C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -author ME C:\in.tif C:\out.pdf

With the command line templates above, we can convert an image file to a searchable PDF and add basic information such as title, keywords, subject, author and others. Here are the parameters for your reference.

-producer <string>  : Set 'producer' to PDF file
-creator <string>   : Set 'creator' to PDF file
-subject <string>   : Set 'subject' to PDF file
-title <string>     : Set 'title' to PDF file
-author <string>    : Set 'author' to PDF file
-keywords <string>  : Set 'keywords' to PDF file

-ocrmode <int>      : Set OCR mode
    -ocrmode 0: output to text file
    -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
    -ocrmode 2: output to plain text based PDF file
    -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
    -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
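
When these calls are made from a script, the option list can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the function name build_ocr_pdf_command is hypothetical, and passing several metadata options in one command is an assumption (the templates above show one metadata option per call). It only builds the argument list and does not run anything.

```python
# Metadata options taken from the parameter list above; each takes one string.
METADATA_OPTIONS = ("producer", "creator", "subject", "title", "author", "keywords")


def build_ocr_pdf_command(image_path, pdf_path, ocrmode=3,
                          exe="pdf2txtocr.exe", **metadata):
    """Return the argument list for one image-to-searchable-PDF conversion.

    Hypothetical helper: combining several metadata options in a single
    call is an assumption, not something the readme excerpt confirms.
    """
    if ocrmode not in (0, 1, 2, 3, 4):
        raise ValueError("ocrmode must be between 0 and 4")
    unknown = set(metadata) - set(METADATA_OPTIONS)
    if unknown:
        raise ValueError("unknown metadata option(s): %s" % sorted(unknown))

    args = [exe, "-ocrmode", str(ocrmode)]
    # Emit options in a fixed order so the command line is reproducible.
    for name in METADATA_OPTIONS:
        value = metadata.get(name)
        if value is not None:
            args += ["-" + name, value]
    args += [image_path, pdf_path]
    return args
```

Passing the returned list to subprocess.run(args, check=True) avoids manual quoting of values that contain spaces, such as a subject line.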

The input image can be in any of the following raster formats: scanned JPEG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM, TIFF and so on. With this software you can also deskew and rotate raster images before converting them to PDF, and you can set a password to protect the PDF.

There are too many functions to list here. Please check the website for more, and if you have any questions while using the software, please contact us as soon as possible.

Raster to Text OCR Command Line

Convert scan to text through OCR technology

2013/07/04

When paper documents are scanned to images, they are easy to upload and transfer. The problem is that it is quite hard to extract text from a scan file, so it is hard to get information out of it. If the scan has a single page, we can retype the words by hand. However, if there are thousands of pages, that is no longer practical. In this article, I will show you how to convert scans to text through OCR technology.

The software I use is VeryDOC Raster to Text OCR Converter Command Line. With it we can convert scan files in English, French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish and Swedish to text. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

On the website there are two licenses: server version and developer version. If you just use this software on a single computer, laptop or server and do not use it for development, simply choose the server version.
When the download finishes, there will be a zip file. Please extract it to a folder; you can then call the executable file from an MS-DOS window.

Step 2. Convert scan to text.

When you use this software, please refer to the usage and examples.
Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
Here are some examples for your reference. You can scan documents to any of the following formats: TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG.

pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
When converting these scan files to text, simply give the full path of the scan file followed by the full path of the output text file. In this way, you can convert a scan file to text directly.
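
For thousands of pages, the same input/output pattern can be generated by a script. The sketch below is an illustration under assumptions: the function name commands_for_scans is hypothetical, the file extensions are my mapping of the format names listed above, and writing each text file next to its scan is just one possible layout. It only prepares the commands and does not run them.

```python
from pathlib import PureWindowsPath

# Assumed extensions for the formats named above (TIFF, JPG, PNG, ...).
SCAN_EXTENSIONS = {".tif", ".tiff", ".jpg", ".png", ".bmp", ".gif",
                   ".pcx", ".tga", ".jp2", ".pnm", ".mng"}


def commands_for_scans(scan_paths, exe="pdf2txtocr.exe"):
    """Build one 'exe <scan> <text>' command per recognised scan file.

    Files with other extensions are skipped. The output text file gets
    the same name as the scan, with a .txt extension.
    """
    commands = []
    for raw in scan_paths:
        path = PureWindowsPath(raw)
        if path.suffix.lower() not in SCAN_EXTENSIONS:
            continue
        commands.append([exe, str(path), str(path.with_suffix(".txt"))])
    return commands
```

Each returned list can then be passed to subprocess.run one after another, so a failure on one scan does not stop the rest of the batch.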

When converting a TIFF file in a language other than English, please refer to the following command line template.
pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
Add the parameter -lang followed by the corresponding language code. This software supports more than 50 OCR languages, such as French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish and Swedish, but you need to download the corresponding language package from the website. Please use the right language code, for example:

Language      Package
Bulgarian     bul.zip
Catalan       cat.zip
Czech         ces.zip
German        deu.zip
Greek         ell.zip
English       eng.zip
Finnish       fin.zip
French        fra.zip
Hungarian     hun.zip
Indonesian    ind.zip
Italian       ita.zip
Latvian       lav.zip
Lithuanian    lit.zip
Dutch         nld.zip
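
In a script, the language codes listed above can be kept in a small lookup table so that a wrong code is caught before the tool is called. This is a sketch only: build_lang_command is a hypothetical name, and the table covers just the codes shown here, not the full set of more than 50 languages.

```python
# Language name -> code, copied from the package list above
# (the package file is the code plus ".zip").
LANGUAGE_CODES = {
    "Bulgarian": "bul", "Catalan": "cat", "Czech": "ces", "German": "deu",
    "Greek": "ell", "English": "eng", "Finnish": "fin", "French": "fra",
    "Hungarian": "hun", "Indonesian": "ind", "Italian": "ita",
    "Latvian": "lav", "Lithuanian": "lit", "Dutch": "nld",
}


def build_lang_command(language, image_path, text_path, exe="pdf2txtocr.exe"):
    """Return the argument list for OCR in the given language.

    Raises KeyError for a language that is not in the table, which is
    easier to diagnose than a failed OCR run.
    """
    code = LANGUAGE_CODES[language]
    return [exe, "-lang", code, image_path, text_path]
```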

So this software will be a truly helpful assistant when you need to extract text from scan files. It has more parameters than I can list here. If you have any questions while using it, please contact us as soon as possible.

PDF to Vector Converter

The pdf2vector application was unable to start correctly (0xc0000005). Click OK to close the application. On Windows Server 2008.

2013/07/03

I am having problems running pdf2vector on Windows Server 2008.

I downloaded the evaluation version from the web page:

https://www.verydoc.com/pdf2vec_cmd.zip

When I unzip it and run it on Windows Server 2008 R2 64-bit I get the following error:

VeryPDF PDF2Vector Converter has stopped working
The application was unable to start correctly (0xc0000005). Click OK to close the application.

Best regards,
Customer
----------------------------
Please turn off DEP for the "pdf2vec.exe" application and try again. The following steps show how to turn off DEP for it on your system:

1. Click "Start"
2. Select "Control Panel"
3. Select "System"
4. Click the "Advanced" tab
5. In the "Performance" region select "Settings"
6. Click the "Data Execution Prevention" tab in the dialog box that opens
7. Select "Turn on DEP for all programs and services except for those I select"
8. Click "Add"
9. An Open dialog box will appear. Browse to and select the "pdf2vec.exe" application on your computer
10. Click "Open"
11. Click "Apply"
12. Click "OK"
13. Reboot

You should now have no problem running "pdf2vec.exe"; please give it a try.
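
To confirm which DEP mode a server is in after the reboot, the system-wide policy can be read with the command "wmic OS Get DataExecutionPrevention_SupportPolicy". The Python sketch below only interprets that output; the function names are hypothetical, and it assumes the usual wmic layout of a header line followed by a number. The option chosen in step 7 corresponds to the OptOut policy, which is the mode in which a per-program exception list applies.

```python
# Values of Win32_OperatingSystem.DataExecutionPrevention_SupportPolicy.
DEP_POLICIES = {0: "AlwaysOff", 1: "AlwaysOn", 2: "OptIn", 3: "OptOut"}


def parse_dep_policy(wmic_output):
    """Extract the numeric policy from wmic output (header line + value)."""
    for line in wmic_output.splitlines():
        line = line.strip()
        if line.isdigit():
            return int(line)
    raise ValueError("no policy value found in wmic output")


def exceptions_apply(policy):
    """True when the policy is OptOut, the mode selected in step 7."""
    return DEP_POLICIES.get(policy) == "OptOut"
```

The text to parse can be captured with subprocess.run(["wmic", "OS", "Get", "DataExecutionPrevention_SupportPolicy"], capture_output=True, text=True).stdout on systems where wmic is available.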

VeryDOC

Raster to Text OCR Command Line

How to extract text from raster image and save them in text?

2013/07/03

In this article, I will show you how to extract text from raster image files and save it in text files. The extraction can be done in batch with advanced OCR technology. When saving to a text file, you can also add page numbers to it. You do not need to open the input raster image file, as the conversion is run from an MS-DOS window by command line.

This can be done with VeryDOC Raster to Text OCR Converter Command Line, a professional tool for converting raster image files to text. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

The current version of this software is v2.0. If you use this software just for simple conversion, please download the server version, which allows you to use this software on one whole server.
When the download finishes, there will be a zip file. Extract it to a folder; you can then call the software from an MS-DOS window.

Step 2. Extract text from raster image file and save it as text document.

When you use this software, please refer to the usage and examples.
Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
When you extract text from a raster image file, please refer to the following command line templates.

pdf2txtocr.exe C:\in.tif C:\out.txt
With this command line, we can extract text from a TIFF raster image file. Even if the TIFF has many pages, the extraction is fast and accurate.
pdf2txtocr.exe C:\in.jpg C:\out.txt
As with the command line above, this one extracts text from a JPG raster image file.
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
When you need to extract text from another kind of raster image file, simply change the input image file to that format.

The raster image can be any kind of scan file. If you can scan the image in black and white, the extraction result will be much better.

During extraction we often meet raster image files that are skewed or dirty. These factors affect the conversion result. To fix the image, you can preprocess it with this software. The following parameters are for your reference.
-bitcount <int>     : set the color depth used when rendering a PDF page to image data; it can be 1, 8 or 24, and the default is 8-bit
-rotate <int>       : rotate pages before OCR
-threshold <int>    : adjust the lightness threshold used to convert the image to B&W
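
These preprocessing options can be validated before the tool is started. The following Python sketch is an illustration, with build_preprocess_command as a hypothetical name: it checks -bitcount against the three values given above and passes -rotate and -threshold through unchanged, because their valid ranges are not stated here. It builds the argument list only.

```python
ALLOWED_BITCOUNTS = (1, 8, 24)  # values listed for -bitcount above


def build_preprocess_command(image_path, text_path, bitcount=None,
                             rotate=None, threshold=None,
                             exe="pdf2txtocr.exe"):
    """Return the argument list for text extraction with preprocessing.

    Options left as None are omitted, so the tool's defaults apply
    (for example 8-bit color depth).
    """
    args = [exe]
    if bitcount is not None:
        if bitcount not in ALLOWED_BITCOUNTS:
            raise ValueError("bitcount must be 1, 8 or 24")
        args += ["-bitcount", str(bitcount)]
    if rotate is not None:
        args += ["-rotate", str(rotate)]
    if threshold is not None:
        args += ["-threshold", str(threshold)]
    args += [image_path, text_path]
    return args
```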

If you want to know more about the functions of this software, please visit its homepage. If you have any questions while using it, please contact us as soon as possible.
