PDF Compressor

How to compress a PDF as a whole?

Question: I'm working on a tool that writes PDF files, and I'm trying to find a way to compress the objects and streams in them. A number of the PDFs that I'm generating are fairly large, but can be substantially reduced by compressing the objects (or most of the PDF structure) into a flat stream. I swear I've seen this done before, but none of the PDFs that I've looked at seem to do it. I also tried using Acrobat X to compress a file with "entire file compression", but it seems to compress only the streams. I've tried using ObjStm, but it doesn't have a lot of support from other PDF readers. I hope I can find a solution on VeryDOC.

Answer: PDF supports two types of compression:

  1. stream compression - the data inside streams is compressed using various filters, but the PDF file structure itself is not compressed.
  2. object compression - the file structure is compressed as well, mainly the objects that do not contain streams; these are packed into object streams (ObjStm, introduced in PDF 1.5).

These are the only compression scenarios PDF supports. The right compression method depends largely on the data you want to compress: page content streams usually use Flate compression, 1 bpp images use CCITT G4 or, better, JBIG2, color images compress better with JPEG 2000, and so on.
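As a rough illustration of stream compression, the sketch below Flate-compresses a made-up page content stream with .NET's ZLibStream (available in .NET 6 and later) and prints both sizes. The sample content and the names FlateSketch, Flate and Inflate are assumptions for illustration only; a real PDF writer would also have to set /Filter /FlateDecode and /Length in the stream dictionary.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class FlateSketch
{
    // Compress raw stream bytes into zlib format, which is what the FlateDecode filter reads.
    public static byte[] Flate(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                zlib.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverse of Flate, used here only to confirm the round trip.
    public static byte[] Inflate(byte[] packed)
    {
        using (var input = new MemoryStream(packed))
        using (var zlib = new ZLibStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zlib.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Hypothetical content stream: similar text-drawing operators repeated many times.
        var sb = new StringBuilder();
        for (int i = 0; i < 200; i++)
            sb.Append("BT /F1 12 Tf 72 ").Append(720 - i).Append(" Td (Hello) Tj ET\n");

        byte[] raw = Encoding.ASCII.GetBytes(sb.ToString());
        byte[] packed = Flate(raw);
        Console.WriteLine("raw: " + raw.Length + " bytes, Flate: " + packed.Length + " bytes");
    }
}
```

Repetitive operator sequences like this one tend to compress well, which is why Flate is the usual choice for page content streams.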

There is also a tool on VeryDOC that you can try: VeryDOC PDF Compressor. With this software, you can reduce PDF file size by 40-95% through optimization for PDF file compression. It optimizes PDF structures and compresses pictures, graphics and objects within a PDF file while preserving the original file format and quality. It compresses streams and objects automatically.

There are also a command line version, a server version and an SDK version to choose from. With the SDK version, you can build custom applications (most programming languages are supported: C#, C++, Delphi, Visual Basic, VB.NET, etc.).

Here are some of the software's parameters, which give an idea of the compression methods it uses:
-mi <string> : Set Monochrome Image Compression, values: jbig2, jbig2l, fax, zip, rle
-midown : Downsample Monochrome images
-midownres <int> : Set Monochrome Image Resolution
-midowntype <int>: Downsample type for Monochrome images:
    -midowntype 0: default
    -midowntype 1: Subsample
    -midowntype 2: Average
    -midowntype 3: Bicubic
-owner <string>: Owner password to use for encrypting output PDF file
-user <string> : User password to use for encrypting output PDF file
-perms <int>   : PDF security permissions to use for encrypting output file
-keylen <int>  : Defines the length (in bits) of the encryption key.
-winfont       : Use Windows fonts to replace Base14 fonts
-embedallfonts : Embed all fonts
-subsetfonts   : Subset fonts
-compressfonts : Compress fonts
-pwdinpdf <string> : Open password for input PDF file
-pdfa          : Create PDF/A file
-nobookmarks   : Don't generate bookmarks in output PDF file
-jbig2 : Compress monochrome image streams with JBIG2 arithmetic
-jpx   : Compress color and grayscale image streams with JPEG2000 arithmetic
-jpxquality <string> : Set Quality for JPX Compression, from 0.0 to 100.0, default is 0.5

If you need to know more about this software, please visit its homepage. If you have any questions while using it, please contact us as soon as possible.

PDF Viewer OCX Control

How to display PDF in browsers without Acrobat plug-in installed?

Question: Our intranet site has links to a lot of PDF files, but most of our users don't have a PDF plug-in installed, so they can't see any of these files. Installing a PDF plug-in for the browsers on their machines has been ruled out. We don't know why, but it might be for security reasons. We have been asked to convert the entire list of PDFs to look-alike HTML files.

Now I am looking at these options.

  1. Find software that faithfully converts the PDFs to corresponding HTML files (there are all kinds of PDFs: user manuals, product catalogs, statistical reports, etc.).
  2. Somehow write code to display PDFs inside a browser, maybe like Scribd. I think Scribd uses Flash, and I don't think our intranet users are allowed to install the Flash plug-in either. If this can be done in .NET, that is much preferred.
  3. Manually convert all those PDFs (around 200 files, and more will be added) to HTML files.

Answer: Based on your needs, you may want to try the free trial of VeryDOC PDF Viewer OCX Control, with which you can build a custom interface for viewing PDF documents from .NET, Visual Basic, VC, Delphi, C#, HTML (Internet Explorer) or any other programming language without Adobe Reader installed. With this software, you can also optimize PDFs for fast web viewing.

1. This software can display PDFs of all versions, but it cannot convert PDF to HTML or other formats.
2. This software does not require any Flash plug-in, and you can call it from .NET.
3. If you need to convert PDF to HTML, please try the free trial of VeryDOC PDF to HTML Converter.

This software provides the following methods to display PDF.

1. BOOL OpenPDF(LPCTSTR lpszPDFFile, ...)
2. ClosePDF()
Description: Open and close PDF Viewer window.

3. void SetFindText(LPCTSTR lpszFindText)
4. void FindNextText()
5. void FindPreviousText()
Description: Search text string in PDF pages.

6. void RotateViewLeft()
7. void RotateViewRight()
Description: Rotate PDF pages.

8. void ViewNextPage()
9. void ViewPreviousPage()
10. void ViewFirstPage()
11. void ViewLastPage()
12. void ViewPage()

13. void ZoomFitPage()
14. void ZoomActualPage()
15. void ZoomFitWidth()
16. void Zoom(float nZoom)
18. void ZoomOut()
Description: Zoom PDF pages.

19. void ViewModeSinglePage()
23. void SetViewMode(long nViewMode)
Description: View PDF pages in different modes.

These are only some of its methods; please check the homepage for more usage details. If you have any questions while using it, please contact us as soon as possible.

HTML Converter

Is there a way to programmatically remove all blank pages from a PDF file?

Question: Nowadays it is more practical to purchase an eBook than the dead-tree version. But the PDFs frequently contain the blank pages used by the print edition. I typically see between 10 and 30 blank pages (or pages with the text "This page intentionally left blank.") per eBook. Is it possible to programmatically remove these blank pages?

So the hard part is identifying the blank pages. pdftotext would work for the most part, except where a page has only images and no text. Also, even after removing many pages and seeing that the resulting file is smaller, once I shrink both the original file and the new version (using various methods found on the internet), the shrunken original is usually smaller by several hundred KB or more. So it appears the method I'm using to remove the blank pages doesn't create an optimal PDF. I've also tried various programs and see the same results in this respect.
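The pdftotext idea can be sketched roughly as follows. pdftotext separates pages with a form feed character, so splitting its output on '\f' gives one text chunk per page. The class and method names are hypothetical, and the sketch only inspects extracted text: as noted above, a page that contains only images has no text either, so this check alone would wrongly flag it as blank.

```csharp
using System;
using System.Collections.Generic;

static class BlankPageSketch
{
    const string Placeholder = "this page intentionally left blank";

    // Returns the 1-based numbers of pages whose extracted text is empty
    // or consists only of the placeholder sentence.
    public static List<int> FindBlankPages(string pdftotextOutput)
    {
        string[] pages = pdftotextOutput.Split('\f');
        int count = pages.Length;

        // pdftotext typically ends every page, including the last, with a form feed,
        // which leaves one empty chunk at the end that is not a real page.
        if (pdftotextOutput.EndsWith("\f", StringComparison.Ordinal))
            count--;

        var blank = new List<int>();
        for (int i = 0; i < count; i++)
        {
            string text = pages[i].Trim().TrimEnd('.').ToLowerInvariant();
            if (text.Length == 0 || text == Placeholder)
                blank.Add(i + 1);
        }
        return blank;
    }

    static void Main()
    {
        // Made-up sample standing in for real pdftotext output.
        string sample = "Chapter 1\f\fThis page intentionally left blank.\fChapter 2\f";
        Console.WriteLine(string.Join(", ", FindBlankPages(sample)));  // prints "2, 3"
    }
}
```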

Answer: I don't know of a free open source solution that can detect and remove blank pages. However, VeryDOC's commercial HTML Converter can automatically remove blank pages -- both vector and scanned. For scanned pages, it can remove scan artifacts such as black edges, hole punches and noise before determining whether a page is blank. While removing blank pages from a PDF, this software will not damage or compress the input PDF file. You can try the free trial of HTML Converter and then decide whether to pay for it. Below, I will show you how to use this software.

There are two versions of this software: a GUI version and a command line version. For removing all blank pages from a PDF, the command line version is the better choice. When the download finishes, you will have a zip file. Please extract it to a folder, where you will find the executable file, and call it from an MS-DOS command prompt window. This software can also be used together with ASP, VB, VC, Delphi, BCB, Java, .NET, COM+, etc., so you can use it programmatically.

Here is the usage for your reference: htmltools [options] <EMF-WMF-HTML-URL-RTF-file> [<PDF-PS-Image-file>]

When you need to remove all blank pages from PDF, please refer to the following command line template:
htmltools.exe -noempty -mergepdf C:\test.pdf C:\out.pdf
Please call this software from an MS-DOS command prompt window and pass the parameters -noempty -mergepdf, followed by the full paths of the input PDF and the output PDF file. This removes all blank pages from the PDF. Now let us check the related parameters:
-noempty           : Delete empty pages from PDF file
-mergepdf <string> : Merge two PDF files into one PDF file
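To use the command line template programmatically, one option is to launch htmltools.exe as a child process. The following is a minimal sketch, not an official sample: the class name, the install path C:\htmltools\htmltools.exe, and the quoting of the paths are assumptions, and the meaning of the process exit code is not documented here, so it is only printed.

```csharp
using System;
using System.Diagnostics;

static class RemoveBlankPages
{
    // Build the argument string from the template above, quoting both paths
    // so that paths containing spaces are passed as single arguments.
    public static string BuildArguments(string inputPdf, string outputPdf)
    {
        return "-noempty -mergepdf \"" + inputPdf + "\" \"" + outputPdf + "\"";
    }

    // Start htmltools.exe, wait for it to finish and return its exit code.
    public static int Run(string exePath, string inputPdf, string outputPdf)
    {
        var psi = new ProcessStartInfo(exePath, BuildArguments(inputPdf, outputPdf))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(psi))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    static void Main()
    {
        // Hypothetical install location; adjust to wherever the zip file was extracted.
        int code = Run(@"C:\htmltools\htmltools.exe", @"C:\test.pdf", @"C:\out.pdf");
        Console.WriteLine("htmltools exit code: " + code);
    }
}
```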

If you need to know more about the parameters and functions of this software, please visit its homepage. You can now use the command line template above to remove all blank pages from a PDF. If you have any questions while using it, please contact us as soon as possible.

PostScript to Text Converter

Problem when calling PS to Text SDK from C# source code

I have a simple text file that I printed, which converts the file into PostScript. I did this from TextPad, WordPad, and Internet Explorer. When converting the PostScript from TextPad and WordPad, ps2txtsdk returns -11 and the file is not converted. When I print the same file from Internet Explorer, ps2txtsdk returns 11 and the file is converted.

I have included these files for your review. Please let me know why this happens. We are trying to deploy an app for our clients where they can print anything to our (pseudo) printer and it will print to a postscript file, then we convert it to text using your SDK and transfer it to an FTP site. We have no control over what application they are going to use and we need to address a fix for this.

Customer

-----------------------------------------------------------

A return value of 11 indicates that the conversion was successful. Please refer to the return codes below:
//~~~~~~~~~~~~~~~~~~~
//0: - Success, no problems found.
//1: - Couldn't open the PDF file; the file may require an open password or be damaged.
//2: - Couldn't open the output text file.
//10: - Success; some embedded fonts were found.
//11: - Success; the PDF file contains only embedded fonts.
//12: - Failure: empty text file, e.g. the PDF is a scanned picture or contains only pictures.
//-1: - Failure from other causes (cannot convert to text file), such as an exception, a timeout or not enough memory.
//-11: - Failure: something is wrong in the input parameters.
//~~~~~~~~~~~~~~~~~~~
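For callers, the table above can be wrapped in a small lookup helper so that logs show the meaning of a code rather than the bare number. This is only a sketch built from the codes listed above; the class and method names are made up, and the SDK may return values that are not in this list.

```csharp
using System;
using System.Collections.Generic;

static class Ps2TxtResult
{
    // Descriptions paraphrased from the return code list above.
    static readonly Dictionary<int, string> Messages = new Dictionary<int, string>
    {
        { 0, "Success, no problems found" },
        { 1, "Could not open the input file (open password or damaged file)" },
        { 2, "Could not open the output text file" },
        { 10, "Success, some embedded fonts found" },
        { 11, "Success, only embedded fonts found" },
        { 12, "Failure: empty text output (scanned or picture-only input)" },
        { -1, "Failure: other cause (exception, timeout, not enough memory)" },
        { -11, "Failure: something is wrong in the input parameters" },
    };

    // Per the list above, 0, 10 and 11 are the success codes.
    public static bool IsSuccess(int code)
    {
        return code == 0 || code == 10 || code == 11;
    }

    public static string Describe(int code)
    {
        string message;
        return Messages.TryGetValue(code, out message) ? message : "Unknown return code " + code;
    }

    static void Main()
    {
        foreach (int code in new[] { 11, -11 })
            Console.WriteLine(code + ": " + Describe(code) + " (success: " + IsSuccess(code) + ")");
    }
}
```

Going by this list, a result of -11 points at the command string passed to the SDK rather than at the content of the input file, so the parameters are probably the first thing to check.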

VeryDOC

-----------------------------------------------------------

That doesn't help at all. We printed a small text file using different applications: TextPad, WordPad and Internet Explorer. We passed only 2 arguments to the app: the ps file to convert and the text file to put the results in. I have already sent a zip file with the original text as well as the ps files created by the applications I mentioned above. The ps files produced by all of the applications look good. Your converter does not convert 2 of these files.

Giving me return values doesn't help. I already know I get -11 on two conversions and 11 on one. I don't understand how your reply is going to help.

I need your app to convert all the ps files, especially when the original file is purely text.

Customer

-----------------------------------------------------------

Our PS2TXT SDK does convert your PS files to text files without any problem. Please look at the converted text files in the attachment; they were all converted by the PS2TXT SDK product. Can you get these text files on your system?

VeryDOC

-----------------------------------------------------------

Thanks. It did work.

For your information:

The difference between my application now and before is simply the C# declaration of the external function.

The declaration was
[DllImport("ps2txtsdk.dll", EntryPoint = "VeryPDF_PSToText")]
private static extern Int32 VeryPDF_PSToText(string strcmd);

At this point some files would convert but others would not.

The declaration now is
[DllImport("ps2txtsdk.dll")]
internal static extern int VeryPDF_PSToText(string strcmd);

It converts all the test files. I don’t understand why the small declaration difference would cause this. I will do more extensive testing, and if I have a problem I will get back to you.

It would be helpful if you had some documentation that was sent out along with the SDK.

using System;
using System.IO;
using System.Runtime.InteropServices;

class Program
{
    // P/Invoke declaration for the conversion function exported by ps2txtsdk.dll.
    [DllImport("ps2txtsdk.dll")]
    internal static extern int VeryPDF_PSToText(string strcmd);

    static void Main(string[] args)
    {
        string sourcePath = Environment.CurrentDirectory;
        string[] files = Directory.GetFiles(sourcePath, "*.ps", SearchOption.TopDirectoryOnly);

        // Convert all files from *.ps to *.txt.
        foreach (string ps in files)
        {
            // The output file goes next to the input file, with a .txt extension.
            string txt = Path.Combine(sourcePath, Path.GetFileNameWithoutExtension(ps) + ".txt");
            string strCmd = "ps2txt -$ XXXXXXXXXXXXXXX " + ps + " " + txt;
            // The function returns int, so store the result in an int rather than a long.
            int nRet = VeryPDF_PSToText(strCmd);
            Console.WriteLine("Conversion Result: " + nRet);
        }
    }
}

-----------------------------------------------------------

private static extern Int32 VeryPDF_PSToText(string strcmd);
internal static extern int VeryPDF_PSToText(string strcmd);

The declarations use Int32 and int, which are two names for the same data type in C#,
-----------------------
Int16 (short) is a 16-bit integer,
Int32 (int) is a 32-bit integer; int is an alias for System.Int32 and is 32 bits long on both 32-bit and 64-bit systems,
Int64 (long) is a 64-bit integer,

Capacities as follows

Int16 -- (-32,768 to +32,767)
Int32 -- (-2,147,483,648 to +2,147,483,647)
Int64 -- (-9,223,372,036,854,775,808 to +9,223,372,036,854,775,807)

Decide which one you need to use depending on the circumstances.
-----------------------

Please look at following web pages for more information,

http://blog.risingperfection.com/2011/04/c-difference-between-int-and-int32.html
http://stackoverflow.com/questions/62503/c-int-or-int32-should-i-care

You can double-check the size of int and Int32 on your system.
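A minimal sketch for such a check is shown below; the class and method names are made up for illustration. In C#, int is an alias for System.Int32, so the sizes and the types should match on both 32-bit and 64-bit systems; only IntPtr.Size changes with the process bitness.

```csharp
using System;

static class IntSizeCheck
{
    // True when int and Int32 are the same type with the same size.
    public static bool IntIsInt32()
    {
        return typeof(int) == typeof(Int32) && sizeof(int) == sizeof(Int32);
    }

    static void Main()
    {
        Console.WriteLine("sizeof(short) = " + sizeof(short)); // 2 bytes (Int16)
        Console.WriteLine("sizeof(int)   = " + sizeof(int));   // 4 bytes (Int32)
        Console.WriteLine("sizeof(long)  = " + sizeof(long));  // 8 bytes (Int64)
        Console.WriteLine("int is Int32:   " + IntIsInt32());
        Console.WriteLine("int range:      " + int.MinValue + " to " + int.MaxValue);
        // Pointer size is what differs between 32-bit and 64-bit processes.
        Console.WriteLine("IntPtr.Size   = " + IntPtr.Size);
    }
}
```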

VeryDOC

DOC to Any Converter

How to call Doc to Any Converter SDK from ASP.NET source code?

Hi,

I am using the trial version of the DOC2Any DLL in my ASP.NET application, but converting .docx to PDF sometimes fails. When this happens, the application gets stuck, and the service does not stop even after 24 hours; I have to stop it manually from Task Manager. Please suggest how to resolve this issue when it occurs.

Once this is working, I will go ahead with the purchase. Please respond as soon as possible with all details. I have also copied the code I am using below.

Product Details: Trial Version
https://www.verydoc.com/doc-to-any.html
DOC to Any Converter SDK/COM Version
1 Server License USD$395

Code:
[DllImport(@"pdfshell.dll", CharSet = CharSet.Auto)]
static extern uint DocToAnyRunCmd([MarshalAs(UnmanagedType.LPStr)] string strCmdLine);
public void ConvertFile()
{
    string strCmd = "-$ XXXXXXXXXXXXXXXXXXXX " + objCommonData.StrFileName + " " + strOutFile;
    uint nRet = DocToAnyRunCmd(strCmd);

    if (nRet != 0)
    {
        // objCommonData.StatusPDFConverter = true;
    }
}

Customer

---------------------------------------------------------------

Yes, our DOC to Any Converter does convert DOCX files to PDF files. Please note that you need to install MS Office 2007 or MS Office 2010 in order to convert DOCX files to PDF files. What version of MS Office is installed on your system?

You also need to set MS Word DCOM to run under an interactive user account instead of the default system user account. Please look at the following web pages for more information:
https://www.verydoc.com/blog/cannot-test-doc2any-on-net.html
https://www.verydoc.com/blog/how-to-call-doc2any-exe-from-asp-code.html
https://www.verydoc.com/doc-to-any-shell.html
https://www.verydoc.com/blog/run-doc2any-on-windows-2003-or-windows-2008-system.html
http://www.verypdf.com/wordpress/201201/how-to-call-doc2any-exe-or-htmltools-exe-from-a-service-20896.html
https://www.verydoc.com/blog/how-to-call-doc2any-from-php.html
http://www.verypdf.com/wordpress/201107/the-info-from-doc-converter-not-working-1016.html
http://www.verypdf.com/wordpress/201106/combine-word-doc-files-into-one-pdf-file-864.html
http://www.verypdf.com/wordpress/201107/rtf-to-pdf-1051.html

VeryDOC
