Is it possible to use Sumatra as an API?

stianda

Hello.

I’m writing a Ruby program to analyze several GB of PDFs, and I only need two pieces of information:

  • Whether the PDF is valid (return “error” if it is corrupted)
  • The number of pages in the PDF

Is it possible to use Sumatra for this? For example, Ruby would call Sumatra.exe from the command line, asking for these two pieces of data, and Sumatra would respond with something like True/995.

I have tried some gems, but their page-counting strategy is to literally count pages one at a time, which is slow. Worse, I haven’t found any gem that validates PDFs reliably; some try, but they get it wrong.

If Sumatra doesn’t work for this, do you know of any API that I can use?

GitHubRulesOK

I suspect the developers’ answer may be that, since SumatraPDF relies on MuPDF with some tweaks to accept potentially invalid files, calling SumatraPDF is not a good way to validate them.
It would be simpler to use the MuPDF tools directly:
https://www.mupdf.com/docs/index.html
Document#isPDF()
Document#countPages()

stianda

Thanks for the recommendation. If Sumatra is not suitable for validating PDFs, can it at least be used as a page counter?

kjk

Sumatra is a GUI program not designed for things like that.

We use MuPDF internally, so you should use that: https://mupdf.com/docs/index.html

mutool is a console app and can be used to get information about a PDF.
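
As a rough sketch of how mutool could be called from Ruby: the snippet below shells out to `mutool info` and looks for a "Pages: N" line in its output. The helper names (`parse_page_count`, `inspect_pdf`) are made up for illustration, and both the "Pages:" line format and the meaning of the exit status are assumptions that should be checked against the installed MuPDF version. Because MuPDF tries to repair damaged files, a successful run here does not prove the PDF is strictly valid.

```ruby
require "open3"

# Extract the page count from text shaped like `mutool info` output.
# Assumption: the output contains a line of the form "Pages: <n>".
# Returns nil when no such line is found.
def parse_page_count(info_text)
  match = info_text.match(/^Pages:\s*(\d+)/)
  match ? match[1].to_i : nil
end

# Run `mutool info` on a file and return [ok, pages].
# `ok` is true only if the command exited successfully and a page
# count could be parsed; otherwise the result is [false, nil] or
# [false, pages].
def inspect_pdf(path, mutool: "mutool")
  stdout, _stderr, status = Open3.capture3(mutool, "info", path)
  pages = parse_page_count(stdout)
  [status.success? && !pages.nil?, pages]
rescue Errno::ENOENT
  # The mutool executable was not found on PATH.
  [false, nil]
end
```

Something like `ok, pages = inspect_pdf("some.pdf")` would then give the two values asked for in the first post.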

SumatraPeter

Sumatra seems to be overkill for this purpose. I’d suggest MuPDF as well, or even Ghostscript (both by Artifex Software).

GitHubRulesOK

Several tools can use Ghostscript to repair bad files, such as Coherent cpdf, which also has a command-line interface with error feedback: https://www.coherentpdf.com/usage-examples.html

The -pages operation prints the number of pages in the file.
cpdf -pages Archos.pdf
1388
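
A minimal sketch of wrapping that command from Ruby, assuming `cpdf` is on the PATH and prints a bare integer on success, as in the example above. The function names are hypothetical, and treating a non-zero exit status as "error" is an assumption about how cpdf reports unreadable files.

```ruby
require "open3"

# Interpret the result of `cpdf -pages FILE`.
# Returns the page count as an Integer, or the string "error" when the
# command failed or did not print a bare integer.
def interpret_cpdf_pages(stdout, success)
  text = stdout.strip
  return "error" unless success && text.match?(/\A\d+\z/)
  text.to_i
end

# Run `cpdf -pages` on a file and interpret its output.
def cpdf_page_count(path, cpdf: "cpdf")
  stdout, _stderr, status = Open3.capture3(cpdf, "-pages", path)
  interpret_cpdf_pages(stdout, status.success?)
rescue Errno::ENOENT
  # The cpdf executable was not found on PATH.
  "error"
end
```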


The problem with validation is that many malformed files can pass structural tests.

For example, SumatraPDF can render a “valid file” that is nevertheless useless for reading.