Broken Formatting and Missing Text in some PDF files

fg094

Sumatra

I’ve noticed that certain PDF files appear with odd formatting and missing text. The attached snip is from one such section of a PDF. I am unable to post the snip from Adobe along with it, but most of the empty spaces are missing text. What’s causing this, and is there any way to fix it?

GitHubRulesOK

I have generally not seen that type of failure, so it is possibly common to a single source of those files.

Looking at the file properties (Ctrl + D), do they show a common origin such as Microsoft Word?

I would need access to a copy of one to see what may be wrong, but it could be that a font should have been embedded and was not, or, if embedded, that its structure is wrong.
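
If there are many files to check, the properties comparison could be done in bulk with something like the rough Python sketch below. It assumes the Creator and Producer entries are stored as plain, uncompressed text strings, which is not always the case (they can be hex-encoded, UTF-16, or inside compressed object streams), so an empty result proves nothing. The function name `find_pdf_origin` is made up for illustration.

```python
import re
import sys
from pathlib import Path

# Matches simple literal-string entries such as: /Producer (Some Tool 1.0)
# Assumption: the metadata is stored uncompressed as a plain (...) string.
INFO_ENTRY = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\()])*)\)")


def find_pdf_origin(data: bytes) -> dict:
    """Return the first Creator/Producer strings found in raw PDF bytes."""
    found = {}
    for key, value in INFO_ENTRY.findall(data):
        # Keep only the first occurrence of each key.
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


if __name__ == "__main__":
    # Usage: python pdf_origin.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        print(name, find_pdf_origin(Path(name).read_bytes()))
```

If the same Creator or Producer shows up for every problem file, that points at the tool that made them rather than at the viewer.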

fg094

OK, looking at the file properties, they all seem to list the application as “FINEREADER”. They work fine in Adobe or a web browser, and some of the ones that don’t work now did work before I updated Sumatra.

GitHubRulesOK

FineReader sounds like the ABBYY OCR converter, and OCR is notorious for such issues.
There are some file types where SumatraPDF version 3.1.2 is still better than 3.2 (or even the latest daily build). If you add a portable copy of 3.1.2, you can add an ExternalViewer command to quickly open the current file in another viewer (that should already be happening if Acrobat is detected). The advantage of calling SumatraPDF is that it is easy to open the 3.1.2 copy at the current page.
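
As a rough sketch, an entry along these lines in the advanced settings file (Settings > Advanced Options) should do it. The install path and the Name are only placeholders I have assumed, so adjust them to wherever the portable 3.1.2 copy actually lives. `%p` stands for the current page number and `"%1"` for the current file.

```
ExternalViewers [
	[
		CommandLine = "C:\Tools\SumatraPDF-3.1.2\SumatraPDF.exe" -page %p "%1"
		Name = SumatraPDF 3.1.2
		Filter = *.pdf
	]
]
```

After saving the settings, the extra viewer should be listed in the File menu for PDF files.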

isidroco

I have some PDF bills that can’t be opened in v3.2 but do open in 3.1.2. I’m returning to v3.1.2, as it’s unacceptable not to be able to view those PDFs.
Example: https://www.mediafire.com/file/t65uzre5xt9e9kx/202012_Aysa_CuponPagoAySA80000308848.pdf/file

GitHubRulesOK

Unfortunately, although it works in some viewers, the file is not handled by MuPDF, which is the engine SumatraPDF uses for this file type.
image
I will look into it more closely, but generally MuPDF would need to accept that file first.

isidroco

Thanks for your quick reply. I’m puzzled as to why v3.1.2 accepts it. Did you use MuPDF there? Is there a way to provide my sample to the MuPDF developers? Thanks again.

GitHubRulesOK

I cannot find a standalone version of MuPDF that will open that file; they either close without a message or show a message like the one above.
Yes, SumatraPDF 3.1.2, in common with most versions since the earliest ones, uses the MuPDF library, but for stability some file errors were handled differently. However, because of the sheer frequency of code changes (thousands of them), SumatraPDF no longer makes many core changes to the rendering or internal file handling functions and concentrates on the user interface instead.
This does mean more dependency on problems like this one being closed out by Artifex at https://bugs.ghostscript.com/. However, they can take anywhere from hours to years to reply to open source queries, and if they decide the file is substandard, the reply may be wontfix.

GitHubRulesOK

I struggled to understand what may have caused the problem, as I only spent a short time testing, but it looks like there are two files mixed into one page, so I guess it was “modified” badly. The repaired file is 8KB and looks similar in 3.2 to the 16KB source file in 3.1.2. A visual differencing tool reports that it can’t see any visible changes, so you can perhaps pay the amount shown, but I would be suspicious as to what the second page may have been.


Using Tracker eXchange, I get the following message
image

and the file shrank to only 11KB after it was corrected and saved. Again, there was no visual difference.