Sumatra PDF is a PDF, ePub, MOBI, CHM, XPS, DjVu, CBZ, CBR reader for Windows

Regression or giant memory leak in SPDF 3.2?

Hello,

Up until last week I have used Sumatra PDF 3.1.2 portable, 64bit.
I have now started using Sumatra PDF 3.2 portable, 64bit, and notice massive memory usage, to the point that the whole system freezes because excessive swap file usage is needed. Physical memory runs out. The test was done on a system with 8GB total RAM and approx. 5GB RAM free at the start of the test. Hardly any other app is open.

I can recreate the problem every time and easily with this single page file.

Behaviour in spdf 3.1.2, portable, 64bit
Open and zoom to fit page: peak private memory usage (private working set) in Windows Task Manager is under 100MB. Once rendering is finished it stays under 100MB. When I zoom and scroll to various spots at 200/300/400% etc., the memory usage spikes into the 250MB range but drops back to under 100MB when the preview generation and rendering is done. This is perfect considering the content in the file (see note below).

Behaviour in spdf 3.2, portable, 64bit
I crashed my PC 3 times while writing this post.
Open and zoom to fit page: memory usage hovers around a few hundred MB at first in Windows Task Manager. Then suddenly it starts spiking past 3GB. I was able to take a screenshot at 5GB memory usage, but a few milliseconds later the whole system froze because RAM was slaughtered and the virtual memory on the hard disk could not keep up anymore. Only a hard reboot helped.
Try zooming in at 200/300/400% in the file and scroll around. All that is possible with a bit of patience in version 3.1.2.
In 3.2, on the other hand, get ready to have your PC locked up if you have a total of 8GB RAM. Watch the Task Manager statistics and be prepared to force end the process if you don’t want your PC to lock up.

I am pretty sure there is a fundamental change or design problem in the underlying MuPDF rendering engine that SPDF uses. The question is, what can be done about it?
This makes an otherwise great program unusable for high resolution graphic users.

Note:
The content in the PDF above is an 800dpi compressed image at approx 34x54.5 inch (862x1381mm) size PDF final size.
I realize this is not the average user’s document size and content, but such files used to work perfectly fine with version 3.1.2.

Thank you

For what it’s worth, my Firefox refuses to render more than halfway down the page, but did allow me to download a file.

MuPDF-GL (the viewer based on the library that is the core of SumatraPDF) then refuses to open the download, as that file appears to require over 3.5 GB, which exceeds this device’s memory.

Acrobat and 3.1.2 will render it exceptionally slowly as they shuffle memory in small chunks so as to cope.

3.2, as you suggest, is to be avoided, as it absorbs more memory whilst decompressing. Once rendered, the pre-release settles down at 15MB, but zooming will again result in heavy memory usage as it re-decompresses. Testing with the latest pre-release, you should find the memory balance is much better.

862.2 mm (27,156 px) x 1381.5 mm (43,511 px) = 1,181,584,716 px, times the number of colour bits (32?), seems way over the top for what should be a small file for an ultra high quality handbill or the normally lower-res printing of a full size non-glossy poster. It takes ages to load in any viewer when I try to query it, and many editors won’t let me adjust the contents as it exceeds their memory abilities. Irfanview simply gives up and crashes; Acrobat refuses to copy the whole page. Any reason it’s not less colours and stored as 100-300 dpi?
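
The arithmetic above can be checked with a short sketch. The pixel dimensions are the ones quoted in this post; the helper name `uncompressed_bytes` is hypothetical, and the assumption that a viewer decodes the whole page into one bitmap with no row padding is illustrative only, not a statement about what MuPDF or SumatraPDF actually does.

```python
# Page size in pixels as quoted above (862.2 mm x 1381.5 mm at 800 dpi).
WIDTH_PX = 27_156
HEIGHT_PX = 43_511


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size of one fully decoded bitmap (assumes no row padding, no extra buffers)."""
    return width_px * height_px * bits_per_pixel // 8


pixels = WIDTH_PX * HEIGHT_PX
print(f"{pixels:,} pixels")

# Compare greyscale (8 bit), RGB (24 bit) and CMYK or RGBA (32 bit).
for bpp in (8, 24, 32):
    size = uncompressed_bytes(WIDTH_PX, HEIGHT_PX, bpp)
    print(f"{bpp:>2} bit: {size / 2**30:.2f} GiB")
```

At 32 bits per pixel this comes to roughly 4.4 GiB for a single decoded bitmap, which is in the same range as the multi-gigabyte spikes reported earlier in the thread.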

Since I am on Windows, the most stable result is to copy from SumatraPDF at 100% x 96dpi, paste into Irfanview as a clipboard image, then save as screen-optimised for 65k colours @ 96dpi. It thus builds as 3.8MB, which loads at 1/70th the resolution and thus over 70 times faster, and since it is visually similar on a monitor it is more than good enough.

Just a tip for the future - always test with the latest daily build or pre-release version to check whether any issues you’re reporting have already been fixed/improved since the last stable version was released.

Hello,

Background:
This content is average in professional prepress, high-end litho printing or grand format outdoor printing. The content and expectation are not those of the average home/office user, for whom previewing at 72/96dpi may be sufficient. Complex trapping https://en.wikipedia.org/wiki/Trap_(printing) , proofing and color separation processes take place as part of the workflow. Viewing/proofing with SPDF is just a small part of that workflow.
The offset presses can go to far higher resolutions than my sample file and having the ability to preview a high resolution file is very valuable. I use spdf 3.1.2 for the previewing of such high-res files without any problems.

Comments about your reply:
Firefox is a browser, not designed as a high-res PDF viewer, and its memory management is never expected to render such content. I am not surprised that it fails to finish rendering the file. I don’t expect it to render it either.

Every version of muPDF fails to allocate enough memory too, even on a system with 16GB RAM, because it is designed to be lightweight and for screen preview, but still tries to allocate one giant bitmap at a resolution that is not needed. I am in contact with the developers to see if the design can be reviewed. Work in progress.

Adobe Acrobat opens the file by reading in horizontal stripes, in 45 seconds on an Intel i3, quad core, 3GHz CPU, with Zoom = Fit Page. Total memory use is around 50MB, almost nothing. The rendering time may be considered “exceptionally slow” by users who are used to normal MS Office documents or regular vector CAD files, but for print professionals working with high-res raster and RIPs, a 45 second rendering time for preview is totally reasonable and acceptable.

SPDF 3.1.2 opens the file in 32 seconds on the same CPU with Zoom = Fit Page, under 100MB memory use. This is a brilliant result and outperforms, for speed, a product of a multi-billion dollar corporation. Well done!

Ghostscript, also an industrial strength tool (and interestingly also from Artifex, same as muPDF), IS able to rasterize the file to PNG or JPG with very little memory use: 16 seconds for 36dpi and 18 seconds for 96dpi on the same PC. Memory use is under 30MB throughout the process. This is the fastest method to generate a low-res preview. But building a smooth viewer with zoom, scroll, pan, zone extraction etc., like SPDF, by stitching these images or rasterizing the needed zone on demand is a pain and a massive project by itself. I really do not want to try reinventing the wheel when SPDF 3.1.2 does such a great job already. Maybe some rendering code in SPDF could incorporate GS (only when needed) instead of muPDF. GS is usually faster for pure raster content, while muPDF is far faster for vector.
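
A low-res Ghostscript preview of the kind timed above might be produced along these lines. This is only a sketch and not necessarily the command behind those timings: the file names are placeholders, the helper `gs_preview_command` is hypothetical, and the executable name depends on the install (on Windows the console binary is typically `gswin64c` rather than `gs`).

```python
from typing import List


def gs_preview_command(pdf_path: str, png_path: str, dpi: int = 96,
                       gs_exe: str = "gs") -> List[str]:
    """Build a Ghostscript command line that rasterizes page 1 to a 24-bit PNG."""
    return [
        gs_exe,                       # assumption: adjust to your install
        "-dBATCH", "-dNOPAUSE",       # run non-interactively and exit when done
        "-dSAFER",                    # restrict file system access
        "-sDEVICE=png16m",            # 24-bit RGB PNG output device
        f"-r{dpi}",                   # output resolution in dpi
        "-dFirstPage=1", "-dLastPage=1",
        f"-sOutputFile={png_path}",
        pdf_path,
    ]


cmd = gs_preview_command("poster.pdf", "poster-preview.png", dpi=36)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to actually run the conversion.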

The file I supplied is just one example; there are hundreds of other cases where very large presentation posters are printed and need high-res proofing. Output is sometimes on 60 inch wide and 200+ inch long rolls at 600/800/1200 dpi resolution. Again, this is not the world of casual home/office users. This is industrial strength. Some printing is done on UV-resistant latex devices, and a 53-foot truck is wrapped with a canvas-like water-resistant material. The printed sheet is a single piece per side. There are many other use cases for high resolution raster graphics. The high-end printing industry works in CMYK colorspace (32bit color) because that is how the physical printing process works. Preview with RGB is OK for some cases; when color accuracy is not critical, RGB (24bit) files are used as well, but in the end either the RIP, the firmware or another piece of software converts those 3 RGB planes into 4 planes of CMYK. This explanation is in response to your comment “Any reason its not less colors and stored as 100-300dpi?” Another factor is that design studios or third parties often create and submit a rasterized file. I have no control over the resolution but would like to preview it without tampering with the original. Here is just one example of a possible output device. Look at the spec sheet https://www.topazeng.com/hp-specifications/pagewide-xl8000-specs.pdf
1200x1200dpi native resolution at 40 inch wide and a few hundred feet in length!

Your suggestion to copy and paste between applications, from Sumatra (I assume 3.1.2, because 3.2 chokes) via the clipboard to Irfanview, then resave with fewer colors and lower dpi, is a valid workaround if someone is willing to do that manually as a one-off and is willing to lose the resolution.
The concerns with losing resolution are

  • no proper trapping check
  • loss of detail in preview when zoomed in for proofing
  • in some different workflow cases the user must zoom into the content at 150-200%, mark an area with the control+mouse feature of SPDF (it is brilliant that it takes vector text if the object is a TTF font, or the image) and copy the area. OCR text recognition runs automatically over the extracted image if no vector object is found and extracts metadata out of the content.
    All of these functions suffer or do not work at all if the original is reduced to something like 96dpi
    Also, all the conversions you suggested would need to be done from either command line or some sort of code. The user won’t be able to go home if he has to do this manually.

As you can see I tried to be very detailed about the use case and tried to explain the need for high res preview and zoom ability.

I don’t know the C++ inner workings of SPDF and muPDF well enough to comment on whether my suggestions below are feasible or not, but SPDF 3.1.2 is able to render such high-res files at any zoom % without any problems and in reasonable time, and I really do not want to lose the upcoming releases of SPDF because of the step backwards in memory management which SPDF 3.2 seems to be.

  • There is no requirement to push the entire image into RAM at one time if the memory requirement is high. In the end, the image is drawn on screen anyway, into the viewable area of the SPDF process. A Full HD monitor has a pixel size of 1920x1080. Even high-end monitors have ‘only’ 3,840x2,160 pixel resolution. Those numbers are very small compared to the pixel size of my supplied sample file (27,156x43,511). It is sufficient if SPDF decodes and renders the viewable area and does not waste time on parts that are not going to be drawn on screen anyway (which is still evident even in SPDF 3.1.2 when you zoom in at 200/400%).

  • SPDF could check for “safe” available free memory. If there is enough free memory and it can be allocated for the whole image, use it, because the rendering will finish faster. If there is not enough memory, either work in stripes (Y direction, like Adobe, and only on the X coordinate ranges that fall within the viewable zone of the monitor/display) or work in blocks/chunks, again only within the viewable zone of the monitor, which is what SPDF 3.1.2 seems to be doing when I observe how it draws and sharpens the blocks one by one when zoomed in at 300% on a high-res image.

  • I work with the libvips library as well. https://github.com/libvips/libvips
    It can work with and paste together images from different sources at 100,000x100,000 pixels, 32 bit, with very reasonable memory usage, using block-by-block processing or what they call “streaming” mode. Maybe the approaches there could give some ideas to improve SPDF rendering as well.
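
The first two suggestions above can be sketched roughly as follows. This is a minimal illustration, not how SPDF, muPDF or libvips are actually implemented: the function names `choose_strategy` and `visible_tiles`, the 512 pixel tile size and the 50% safety margin are all assumptions made for the example.

```python
from typing import Iterator, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels


def choose_strategy(image_w: int, image_h: int, bytes_per_pixel: int,
                    free_bytes: int, safety: float = 0.5) -> str:
    """Decode the whole image only if it fits in a 'safe' share of free memory."""
    needed = image_w * image_h * bytes_per_pixel
    return "whole" if needed <= free_bytes * safety else "tiles"


def visible_tiles(image_w: int, image_h: int, view: Rect,
                  tile: int = 512) -> Iterator[Rect]:
    """Yield only the tile rectangles that intersect the visible area."""
    # Clamp the viewport to the image bounds.
    left, top = max(0, view[0]), max(0, view[1])
    right, bottom = min(image_w, view[2]), min(image_h, view[3])
    if left >= right or top >= bottom:
        return
    # Start at the tile-grid line at or before the viewport edge.
    for ty in range(top // tile * tile, bottom, tile):
        for tx in range(left // tile * tile, right, tile):
            yield (tx, ty, min(tx + tile, image_w), min(ty + tile, image_h))


# Sample file dimensions from this thread, 4 bytes per pixel, 5 GiB free RAM.
W, H = 27_156, 43_511
strategy = choose_strategy(W, H, 4, free_bytes=5 * 2**30)
tiles = list(visible_tiles(W, H, view=(0, 0, 1920, 1080)))
tile_bytes = sum((r - l) * (b - t) * 4 for l, t, r, b in tiles)
print(strategy, len(tiles), f"{tile_bytes / 2**20:.0f} MiB")
```

With a Full HD viewport at 100% zoom, only a dozen 512x512 tiles (about 12 MiB at 4 bytes per pixel) need to be in memory at once under these assumptions, instead of the several GiB a whole-page bitmap would need.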

My goal in writing this is the hope that the well-working memory management in 3.1.2 does not take a turn for the worse, because otherwise I will be staying at SPDF 3.1.2 and will probably miss out on future development and new functions of SPDF.

Best Regards,

What’s puzzling is that nothing in your lengthy reply indicates that you have tested your files with the latest daily build, in order to let us know whether performance is comparable to that of 3.1.2 again or perhaps the dev needs to look into further optimization (if possible). We don’t have access to such massive files, so unless you care to test and provide feedback I guess you will be missing out on future development and new functions.

Sorry for not including that important test

The result with SumatraPDF-prerel-13136-64.exe (build date 2020-09-05) is no better. Memory use spikes beyond 3.5GB at initial open. I had to kill the process to prevent it from destabilizing the system.
The screenshot was taken in haste shortly before.

[Screenshot: Capture]

The test file that causes this behaviour in the dropbox link in my first post is still available and will remain available for a while. It is 20mb.

Thanks

Thanks; I left a note for the dev on a related GitHub issue, so let’s wait and see what he has to say about it:

No need to kill the system. In 3.3.13112 on a 16GB system (CAD system, not home machine) it loads in around 40 seconds, and although it does spike on occasion to 5+GB, it then stabilises around 200MB for repeated zoom in and out, so no leak.

Zooming in on the first occasion may take an additional 40 seconds to re-decompress and eat up memory as it does so. It is perhaps best to look at the type of compression you are using; a bigger file may respond much quicker with less memory needed.

May not be a leak as such, but when the same file on the same system behaves far better with 3.1.2, doesn’t that obviously indicate that MuPDF/Sumatra has worsened in terms of performance/resource utilization in such cases?

I agree with @SumatraPeter. 3.2 is a giant step backwards in terms of memory management. It does not seem like a memory leak, but rather a design change or issue.
I tested the same file on a 16GB RAM workstation as well. I observe the same behavior. If no other apps are open and more than 8GB of RAM is always free, the system does not lock up. The file can be opened and scrolled, at the expense of giant memory spikes.
But when the same station is in real use, with about 8 browser tabs, an email client, CAD design apps and a few other helper and database apps open, the remaining physical memory hovers around 3-5 GB. Sumatra killed that station as well by opening the file.

In any case, the current memory use makes the app unusable for the scenario I described. I can confirm that it is not a leak, but excessive memory use which did not exist at all in version 3.1.2.

There’s only so much an engine can do to deal with bad PDFs.

This one has a 20k x 40k image which requires 4 gigabytes of memory during the rendering process, even if it ultimately gets scaled down to a much smaller image.

There is no leak, just a giant allocation that will crash if there’s not enough memory.

There might be a way to optimize it but that’s a job for mupdf. You can report the issue to them at https://bugs.ghostscript.com/
If they fix it, we’ll pick up the fix.

Respectfully, the PDF is not ‘bad’. It’s a perfect high quality PDF. Higher quality than most people are used to.

Your statement neglects the fact that Sumatra version 3.1.2 did not require the 4GB+ you mention at all while rendering. So it is a regression in my opinion, and an important one. Memory use while opening the file is under 100MB with SPDF 3.1.2.

I understand the difficulty since you rely on the rendering core of muPDF.
Your version history here https://www.sumatrapdfreader.org/docs/Version-history.html
says: “upgraded core PDF parsing rendering to latest version of mupdf. Faster, less bugs.”
But that is not the case in this scenario.

As a workaround, can SPDF incorporate both the old rendering core and the new one?
Perhaps a command line switch to force using the old rendering core, while keeping the new SPDF functions available?
Or, if SPDF can check the image size before starting to render and determines that memory use in relation to currently free memory will be higher than a threshold, it could automatically switch back to the old rendering engine.

What is the version of muPDF that was used in SPDF 3.1.2? I will follow up with the Artifex team as well to see if something can be done on their side.

thanks

There were so many changes over the versions that I am “guessing” it would be impractical to try and revive the older code with its own problems.
MuPDF viewers up to version 1.14 report that the file has errors and won’t render it; from 1.15 onwards they refuse to handle it. So I guess the change was around 2017, i.e. shortly after SumatraPDF version 3.1.2.

I am struggling to find any app that will tolerate its structure well or run faster. Of those that don’t crash (several peak and freeze or burn out), only 3.1.2, Acrobat and a few others cope, and they do so slowly by processing much smaller memory chunks. It is only Firefox remembering this reply that saved it when I had to fully reboot whilst trying to analyse the file, as any attempt to manipulate or interrogate so much detail with many apps results in a hang/crash.

I guess you are working with 3rd party client files, but I’m still failing to understand what advantage there is in combining process colours at a high resolution with lousy compression (yes, I am using competitive apps for some comparisons).

Even at 96dpi those excessive bloating artifacts are visible, so they will result in memory consumption, poor registration, colour bleed etc.

I’m not against using different compression techniques, so ECW wavelets suit me for 500-1000 m long photo strips, POD for billions of pixels in space, and JPG (std) for non-reproduction photo/graphic images, but for flat colour gradients I would stick to LZW.TIF or more recently PNG.

Do I understand correctly that SPDF has no version/source control that is able to restore the state when version 3.1.2 was released? There shouldn’t have been a need for ‘guessing’ if the state of the core rendering component that SPDF 3.1.2 is using can be restored and investigated. Of course I do not know the project well enough to comment any further.
However, I can tell with confidence that SPDF 3.1.2 is doing something right, even something that muPDF alone from the command line around 2017 could not do properly.

If you want to test another app that handles such files perfectly fine, it is PDF-XChange (Editor or Viewer) from https://www.tracker-software.com/
This app has minimal memory usage as well (around 150mb), you can zoom in and out as much as you want. SPDF 3.1.2 beats this app in memory use, by using even less.

Yes, many files I (and many in my situation) deal with are not created by me personally. Your comment about why a high-res CMYK file is compressed with such lousy compression is a good observation and justified as well. But it has a simple explanation. To upload a reasonable file size via Dropbox, I recompressed the original, which used lossless PNG/TIF LZW/PackBits compression, with lossy JPEG compression. This file is just one example of many, some with confidential client content in the range of 600MB to 2GB. I prepared a minimalistic example at 20MB with heavy compression to show the problem that started happening with SPDF 3.2. The artefacts you have correctly identified are present, but irrelevant to the memory problem that is happening.

It seems it is the end of the line for SPDF for high resolution raster graphics. A bit unfortunate because SPDF 3.1.2 was indeed brilliant in its lightweight features.

The problem is also noticed in files with many pages, embedded high res (smaller) images. The post from https://github.com/sumatrapdfreader/sumatrapdf/issues/1491#issuecomment-689892185 is justified and I suspect many other users will notice this over time (or not understand why their system destabilizes suddenly). I was able to reproduce 4GB+ memory usage on 2,000-3,000 page Din A4 and US Letter size files when scrolling quickly with mixed scanned images and vector objects.

This is definitely a big regression, but I understand and respect your decision if this cannot be fixed.
I am sadly forced to stay at SPDF 3.1.2 because of this.

Just to be clear, I am “guessing” because the one and only developer is @kjk, whose hands are primarily tied by the libraries he is given, warts and all (so currently cut and paste is impacted by the font libs etc.), and that’s only one of 5000 multi-line file changes over the intervening 4 years.
Reverting code in SumatraPDF to any older buggy libraries could result in no progress in some areas, and again I am guessing that areas prone to other issues would have suffered from any trimming to block newer misbehaviours.
So there is one guy balancing the books, and two messengers trying to fend off the hail of bullets.
His comment was basically if MuPDF changes the memory aspect for images then SumatraPDF can take advantage of it.

My personal view is that SumatraPDF can render/generate 20,000x32177 pixels.pdf (the biggest I can produce quickly from your sample) with NO problem in 3 or 4 seconds without any noticeable memory hit, but only IF the PDF is 64MB compressed, i.e. like those I generate with SumatraPDF itself from say 70-80 MB png, not exotically compressed files.

I can go higher but it’s slower, using the full 4GB on my personal laptop, so I would not push that beyond say 90MB, which is more square yards than this house has floorspace to lay out any prints.
I insist on clients loaning me a 16GB laptop with their software as a minimum for any heavy commercial contract work, since time is then key and they are paying for it.