Sumatra PDF is a PDF, ePub, MOBI, CHM, XPS, DjVu, CBZ, CBR reader for Windows

How to copy multiple text objects and preserve last zoom/scroll state

Hello Krzysztof,

I use Sumatra PDF with great pleasure for large-format engineering/CAD drawings. It is far faster than Acrobat at rendering some complex content, where Acrobat takes forever to rasterize and display it.

Would the following 2 features be possible to implement, or do they exist already and I missed something in the documentation?

1- I can highlight and copy text content to the clipboard. How can I highlight and extract more than one text zone? I wish to extract text from non-consecutive zones of the content, ideally by holding the Control key or similar. The Control key is currently mapped to marking IMAGE content, which I use as well. That functionality, extracting images with Ctrl, should remain.

2- Can SPDF remember the last used zoom and scroll position of the viewer and store them when it is shut down? At startup, from the command line or from the settings file, I could pass in a flag to activate the last used zoom/scroll values.
Use case: I open an Arch E (36x48") size PDF file with zoom to fit from the batch code command line. The title block of the drawing is at the bottom right. I zoom in to the bottom right corner and scroll to the position I need to extract information from. I copy the information I need and close SPDF. Automated batch code opens the next PDF, which is similar in size but not exactly the same. I wish to avoid having to zoom and scroll to the same bottom right position again. There are hundreds of files in such loops.
An option that saves the last used zoom % and scroll position in relative coordinates, and restores them automatically when SPDF is opened with that zoom option (zoom = last), would be wonderful.
Example regarding relative scroll position: if the scrollable width is 1000 units (based on PDF size) and the active scrollposition.start when SPDF is closed is 800, the scrollposition.start should be stored as a fractional float value, 0.8 in this case.
When the next page opens and the available scroll width is 1200 units, because the PDF is larger, the automatically calculated scroll position should be 0.8 x 1200 = 960 units. This way the scroll position is independent of the size of the original and will dynamically adjust itself to the bottom right, even if the PDF size changes.
The scroll positions should be stored independently for the horizontal and vertical axes.
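
The relative-position arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not something SumatraPDF is known to do; the function names and the vertical-axis numbers are made up for the example.

```python
def to_relative(scroll_pos, scrollable_extent):
    """Convert an absolute scroll position into a 0..1 fraction of the extent."""
    if scrollable_extent <= 0:
        return 0.0
    return scroll_pos / scrollable_extent


def to_absolute(fraction, scrollable_extent):
    """Map a stored fraction onto a document with a different scrollable extent."""
    return round(fraction * scrollable_extent)


# Horizontal and vertical axes are stored independently.
# Horizontal numbers are from the post; the vertical ones are hypothetical.
saved_x = to_relative(800, 1000)   # stored when the first file is closed
saved_y = to_relative(450, 600)
next_x = to_absolute(saved_x, 1200)  # applied to the next, larger file
next_y = to_absolute(saved_y, 900)
print(saved_x, saved_y, next_x, next_y)
```

Because only the fractions are stored, the same saved pair can be applied to any page size without knowing the size of the previous file.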

Thank you very much

@CanadianHusky

  1. Cut and Paste

Having worked in and with CAD bureaus and design studios at many levels, I do recognise the needs. However, whilst many different ways have been tried over the years, if there were a universal solution it would be expensive. I am currently using an industrial CAD system where the internal drawing tags are presented in a fixed static position irrespective of paper space variations, and thus can be very easily extracted or replaced by database macros. That hasn't stopped some young bright spark suggesting that we turn two centuries of records produced from left to right into a future CAD standard drawn arse about face, in negative co-ords from the bottom right, just so it suits their program requirements! However, PDF will still count units from left to right.

Windows Clipboard has evolved, and is still evolving, to hold multiple entries, but it still works sequentially (one zone at a time). The only appearance of concurrency is that a selected zone of mixed content can be pasted as either image or text into the recipient app; see how pasting into WordPad and Paint may differ.

Using Windows Clipboard History to collect several zones in turn and later paste those separate areas at will, is about as good as Windows itself allows.

You can alternatively “select all” and discard the unwanted info in the recipient app, but that depends on the recipient app allowing a multiple-page paste; MS Paint, for example, will only accept the first page.
For an alternative see the summary below.

  2. Zoom and Scroll

Scrolling to targets can be variable for several reasons, and the biggest factor is that page datums are measured up or down from the left whilst title blocks are invariably set to the furthest edge of variable page sizes. If the “Title Panels” are all metric the height range should be well defined; however, in many industries we may see up to 20 x A0, which means they are up to 26 yards to the right.

SumatraPDF does remember the last page position per document, so for 1-page drawings it should be easy to say that for E size I wish to zoom 48 across and 36 down. The BUT is that it will not go where you wish in differing cases unless the filename is static, so open each E size drawing as %temp%\E.pdf.

So settings would need to pre-determine the default zoom and location for each use case; it is then easier to call SumatraPDF with the required parameters. One way to do that is using command line directives:

SumatraPDF.exe -view "single page" -zoom 400 -scroll 3456,2592 "allEsize or smaller.pdf"
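
For anyone driving this from code rather than a batch file, the same command line can be assembled programmatically. The following is a minimal Python sketch that only uses the flags shown in the command above; the function names and default values are illustrative, and the executable is assumed to be on the PATH.

```python
import subprocess


def sumatra_command(pdf_path, zoom=400, scroll=(3456, 2592),
                    view="single page", exe="SumatraPDF.exe"):
    """Build the argument list for the command line shown above."""
    return [exe, "-view", view, "-zoom", str(zoom),
            "-scroll", f"{scroll[0]},{scroll[1]}", pdf_path]


def open_pdf(pdf_path, **view_options):
    """Launch the viewer and wait for the launched process to exit.

    Assumption: a fresh viewer process is started for each file; if an
    already running instance takes over, this may return immediately.
    """
    return subprocess.run(sumatra_command(pdf_path, **view_options)).returncode


print(sumatra_command("allEsize or smaller.pdf"))
```

Passing the arguments as a list avoids having to quote paths and values that contain spaces.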

Summary (TL;DR)
In short, if your target is always bottom right, simply set the scroll values higher and view as a single page. For multipage documents you can of course also specify -page 1 and, on additional lines, -page 2 and -page 3 (with an exit on Escape), pausing between lines to allow for manual cut and paste to the clipboard. Then, on completion of a bunch, manually paste each from clipboard history.

Clipboard history may not be accessible in all cases; for me it is available in one account but not in another!! Ms. Nanny is at play again.
When available it allows 25 items at 4 MB per item. Text, HTML, and Bitmap are supported. I can recommend the open source CLCL, which is portable, allows for more items, can save multiple zones, etc. But you need to experiment since there's no user manual. (Hint: add in the utility tool “save of more items”.)

For repetitive CAD tasks, scripting macros is a must. Loading 30 cross references 30 times suggests a programmatic approach. For a task such as you describe, to remove the manual aspect via “set and forget” (so as to go have a drink and plan around the next task), I would be looking to drive the whole task via known keyboard sequences, such as might be provided by AutoHotKey or similar in VBA.

Thank you for the detailed explanations. I included further comments and clarifications:

1- Cut and Paste
I understand the difficulties you mentioned and have encountered some of the “bright ideas” you mentioned as well. My desire was something far simpler.
I do not need multiple zones separated in the Windows clipboard. It would be fully sufficient if the yellow text marker zone in Sumatra allowed selection of multiple text zones and copied those zones to the clipboard in one single shot. Multiple zones could be separated by a line break or carriage return, or another special char, for example the | vertical line, which is not normally used in text. The receiving app, which monitors the single clipboard entry, will parse it and take care of the rest. Clipboard history, and trying to capture multiple instances as separate objects out of the clipboard, was never my goal; it is not reliable, as you said, and is not needed either. My recommendation was a single copy, only from more than one text zone, joined with a special character (maybe defined in the settings file).
The receiving app works page by page: it opens Sumatra, waits for a clipboard object to arrive, parses it, clears the clipboard, closes the active Sumatra and opens the next page with a new instance of Sumatra. All that works flawlessly; only the selection of a few words/lines that are in different spots of the content cannot be done conveniently at the moment. If that could be implemented it would be great.
As a workaround I have implemented a bandaid solution. If the “CAPS LOCK” key is on while the Control+C copy in Sumatra is done, the receiving app code takes the clipboard content, makes a slight beep and shows a taskbar popup window with a suitable message as a notification that a portion of text was successfully captured, clears the already captured object from the clipboard and leaves Sumatra open, waiting for new content to appear. This loop continues as long as caps lock stays on, or until the user closes Sumatra without selecting anything. This approach works fine as well for extracting multiple, non-consecutive zones. It may not be the most elegant way, but it works. Highlighting and selecting multiple zones in one operation would have been more convenient, but if it is not possible or too difficult to implement, I understand.
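
The caps-lock workaround described above can be outlined as a small polling loop. This is a Python sketch of the idea only, not the actual .NET code: the clipboard access, the caps-lock check and the "viewer still open" check are passed in as hypothetical callables, and the `|` separator is just the example character mentioned earlier.

```python
import time


def collect_zones(read_clipboard, clear_clipboard, caps_lock_on,
                  viewer_open, sep="|", poll_seconds=0.2):
    """Gather clipboard captures until caps lock is off or the viewer closes."""
    zones = []
    while viewer_open():
        text = read_clipboard()
        if not text:
            time.sleep(poll_seconds)  # nothing copied yet, poll again
            continue
        zones.append(text.strip())
        clear_clipboard()             # so the same capture is not read twice
        if not caps_lock_on():
            break                     # caps lock off: this was the last zone
    return sep.join(zones)


# Simulated run: two captures, caps lock on for the first and off for the second.
clips = iter(["", "DWG-1001", "", "Rev B"])
caps = iter([True, False])
result = collect_zones(lambda: next(clips, ""), lambda: None,
                       lambda: next(caps), lambda: True,
                       poll_seconds=0)
print(result)
```

Injecting the clipboard and keyboard checks keeps the loop itself testable without a real clipboard or viewer.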

2- Zoom and Scroll
Your comments about the wide variety of CAD standards are correct. There is no guarantee that the title block is in any predetermined position, but it does not matter. I do not expect Sumatra to deliver that intelligence.

Opening the file as %temp%\e.pdf is a neat trick to remember the zoom/scroll position, but then I lose the page filename in the Sumatra header.

The -zoom and -scroll command line arguments are exactly what I need.
If Sumatra could WRITE a txt file with the last/active zoom and scroll values when it shuts down, my concern would be solved. I would open the file, let the user scroll/zoom and finish his/her work, and after shutdown I could read the last used zoom/scroll values, which I would pass in as parameters to the next file that is opened by code. All that is missing is the ability to read the last used zoom and scroll values. Could that be implemented in a future version? For example, a txt file in the same location as the exe,
“sumatrapdf-lastzoomscroll.txt”, or two Windows registry string values (zoom=…) (Scroll=…) in the local user hive of the registry if you do not like the idea of text files.

All my tasks and the interaction to Sumatra is automated via .NET App code. So I have easy access to any output that sumatra creates, regardless if text file or registry or clipboard.

Thank you again and I hope my post helped clarify.

Multiple captures
For multiple screen scraping there are dedicated 3rd party apps, and CLCL is just one sequential approach that attempts to get around what is in effect a Windows limitation. Your “bandaid” works, so best not to fix it, but without sight of the code I can't suggest any betterment.

Go to last zoom
As mentioned, the values are stored in the settings file (which for the portable exe is in the same folder, but for an installed copy is in the user's appdata dir).
So with “remember these settings per document” pre-set in options, we can see:

FileStates [
	[
		FilePath = C:\Users\folder\Documents\test3.pdf
		Favorites [
		]
		IsPinned = false
		IsMissing = false
		OpenCount = 5
		UseDefaultState = false
		DisplayMode = single page
		ScrollPos = 264 395
		PageNo = 1
		Zoom = 600

It would only require forcing the ScrollPos to a higher pair of values to push the view to the bottom right of the envelope (canvas). Unfortunately, due to the Favorites section, the number of lines may vary; however, if the routine/user does not make such an entry, the corresponding line number offset normally stays the same.
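
Since line numbers can shift, a calling program may find it easier to read the values by key rather than by position. The following Python sketch parses the FileStates section of a settings file laid out like the excerpt above; the function name is illustrative and the parser only handles the simple `Key = Value` layout shown in this thread, so treat it as an assumption-laden starting point rather than a full reader of the settings format.

```python
def parse_file_states(settings_text):
    """Return one dict of Key -> Value strings per FileStates entry."""
    states, current, depth = [], None, 0
    for raw in settings_text.splitlines():
        line = raw.strip()
        if depth == 0:
            if line == "FileStates [":
                depth = 1
            continue                      # ignore every other top-level section
        if line.endswith("["):
            depth += 1
            if depth == 2:
                current = {}              # a new per-file entry starts
        elif line == "]":
            if depth == 2 and current is not None:
                states.append(current)
                current = None
            depth -= 1
        elif depth == 2 and " = " in line:
            key, value = line.split(" = ", 1)
            current[key] = value          # nested blocks (depth > 2) are skipped
    if current is not None:
        states.append(current)            # tolerate a truncated file
    return states


sample = r"""
FileStates [
    [
        FilePath = C:\Users\folder\Documents\test3.pdf
        Favorites [
        ]
        DisplayMode = single page
        ScrollPos = 264 395
        PageNo = 1
        Zoom = 600
        ReparseIdx = 0
    ]
]
"""
state = parse_file_states(sample)[0]
scroll_x, scroll_y = (float(v) for v in state["ScrollPos"].split())
print(state["Zoom"], scroll_x, scroll_y)
```

The Zoom value is kept as a string here because only a numeric zoom appears in this thread and other forms are not covered by the sketch.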

It should be possible to start every time with a “sanitised” SumatraPDF-settings.txt file containing just that section (and possibly other pre-populated preferences). SumatraPDF will then usually rebuild the other default settings around them on starting, as required.
So, depending on how many entries you seed, the line numbers would be somewhere around 100-200, but again, by careful control, they should appear static each time the file or group is repopulated.

Hint to see how this works
Delete everything but that first section entry and ensure it is terminated with

		ReparseIdx = 0
	]
]

So here the file on the left is a stripped down startup-file with the desired settings in place and the result on the right (when I open the file) is exactly the zoom I want, positioned at top left of my mid page target (observe the scrollbar positions)

On checking the settings file after closure, the filename is NOW fixed at line 75 (but can vary if you include other pre-populated settings). So I can now save that later section for parsing, or use the whole file to seed the next run by changing the filename, OR clip it and use it concatenated as, or in, a new seed for settings.txt.
Note: add the preferences to the top of the settings file and SumatraPDF will trickle them down to the correct location, but that does mean their line number is less easy to determine if user changes are also allowed.

ONE LINE KISS ANSWER
Simply keep only one variable filename for ease of use, then overwrite that single entry (the user can alter the zoom if the need arises, but it will then be applied to the next file).
FART (Find And Replace Text) or equivalent

Fart.exe SumatraPdf-settings.txt previousfilename.pdf nextfilename.pdf & Sumatrapdf.exe nextfilename.pdf
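
If a find-and-replace tool is not to hand, the same swap can be done from the calling program. This Python sketch rewrites only the matching `FilePath = ...` line of the settings text; the function name is illustrative, and writing the result back to SumatraPDF-settings.txt and launching the viewer are left to the caller.

```python
def retarget_settings(settings_text, old_path, new_path):
    """Point the stored view state at the next file by swapping its FilePath."""
    out = []
    for line in settings_text.splitlines(keepends=True):
        if line.strip() == f"FilePath = {old_path}":
            line = line.replace(old_path, new_path)
        out.append(line)                  # every other line is kept untouched
    return "".join(out)


before = "\t\tFilePath = previousfilename.pdf\n\t\tZoom = 600\n"
after = retarget_settings(before, "previousfilename.pdf", "nextfilename.pdf")
print(after)
```

Matching the whole `FilePath` line, rather than replacing the name anywhere in the file, avoids touching other entries that happen to contain the same text.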

These are excellent tips. I will implement these. thank you

Hello,

I still experience a problem.
I open a test page with Sumatra. I close it. The settings file gets updated as follows (note the scroll position Y coordinate):

FileStates [
	[
		FilePath = c:\temp\Testpage_cad.pdf
		Favorites [
		]
		IsPinned = false
		IsMissing = false
		OpenCount = 6
		UseDefaultState = false
		DisplayMode = continuous
		ScrollPos = 2318 1522
		PageNo = 1
		Zoom = 200
		Rotation = 0
		WindowState = 1
		WindowPos = 1687 183 1024 640
		ShowToc = true
		SidebarDx = 252
		DisplayR2L = false
		ReparseIdx = 0
		TocState = 1
	]

I open the same file again.
Scrollbar X is at the correct position.
Scrollbar Y is reset back to -1.
I manually overwrite the value in the txt file with a high value and open the file again from the command line, but the viewer does not honor it. Scrollbar Y is always at the top, no matter what I try.

I scroll in the viewer and close the file. The txt file is updated correctly.
I open the same file again, and it is scrolled to the top?

Is this a bug or am I missing something?

Would take me some time to find suitable samples to test for all cases, however I suspect it may be because you are in continuous mode (i.e. from start of file) rather than single pages (more x,y relative).

I would need to observe your workflow as to why you use certain settings, and am working elsewhere so suggest you look at the need to work in continuous mode rather than page by page and check out both cases.

I did further testing.
There is no requirement for me to work in continuous page mode.
I switched the settings to DefaultDisplayMode = single page as per your recommendation.

With that setting the Y coordinate works fine in Windows 10 but does NOT get honored in Windows 7, 64 bit. Only the X coordinate works in Windows 7 for me.

Strange.

Further testing:
On Windows 10 even continuous mode, multipage and other bizarre view/zoom/scroll settings work fine from the txt file and at shutdown. The problem is isolated to Windows 7.
Not the end of the world…As long as I know why, I can live with it and adjust the environment.
Thank you for all the support

Unsure why 7x64 should differ however could try portable or 32bit on x64

This was my mistake…there is no problem on Windows 7 either. I tested so many scenarios and it was hard to isolate the real source of the problem why the Y ScrollPos coordinate would not get honored.

It is the command line argument. Adding -page 1 or any page number to the command line at startup causes the Y ScrollPos to get lost. When a file is opened and the page number is given in the settings txt file, everything is fine: even multipage files, page numbers greater than 1, and continuous or book view mode work, as long as the settings are passed in ONLY from the txt file and not from the command line argument.
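
Given that finding, one option is to put the page number into a seeded settings entry instead of passing -page. The Python sketch below generates a FileStates entry with the same fields as the excerpts earlier in this thread, terminated with ReparseIdx as suggested; the function name and the OpenCount value are made up, and whether SumatraPDF accepts exactly this reduced entry is an assumption to verify before relying on it.

```python
def seed_settings(pdf_path, zoom, scroll_x, scroll_y, page_no=1,
                  display_mode="single page"):
    """Build a minimal settings text that carries the view state for one file."""
    lines = [
        "FileStates [",
        "\t[",
        f"\t\tFilePath = {pdf_path}",
        "\t\tFavorites [",
        "\t\t]",
        "\t\tIsPinned = false",
        "\t\tIsMissing = false",
        "\t\tOpenCount = 1",               # hypothetical value
        "\t\tUseDefaultState = false",
        f"\t\tDisplayMode = {display_mode}",
        f"\t\tScrollPos = {scroll_x} {scroll_y}",
        f"\t\tPageNo = {page_no}",          # page goes here, not on the command line
        f"\t\tZoom = {zoom}",
        "\t\tReparseIdx = 0",
        "\t]",
        "]",
    ]
    return "\n".join(lines) + "\n"


print(seed_settings(r"c:\temp\Testpage_cad.pdf", 200, 2318, 1522, page_no=2))
```

The caller would write this text to the settings file before launching the viewer with only the file name as its argument.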

Thank you for all the help. Your software has excellent customization capabilities.

Whilst looking at ways this might be done now using current methods with SumatraPDF, there were suggestions that PowerShell 5 has a clipboard append option; however, there were also suggestions that this ability may not be possible in the newer PowerShell 6 versions. So I am confining a Proof of Concept to the historic method of sequential captures, as described above. Note that the routine works without the pre-release, but the highlights can only be added in the pre-release version, by pressing A as part of the capture.


We can easily add a shortcut to invoke a MultiClip.cmd. I have not tried to make it hidden, so the black console box will appear, beep, and ask if the first area is currently in the clipboard. If you press Y it saves it (allowing you to change your selection first). You can then select a second area with CTRL+C, and the cmd will be looping while you are doing so. Then it beeps again to say it's ready. If you again press Y, the second capture will be appended to the first, and the loop asks a third time, etc. There may be a limit to the eventual file size or number of captures, but I am not going to prove how this method could be restricted.
Once you have as many captures as desired, press N and the black window will disappear; in return Notepad will show the clipboard.txt, which you can save under any name you like.

Here is a basic MultiClip.cmd. However, due to limits in pasting on the forum, the @echo ^G beeps are not visible and can / do get destroyed when copying and pasting. If they don't work, you can replace them yourself using Notepad, or replace them with a “beep” command (included in the zip), so download this file from https://github.com/GitHubRulesOK/MyNotes/raw/master/AppNotes/MultiClip.zip

@echo off
goto :start
It is possible to use clipboard append but as its likely to be removed soon
lets keep this one line :-) PoC simple enough to adapt for that event.

Lets seed the outputfile with the filename + startpage and pause for a
 second to ensure clipboard.txt is populated. Unfortunately using command
 find "F:\path\SumatraPDF-settings.txt" "PageNo =" returns too many hits.
 so would need to construct a command to find the filename then search a
 few lines later, too much effort for this simple PoC (Proof of Concept).
 Also note there is an oddity in SumatraPDF's call that if only the page
 number is requested it will always append the filename as well so reversed.

:start
echo ** %2 ** StartPage=%1 **>%temp%\clipboard.txt
timeout /t 1 > nul
call :Save clipboard done
start "" Notepad.exe %temp%\clipboard.txt
exit /b 1

:Save clipboard multiple times to .txt and confirm done
@echo ^G
:lets pause for a second to separate the ready beeps
timeout /t 1 > nul
@echo ^G
CHOICE /N /M "Is Ctrl+C Ready?"
if errorlevel=2 echo ** %~2 **>>%temp%\%~1.txt & goto :eof
if errorlevel=1 goto yes
if errorlevel=0 goto :eof
:yes
echo Yes copying
echo START ** >>%temp%\%~1.txt
Powershell -Command "& {Get-Clipboard -Format text -TextFormatType UnicodeText -Raw}" >>%temp%\%~1.txt
echo  END **>>%temp%\%~1.txt
:lets pause for a second to let clipboard be saved
timeout /t 1 > nul
goto :Save

It should work with any Windows application where you can CTRL + C, but to add it as a plugin with a hotkey in SumatraPDF, open advanced settings and add it to the ExternalViewers section like this.

ExternalViewers [
	[
		CommandLine = "c:\path to file\MultiClip.cmd" %p
		Name = Clipboard &Appender
		Filter = *.*
	]
]

The shortcut will then be ALT + F + A (before or after the first CTRL + C).

You can easily modify or remove the additional ** markers I included in the output by replacing or deleting those lines. Notepad is usually good at handling the control codes if you don't change the format from UTF-8.

I have added a second version (same link as above) that is more specific to SumatraPDF.

So just make your selections (text or area with text) and it will AutoCopy those for you. As it is a proof of concept, you can add your own variants just as you wish, e.g. to not show results. However, without a rewrite entirely in .vbs, the console window can’t be avoided.

Usage:

  • use your shortcut to start select text/area 1
  • reply Y when ready (you will need to click on the black window first)
  • prepare second selection
  • reply Y when ready etc.
  • just reply N when you're done.

Any mistiming (double or missing selections) can usually be resolved by adjusting the ping or timeout values.
License: It's a proof of concept; you're on your own if you edit the files.

@CanadianHusky

I have not yet found any app that aggregates zones for clipping, which is not to say none could. However, a highly customisable clipboard manager is Ditto, which says “No way to save copies to a single clip. But you can select multiple clips, hit enter and all the clips will be pasted as one clip.”
Note this one was a single entry clipped from sourceforge with just one ding.
https://sourceforge.net/p/ditto-cp/wiki/Getting%20Started//

Thank you for the follow up.

I have implemented my requirement of extracting/aggregating and post-processing multiple Text/Image zones with a .Net App that runs parallel to Sumatra. Your suggested Multiclip proof of concept works, but is a bit sensitive and does not allow for more intelligent processing - for example OCR the captured image (if it was an image) and do Regex validations. Also the solution needs to work in a reliable way for non-tech savvy users. A foolproof and polished UI was needed.

The App opens desired pages in SPDF and starts monitoring the clipboard.
The user selects a Text or Image zone from SPDF and presses Ctrl+C. The Control+Mouse selection in SPDF 3.1.2 is really awesome, as is the way it can be used for both raster and vector based selection.
The App captures that selection and appends it to its UI, setting focus on the .Net App UI.
The user has the ability to make corrections to the extracted information and do multiline edits.
If the user has the CAPS LOCK key ON, the App puts SPDF back into the foreground and waits for the next cycle of data to be extracted.
If the UI is closed from the Window corner, or if the CAPS LOCK key is turned OFF, the app treats this as a signal “I am done selecting multiple zones”
SPDF is really great for the level of customizations that can be done.