Cannot search / find text in some PDFs

Akira

Recently I have been unable to search within many PDFs, for example, this lecture note. I am using SumatraPDF v3.2 64-bit.

Is there any way to solve this problem? Thank you so much!

GitHubRulesOK

Yes.
There are generally two types of file that cause problems with the latest builds.

One group is files made from scanned images and converted by OCR.

The other group, such as the one linked above, is files (usually academic or journal papers) built with TeX.

These often worked well in the previous version, 3.1.2, and both groups are currently the subject of open issues on GitHub.

GitHubRulesOK

If I want to use 3.2 or a pre-release for the latest features, but search a section of the current document in the earlier version 3.1.2, I can open it in a second copy, like so:

You need a portable copy of 3.1.2 in a known location, say C:\MyPortableApps\SumatraPDF. Then, in the advanced options, add the following to the ExternalViewers section:

ExternalViewers [
	[
		CommandLine = "C:\MyPortableApps\SumatraPDF\SumatraPDF.exe" "%1" -page %p
		Name = &Version 3.1.2
		Filter = *.*
	]
]

Note that the &V makes the shortcut Alt + F + V.
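As a rough illustration of what the CommandLine entry does, the sketch below fills in the two placeholders used above, %1 (the file path) and %p (the current page). The helper name and the example document path are made up for the illustration; this is not SumatraPDF's own code, only a guess at the resulting command string.

```python
import re


def expand_viewer_command(template: str, file_path: str, page: int) -> str:
    """Fill in %1 (file path) and %p (page number) in a single pass.

    Hypothetical helper for illustration only. A single pass avoids
    re-expanding a "%p" that happens to occur inside the file path.
    """
    values = {"%1": file_path, "%p": str(page)}
    return re.sub(r"%1|%p", lambda match: values[match.group(0)], template)


# Template copied from the ExternalViewers entry above.
template = r'"C:\MyPortableApps\SumatraPDF\SumatraPDF.exe" "%1" -page %p'

# The document path and page number are assumptions for the example.
command = expand_viewer_command(template, r"C:\Docs\lecture.pdf", 12)
print(command)
```

So choosing the menu entry while on page 12 should start the 3.1.2 copy on that same page, where Ctrl-F can then be used.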

Akira

Thank you so much for your dedicated support! I’m waiting for an official fix for version 3.2 ^^

nhabedi

I have recently installed version 3.2 (64-bit) and I now have problems with finding text (Ctrl-F) in PDF files. Text snippets in PDF files generated by LaTeX which previous versions found are no longer found. This also applies to some PDF ebooks I have. Acrobat Reader has no problems finding the same search terms and, as I said, it worked before I updated to 3.2. Also, this is not consistent. Some documents don’t show these problems, some do. I couldn’t detect a pattern so far. Any ideas?

GitHubRulesOK

The inconsistency shows up with two types of source where conventional paragraph embedding is not the norm. For example, it would generally NOT be a problem when Office, WordPad or a similar word processor is the source.

The problems come from OCR (e.g. Tesseract) or LaTeX (especially with modern fonts and math coding).

In these cases the parts of a word in a stream often present as spaced single characters, and this can be compounded by ligatures.
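To make the failure mode concrete, here is a small sketch of why a plain substring search misses text that was extracted as spaced single characters with ligature glyphs. The sample string and helper names are invented for the illustration, and this is not how SumatraPDF searches internally.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Expand ligatures and drop whitespace before comparing."""
    # NFKC turns compatibility ligatures such as U+FB01 into "fi".
    expanded = unicodedata.normalize("NFKC", text)
    # Removing all whitespace lets "d e f i n" match "defin".
    return "".join(expanded.split()).casefold()


def tolerant_find(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack after normalization."""
    return normalize_for_search(needle) in normalize_for_search(haystack)


# Invented sample: spaced characters plus the "fi" ligature (U+FB01).
extracted = "The d e \ufb01 n i t i o n of a \ufb01eld"

naive_hit = "definition" in extracted
tolerant_hit = tolerant_find(extracted, "definition")
print(naive_hit, tolerant_hit)
```

The trade-off is that dropping whitespace can also match across real word boundaries, so this is only a way to see the problem, not a recommended fix.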

nhabedi

Thanks for the quick reply. But is my recollection correct that this did work with previous versions of SumatraPDF? I can’t remember seeing these problems before 3.2. What has changed?

GitHubRulesOK

I’m not sure where the problem lies: if I force corruption or blocking of the embedded fonts, the files search as expected, although the fonts look bad.

For reference, this was covered by a now-closed issue

nhabedi

I can confirm that searching works if I introduce spaces as reported in:

GitHubRulesOK

@nhabedi
Note that this turned out to be a FreeType issue and has been about 95% fixed in more recent prerelease versions.

GitHubRulesOK

@Akira
Please see the comment above: for TeX users this has generally been resolved, so perhaps switch to a more recent prerelease (a portable copy would do if you only need it occasionally for copying). It also has extra features such as highlights and notes/comments.