Problem with local (non-english) text

umeca74

SumatraPDF displays Greek PDFs and other “exotic” languages correctly, but when you select and copy out text, the result doesn’t match the text shown, e.g. ΙΔΙΟΚΤΗΤΗΣ turns into IATOKTHEIA or something.

This affects both the viewer and the text filter (IFilter), meaning that the Windows search index won’t work properly for non-English text either.

Is there anything that can be done?
Thanks

kjk

Maybe but also maybe not. It depends on the document.

In PDF files, you can embed a subset of a font, and when this is not done correctly, the file loses the link between the glyph index (the index of the character in the font file) and the actual character.
If other PDF viewers copy the text correctly, then it’s a bug in Sumatra. If not, then it’s a bug in the PDF file.
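
To illustrate the point about the glyph index, here is a toy model in Python. It is only a sketch, not SumatraPDF or MuPDF code: the names (`build_subset`, `shape_of`, `good_to_unicode`, `bad_to_unicode`, `extract`) and the broken table are made up for the example. It assumes the page stores glyph indices, that drawing uses the glyph outlines, and that copying uses a separate glyph-to-character table, so a wrong table leaves the display intact but corrupts copied text.

```python
# Toy model of a subset font; every name and value here is hypothetical.

WORD = "ΙΔΙΟΚΤΗΤΗΣ"


def build_subset(text):
    """Give each distinct character a glyph index (1, 2, 3, ...) in first-use order."""
    glyph_of = {}
    for ch in text:
        glyph_of.setdefault(ch, len(glyph_of) + 1)
    return glyph_of


glyph_of = build_subset(WORD)

# What the page content stores: glyph indices, not characters.
page_glyphs = [glyph_of[ch] for ch in WORD]

# Glyph outlines, used for drawing. These are intact, so the page looks right.
shape_of = {gid: ch for ch, gid in glyph_of.items()}

# A correct glyph-to-character table, used for copy and search.
good_to_unicode = dict(shape_of)

# A broken table (made up): each glyph index points at an unrelated Latin letter.
bad_to_unicode = {gid: chr(ord("A") + gid - 1) for gid in glyph_of.values()}


def extract(glyphs, to_unicode):
    """Turn glyph indices back into text, as copy or an IFilter would."""
    return "".join(to_unicode.get(g, "\ufffd") for g in glyphs)


displayed = "".join(shape_of[g] for g in page_glyphs)
copied_good = extract(page_glyphs, good_to_unicode)
copied_bad = extract(page_glyphs, bad_to_unicode)

print("displayed:       ", displayed)
print("copied (good map):", copied_good)
print("copied (bad map): ", copied_bad)
```

In this model a viewer can only copy what the table gives it, which is why a file with a broken table tends to fail in every viewer, while a file that fails in only one viewer points at that viewer.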

umeca74

You are right, it must be a problem with the particular PDF. I created a test PDF myself and Sumatra copied the text correctly!

PavelJ52

Hi, I have the same problem with a PDF book. All viewers display the text correctly, but the exported txt has wrong characters in place of those with diacritics. Is there a way or a tool to repair it?

GitHubRulesOK

There are many types of file where this can be a problem that requires an editor to fix the file. However, there are also some types of file where SumatraPDF may struggle compared to other viewers.

In addition, some versions of SumatraPDF handle a file's contents better than others, and there are two groups of PDF file types where there are currently open issues.

  1. files generated using TeX, e.g. pdfLaTeX, i.e. often academic papers
    see Cannot search / find text in some PDFs
  2. files generated using OCR e.g. Tesseract
    see SumatraPDF copy / paste issues with Unicode text

The underlying cause is that the words are not continuous objects but strings of varying shapes.
Those two groups were better handled by version 3.1.2, so if you are using a version other than that, you can try a portable copy of the older 3.1.2 to see if it helps.

To give a definitive answer for a specific type of file, we would need to see the file, which you could post at

Issues · sumatrapdfreader/sumatrapdf · GitHub