Copying/Selecting RTL Text


#1

I’m not sure whether this applies to all RTL languages, but the only reader that can correctly select RTL text (Persian/Farsi) is Foxit Reader (on the right):

However, after copying, it loses or garbles many characters, for example:

  • Half-space character (zero-width non-joiner, U+200C; typed with Ctrl+Shift+2):

نوشته‌هاي
is incorrectly copied without the half-space as:
نوشتههاي

  • Guillemets & Parentheses:

«» & ()
are copied in the wrong direction as:
»« & )(

  • Diacritics:

ـَ , ـُ , ـِ , ـْ , ـّ , ـً , ـٌ , ـٍ ,
are completely misrepresented or copied as broken characters:
 etc.
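
To make the three symptoms above easier to report, here is a small sketch using Python's standard `unicodedata` module. It only inspects the characters involved; it says nothing about how any particular reader extracts text. The helper names (`lost_characters`, `describe`) are made up for illustration, and the exact code points of the sample word are my assumption about what was typed.

```python
import unicodedata

ZWNJ = "\u200c"  # the Persian "half-space" typed with Ctrl+Shift+2

# The eight diacritics listed above, as escapes: fatha, damma, kasra,
# sukun, shadda, fathatan, dammatan, kasratan.
DIACRITICS = "\u064e\u064f\u0650\u0652\u0651\u064b\u064c\u064d"


def lost_characters(original, copied):
    """Return the characters of `original` that never appear in `copied`."""
    return [c for c in original if c not in copied]


def describe(ch):
    """Summarise the Unicode properties that matter for copy/paste."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
        "combining": unicodedata.combining(ch),
    }


# Sample word from the post, written as escapes (assumed code points).
original = "\u0646\u0648\u0634\u062a\u0647" + ZWNJ + "\u0647\u0627\u064a"
copied = "\u0646\u0648\u0634\u062a\u0647" + "\u0647\u0627\u064a"

# What went missing in the copy: the invisible format character.
for ch in lost_characters(original, copied):
    print(describe(ch))

# Brackets are "mirrored" characters; diacritics are combining marks.
for ch in "\u00ab\u00bb()" + DIACRITICS:
    print(describe(ch))
```

The half-space is an invisible format character, the guillemets and parentheses carry the Unicode "mirrored" property (their displayed shape depends on text direction), and the diacritics are combining marks attached to a base letter. All three are easy for a text extractor to drop or misplace.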

Related: [Feature Request] Sort pages from right to left in Facing view
@Mahmoud

So far no reader supports this; I’ve tried Xodo (UWP), Adobe Reader, and others.
Does it have anything to do with how the PDF format is implemented? There is no problem copying RTL text from the web or from word processors.
@kjk


#2

It is very much down to the authoring product where the underlying characters are placed and in which order. SumatraPDF has an R2L switch for search.

Unfortunately I can’t tell how accurate this capture is, but the search order 1 2 3 seems correct.
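
On the point about character order, here is a rough sketch of why it matters, not a description of what SumatraPDF or any other reader actually does. If a PDF producer writes an RTL run in visual (left-to-right on the page) order, an extractor has to reverse it to recover the logical (typing) order. The function names are hypothetical, and the plain reversal is a deliberate simplification: it ignores mixed-direction text and would misplace combining marks.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}  # Hebrew-type and Arabic-type letters
STRONG_LTR = {"L"}


def is_rtl_run(text):
    """True if every strongly directional character in `text` is RTL."""
    strong = [
        unicodedata.bidirectional(c)
        for c in text
        if unicodedata.bidirectional(c) in STRONG_RTL | STRONG_LTR
    ]
    return bool(strong) and all(b in STRONG_RTL for b in strong)


def to_logical_order(visual_run):
    """Reverse a pure-RTL run stored in visual order; leave other text alone.

    Simplified on purpose: no mixed-direction handling, no special care
    for combining marks or mirrored brackets.
    """
    return visual_run[::-1] if is_rtl_run(visual_run) else visual_run


logical = "\u0633\u0644\u0627\u0645"  # a four-letter Arabic-script word
visual = logical[::-1]                 # as a visual-order producer might store it

print(to_logical_order(visual) == logical)
print(to_logical_order("abc"))
```

A reader cannot always tell which order the producer used, which is one reason the same reader can behave well on one file and badly on another.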


#3

If you’re talking about the “Right to left Reading order” context menu item from your screenshot, that is a general Windows feature introduced in Win7 IIRC.


#4

By “author product”, do you mean the word processor/editor the author used, and that editor’s PDF export settings?
So nothing can be done on the PDF reader side?


#5

I am the author of the lower document. I used Windows print to generate the upper PDF. You can see there is no problem with the selection, which I can then paste back into my source. My cycle has a problem in where I accepted the copy, since it is partially truncated. That difficulty is caused by me, since I work L2R; if I had selected from the R2L start and gone left, the result would have been better. However, I have to concede that I can’t reliably capture the first character of each line unless I "Select the whole page".


#6

It looks like Arabic works fine, but I have many Persian instances in which words and letters are lost when copying even a 6-7 word sentence. Shall I provide samples/screenshots?

Firefox’s PDF reader seems to handle this slightly better so far…


#7

The best place to raise this, with links to sample files, is https://github.com/sumatrapdfreader/sumatrapdf/issues


#8

Sure, but I believe we haven’t got to the core of the issue yet, and there seem to be a lot of unnoticed issues floating around over there.
That’s why I posted here first.