PDF file encodings

Been Around a Bit
Posts: 21
Joined: April 18th, 2012 1:47 am

Postby JohnThompsonJTSoftware939 » May 25th, 2012 5:42 am

I can't extract or copy/paste text from the PDFs. The results are non-displayable characters. Why is this?

For example, 002_B2_082007_kclass101_lesson.pdf.

Can you provide the CMaps needed?



Expert on Something
Posts: 870
Joined: February 8th, 2010 8:55 am

Postby trutherous » May 26th, 2012 8:03 am

There just seems to be no end to the pdf problems. Have you tried the "lite" pdfs? Do they give the same problem? ... php?t=2984

Been Around a Bit
Posts: 21
Joined: April 18th, 2012 1:47 am

Postby JohnThompsonJTSoftware939 » May 27th, 2012 5:16 am

Thank you so much for the suggestion. I took a quick look at the "lite" version of beginner lesson 2, and it works fine for copying and pasting, and for extracting text. Ironically, the "lite" versions are much bigger files (perhaps they embed more font information?), but I don't mind, as long as I can get the text out of them.

Having dug into it a little more, apparently you can create PDFs that store font glyph IDs instead of standard character codes, so the file contains no usable text unless a character map (CMap) is available to convert those IDs back to characters. Some text extraction tools (Adobe or otherwise) actually fall back on optical character recognition, looking at the rendered glyphs to convert them back to text. It is just unbelievable that Adobe would do something so short-sighted. I wish they would just use UTF-8 encodings and be done with it.
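
To make the glyph-ID idea concrete, here is a minimal Python sketch of how an extraction tool might use a ToUnicode-style CMap. The CMap fragment, the glyph codes, and the helper names (`parse_bfchar`, `decode_codes`) are all made up for illustration; they are not taken from the lesson PDFs, and a real CMap also has range entries and other sections that this ignores.

```python
import re

# Hypothetical ToUnicode CMap fragment (codes invented for this example).
# Each bfchar line maps a glyph code to a UTF-16BE code point.
SAMPLE_CMAP = """
2 beginbfchar
<0003> <D55C>
<0007> <AE00>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Return {glyph_code: unicode_string} from beginbfchar/endbfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode_codes(raw, mapping, width=2):
    """Decode fixed-width glyph codes; codes with no mapping become U+FFFD."""
    out = []
    for i in range(0, len(raw), width):
        code = int.from_bytes(raw[i:i + width], "big")
        out.append(mapping.get(code, "\ufffd"))
    return "".join(out)

# Assumed content-stream string: two 2-byte glyph codes, 0x0003 and 0x0007.
raw_string = bytes.fromhex("00030007")

mapping = parse_bfchar(SAMPLE_CMAP)
text = decode_codes(raw_string, mapping)   # decoded with the CMap
missing = decode_codes(raw_string, {})     # decoded with no CMap at all
```

With the map, the two codes come out as Hangul; without it, every code falls through to the replacement character, which is roughly what I see when I copy/paste from the regular PDFs.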

Thanks again!


jaehwi Team Member
Posts: 159
Joined: June 17th, 2011 7:36 am

Postby jaehwi » June 4th, 2012 4:45 am

Hi JohnThompsonJTSoftware939,

We are glad that the lite versions of the PDF files work OK for you! Thank you "trutherous" for your help too :wink:
If you have any other problems, please let us know!

