I'm trying to support as much Unicode as I can in the PDF files I'm writing. I want to be able to output utf8 strings and have them display correctly in the PDF.

I see in the libharu encodings documentation (https://github.com/libharu/libharu/wiki/Encodings) that there are many single-byte code pages I can access, and special functions for accessing multi-byte code pages if I want Chinese, Japanese, and Korean. But my understanding is that if I wanted to use all of those pages and functions to write arbitrary utf8 strings, I'd have to write a bunch of code to break my utf8 strings into segments that each use a specific code page, and then do whatever code page swapping is necessary, with reverse mapping of each segment from utf8 to the given code page before outputting it. That seems like a lot of error-prone work compared to just being able to say "write this utf8 string".
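To illustrate, the segmentation I'm imagining would look roughly like this (just a sketch; codepage_for() and its ranges are made up for illustration, and only 1- and 2-byte sequences are handled):

/* Sketch of per-code-page segmentation (the work I want to avoid). */
static unsigned next_codepoint( const unsigned char **s )
{
    unsigned c = *(*s)++;
    if (c < 0x80) return c;                          /* ASCII */
    if ((c & 0xE0) == 0xC0)                          /* 2-byte sequence */
        return ((c & 0x1F) << 6) | (*(*s)++ & 0x3F);
    return 0xFFFD;                                   /* 3/4-byte: not handled here */
}

/* Pick a single-byte code page for a code point (illustrative ranges only). */
static const char *codepage_for( unsigned cp )
{
    if (cp < 0x100)                 return "WinAnsiEncoding"; /* Latin */
    if (cp >= 0x370 && cp <= 0x3FF) return "CP1253";          /* Greek */
    if (cp >= 0x400 && cp <= 0x4FF) return "CP1251";          /* Cyrillic */
    return NULL;  /* would need a multi-byte page */
}

/* Walk the string, grouping runs that share one code page. For each run
   I'd have to fetch a font with that encoder, reverse-map every code point
   to its byte in that page, and only then output the run. */
void write_utf8_segmented( const char *utf8 )
{
    const unsigned char *p = (const unsigned char *)utf8;
    const char *run_page = NULL;
    while (*p) {
        unsigned cp = next_codepoint( &p );
        const char *page = codepage_for( cp );
        if (page != run_page) {
            /* flush the previous run and switch encoder/font here */
            run_page = page;
        }
        /* append the reverse-mapped byte for cp to the current run */
    }
    /* flush the final run */
}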

To be able to write utf8 strings I'm using this code:

HPDF_Doc myPdf = HPDF_New( PdfErrorHandler, NULL );
HPDF_UseUTFEncodings( myPdf );                  /* register the UTF-8 encoder */
HPDF_SetCurrentEncoder( myPdf, "UTF-8" );
/* HPDF_TRUE = embed the font program in the PDF */
const char *f = HPDF_LoadTTFontFromFile( myPdf, "path/to/verdana.ttf", HPDF_TRUE );
HPDF_Font myFont = HPDF_GetFont( myPdf, f, "UTF-8" );
... go on to use myFont to write various text strings
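The subsequent usage is along these lines (a minimal sketch; the page layout details don't matter here):

HPDF_Page page = HPDF_AddPage( myPdf );
HPDF_Page_SetFontAndSize( page, myFont, 12 );
HPDF_Page_BeginText( page );
/* utf8 literal with Greek and Cyrillic text */
HPDF_Page_TextOut( page, 50, HPDF_Page_GetHeight( page ) - 50,
                   "Ελληνικά / Русский" );
HPDF_Page_EndText( page );
HPDF_SaveToFile( myPdf, "out.pdf" );
HPDF_Free( myPdf );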

That works, and I can write utf8 strings with accented Latin characters, and Cyrillic and Greek characters, and they show correctly in the PDF.

However, because I used that HPDF_TRUE to embed the font in my file, it increases the size of my file significantly. I am in fact using four fonts (verdana.ttf, verdanab.ttf, verdanai.ttf, and verdanaz.ttf), and they add over 600k to my file size, as compared to when I was using the "built-in" libharu fonts (which leave the file tiny, just a few k).

(I did try using HPDF_FALSE to not embed the fonts, but then my files open with random Latin characters.)

I'm trying to understand conceptually why it's necessary to embed fonts in my PDF, if I'm using a font like verdana that is going to be on the end user's system anyway. (I don't even care if it's verdana -- any standard sans serif font would do.) I've certainly created lots of PDF files by other means (e.g., exporting from Word) containing Greek, Cyrillic, Chinese, and other characters, and yet they are small. So is this embedding-to-use-utf8 requirement just a quirk of libharu?

Plus, even with that 600k of bulk, my files made with libharu show Chinese characters as blocks. I read on a libharu documentation page that libharu only supports one- and two-byte utf8 sequences, which covers just about everything except Chinese, Japanese, and Korean. So does this mean I'm embedding verdana.ttf, the majority of which is Chinese, Japanese, and Korean glyphs, and I can't even access them?

In any case, Chinese, Japanese, and Korean are not important for my current application; just for the two-byte utf8 sequences, I'm trying to understand whether there's a way to use them in libharu without having to embed big fonts in my file.
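(For reference, the "special functions" route to CJK mentioned above is libharu's multi-byte code page support, which takes natively encoded bytes rather than utf8. A minimal sketch for Simplified Chinese:)

/* Register the built-in CNS fonts and encodings, then pass GBK bytes,
   not utf8, to the text calls. */
HPDF_UseCNSFonts( myPdf );
HPDF_UseCNSEncodings( myPdf );
HPDF_Font cn = HPDF_GetFont( myPdf, "SimSun", "GBK-EUC-H" );
/* text passed to HPDF_Page_TextOut with this font must be GBK-encoded */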


1 Answer

Per the PDF specification, if you do not embed a font, a conforming reader will try to load the same font from the user's system.

If the font is not found, the reader falls back and tries to display the text with a replacement font. If the replacement font does not have a corresponding character at a given encoding position, an unpredictable character appears in that position.

It is always recommended to embed a subset, unless you want to allow users to edit your document, which is a rare use case for PDFs.
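If file size matters more than the exact typeface, one workaround is to stay with the built-in Base14 fonts and a single-byte encoder, at the cost of doing the utf8-to-code-page conversion yourself. A sketch, assuming CP1251 covers the Cyrillic you need (cp1251_bytes is a hypothetical buffer you have already converted):

/* Tiny files: built-in font plus a single-byte code page, nothing embedded. */
HPDF_Font small = HPDF_GetFont( myPdf, "Helvetica", "CP1251" );
HPDF_Page_SetFontAndSize( page, small, 12 );
HPDF_Page_BeginText( page );
HPDF_Page_TextOut( page, 50, 700, cp1251_bytes );  /* CP1251 bytes, not utf8 */
HPDF_Page_EndText( page );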

