[jdom-interest] Outputting escaped entities in element text

Elliotte Rusty Harold elharo at metalab.unc.edu
Thu Oct 3 06:24:10 PDT 2002


At 12:45 PM -0700 10/2/02, David Moles wrote:
>I'm using JDOM to generate an XSL:FO tree for conversion to PDF,
>and I'd like to include some exotic characters (specifically
>em dash, \u2014, and multiplication sign, \u00d7) in the text that
>I'm outputting. The Apache FOP processor requires that these be
>given as escaped entities (&#x2014;, &#xd7;); it then maps them
>to the appropriate characters in the font that it's using.

Your message indicates that you have some fundamental 
misunderstandings about how XML (and FOP) work. If those are 
corrected, the solution should become apparent.

1. &#x2014; etc. are not escaped entities. They are character references.

2. No conformant processor (including FOP) cares whether or not you 
use the character references or the actual characters, provided that 
they are representable in the chosen character encoding.

3. If they are not representable in the chosen character encoding 
(e.g. Latin-1) then you need to use a character reference instead.

4. Java strings are always Unicode (UTF-16 internally), which can 
represent such characters. Again, though, there's a non-XML escaping 
mechanism using \u in the event that the encoding of your Java source 
file can't represent them directly.

5. The XMLOutputter should be able to figure out which characters it 
does and does not need to escape. You do not need to concern yourself 
with this.
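
To make points 1 and 2 concrete, here is a small sketch using only the 
JDK's built-in DOM parser (not JDOM). The class name CharRefDemo and 
the helper rootText are made up for illustration. It parses the same 
element twice, once with a character reference and once with the 
literal character, and compares what the parser reports:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefDemo {
    // Hypothetical helper: parse a small UTF-8 document and return
    // the text content of its root element.
    static String rootText(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same em dash, written as a character reference and as a
        // literal character (via Java's own \u escape).
        String viaReference = rootText("<inline>&#x2014;</inline>");
        String viaLiteral = rootText("<inline>\u2014</inline>");
        System.out.println(viaReference.equals(viaLiteral)); // prints "true"
    }
}
```

After parsing, nothing downstream can tell which form was in the file. 
Both arrive as the single character U+2014.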

What you need to do is this:

  // "namespace" is the XSL-FO Namespace object you already have
  Element foo = new Element("inline", namespace);
  foo.setText("\u2014");

>then in my XML I get
>
>   <fo:inline>[Unicode character 2014]</fo:inline>


That's what you should get.
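
Whether the character shows up literally or as a character reference 
in the output depends only on the output encoding. The following is a 
rough sketch of the kind of decision a serializer has to make. It is 
not JDOM's actual implementation, and the names EscapeSketch and 
escapeFor are invented for this example. It uses only the JDK's 
CharsetEncoder:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EscapeSketch {
    // Hypothetical helper: escape element text for a given output
    // encoding. Characters the encoding cannot represent become
    // hexadecimal character references; everything else is kept.
    static String escapeFor(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String piece = text.substring(i, i + len);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (encoder.canEncode(piece)) {
                out.append(piece); // representable: write it as is
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a \u2014 b \u00d7 c";
        // UTF-8 can represent both characters, so nothing is escaped.
        System.out.println(escapeFor(text, StandardCharsets.UTF_8).equals(text));
        // Latin-1 has the multiplication sign but not the em dash.
        System.out.println(escapeFor(text, StandardCharsets.ISO_8859_1));
        // ASCII has neither.
        System.out.println(escapeFor(text, StandardCharsets.US_ASCII));
    }
}
```

With UTF-8 output the em dash is written as itself, which is exactly 
what you saw. With US-ASCII output both characters come out as 
character references. A conformant parser reads either file the same 
way.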

>-- the character's not escaped, it's just encoded in UTF-8, as
>you'd expect it to be. But then FOP doesn't know it needs to be
>remapped, it looks for the character in the font it's using and
>doesn't find it, and so in my PDF I get the "#" for "garbage
>character".

If this is true, and I still doubt it, then FOP is broken and needs 
to be fixed. This is not a JDOM problem.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+


