[jdom-interest] Turning off entity expansion

Paul Chapman chapman at zemsys.com
Wed Sep 4 00:15:24 PDT 2002


OK, so JDOM has helpfully converted a character that XML reserves for markup
(the &, just like < and >) into &amp; for you.
This is normally what you would want, so I doubt it can be turned off.

JDOM does not know that &#169; is already encoded for XML. The string you
pass to setText() is treated as plain character data, so JDOM escapes its &
again for you.
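A minimal sketch of the same behaviour, using the JDK's built-in DOM and identity Transformer as a stand-in for JDOM (assumption: JDOM's XMLOutputter escapes text content the same way, as the test.html output quoted below suggests). The class name EscapeDemo is hypothetical. Text set on an element is stored as plain characters, so the serializer escapes every & it finds, including the one inside &#169;:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: the JDK's DOM is used here instead of JDOM.
public class EscapeDemo {

    // Builds <p>text</p> and serializes it. `text` is treated as plain
    // character data, never as markup.
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.setTextContent(text);
            doc.appendChild(p);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The string contains literal '&' characters, not references.
        System.out.println(serialize("&quot; &#169; blah blah"));
    }
}
```

Each & in the input comes out as &amp;, giving the same &amp;quot; &amp;#169; pattern seen in test.html.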

This comes back to your original comment:

 > >When I look at the output my Unicode reference has been
 > >changed into the actual character, which I do not want, I want
 > >this line to be output verbatim.

So, why is the actual character not acceptable? I am not saying you
are right or wrong to want the original form; I am trying to
ascertain the reason why the translated character is not acceptable
to you. The copyright symbol appears quite happily in my browser
when I use it. Like this: ©
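In XML the character U+00A9 and the reference &#169; are the same content; a serializer only needs the reference when the output encoding cannot represent the character. A hedged sketch with the JDK's Transformer (again not JDOM; EncodingDemo is a hypothetical name, and the exact reference spelling is up to the serializer) shows the output encoding making that choice:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class using the JDK's DOM and Transformer, not JDOM.
public class EncodingDemo {

    // Serializes <p>text</p> as if for the named output encoding.
    static String serialize(String text, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.setTextContent(text);
            doc.appendChild(p);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String copyright = "\u00A9 blah blah"; // the actual copyright character
        // UTF-8 can represent U+00A9, so it is written as-is.
        System.out.println(serialize(copyright, "UTF-8"));
        // US-ASCII cannot, so a numeric character reference is written.
        System.out.println(serialize(copyright, "US-ASCII"));
    }
}
```

If the real requirement is ASCII-only output, choosing the output encoding may be enough, without keeping the original references around.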

-Paul.

ion wrote:

> Here is an example, consider the following simple program:
> 
> import java.io.*; import org.jdom.*;
> import org.jdom.input.*; import org.jdom.output.*;
> public class test {
>    public static void main(String args[]) {
>       Document doc = new Document(new Element("html"));
>       DocType docType = new DocType("html",
>             "-//W3C//DTD XHTML 1.0 Transitional//EN",
>             "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
>       doc.setDocType(docType);
>       Element root = doc.getRootElement();
>       Element head = new Element("head");
>       head.addContent(new Element("title").setText("Blah"));
>       root.addContent(head);
>       Element body = new Element("body");
>       body.addContent(new Element("p").setText("&quot; &#169; blah blah"));
>       root.addContent(body);
>       String newItem = args[0];
>       XMLOutputter outputter = new XMLOutputter("  ", true);
>       outputter.setTextNormalize(false);
>       try {
>          outputter.output(doc, new FileWriter((newItem+".html")));
>       } catch(Exception e) { System.err.println(e.getMessage());}
>    }
> }
> 
> (I apologise for the crapness of it; I created it quickly.)
> It is the sort of program that could be used to output templates
> for an HTML page or, more realistically, to include input
> from another XML file. Running it like this:
> 
> java test test
> 
> produces the output file test.html:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html>
>   <head>
>     <title>Blah</title>
>   </head>
>   <body>
>     <p>&amp;quot; &amp;#169; blah blah</p>
>   </body>
> </html>
> 
>>JDOM at all.  They are expanded by the parser before JDOM ever "sees"
>>them.  So to keep your original character entities intact, you would have
>>to address this (in some way that I can't answer) by tweaking the parser
>>you use.
> 
> Ok, so it was my parser expanding the character entities but...
> 
> Ampersands have been escaped to "&amp;". Why is this?
> 
> --SNIP--
> 
>>>That's overstating it a bit, no? He's asking for a particular one of two
>>>forms that are completely equivalent in XML's eyes, right?
>>>
> --SNIP--
> 
> This is a very good point: if they ARE equivalent, then there should be an
> option to output either form.
> 
> --SNIP--
> 
>>>misunderstanding of XML. But there are certainly reasonable cases where
>>>something else might care, and you might want to have control over this
>>>(irrespective of this particular case).
>>>
> --SNIP--
> 
> Definitely. But it seems as though the only case is the ampersand. Is this
> right?
> 
> How can I output an ampersand verbatim?
> 
> Regards
> 
> Empty
> 
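The quoted explanation, that character references are expanded by the parser before JDOM ever sees them, can be checked with the JDK's own DOM parser. This is a stand-in for whichever SAX parser JDOM's builder is configured with, and ParseDemo is a hypothetical name. After parsing, nothing of the original &quot; or &#169; spelling survives in the text:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows what a parser hands to the tree builder.
public class ParseDemo {

    // Parses an XML string and returns the root element's text content.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = rootText("<p>&quot; &#169; blah blah</p>");
        // &quot; became '"' and &#169; became U+00A9 during parsing.
        System.out.println(text.equals("\" \u00A9 blah blah")); // prints true
    }
}
```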

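For the closing question (how to output an ampersand verbatim), one escape hatch in the standard JAXP API, though not necessarily in the JDOM version discussed here, is the pair of processing instructions named by Result.PI_DISABLE_OUTPUT_ESCAPING and Result.PI_ENABLE_OUTPUT_ESCAPING. A sketch with the JDK's identity Transformer, assuming its default serializer honours these instructions (RawTextDemo is a hypothetical name). The caller becomes responsible for the raw text being well-formed:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: writes text without escaping '&', '<' or '>'.
public class RawTextDemo {

    // Serializes <p>raw</p>, copying `raw` to the output unescaped.
    // The caller must make sure `raw` is well-formed markup.
    static String serializeRaw(String raw) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            doc.appendChild(p);
            // Switch escaping off, add the text, then switch it back on.
            p.appendChild(doc.createProcessingInstruction(
                    Result.PI_DISABLE_OUTPUT_ESCAPING, ""));
            p.appendChild(doc.createTextNode(raw));
            p.appendChild(doc.createProcessingInstruction(
                    Result.PI_ENABLE_OUTPUT_ESCAPING, ""));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeRaw("&quot; &#169; blah blah"));
    }
}
```

This trades safety for control: a stray & or < in the raw text would produce a document that no XML parser accepts.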

-- 

Paul Chapman

Email:  chapman at zemsys.com
Mobile: +61 418 340 935



