[jdom-interest] XHTML issues

Rachel Greenham rachel at linuxgrrls.org
Fri Jul 25 15:21:39 PDT 2003


Jason Hunter wrote:
>> Yes, including the characters directly and outputting with UTF-8 does 
>> work, even just on -b9 (as long as you created your OutputStreamWriter 
>> using the right encoding), no need for latest-CVS. I simply have a 
>> *preference* for defining them as entities, either named or numerical, 
>> and keeping the XHTML source 7-bit clean. I know HTTP is guaranteed 
>> 8-bit safe, and browsers should cope, but I also want it to be readily 
>> viewable in any text editor, specifically nedit in my case, which 
>> doesn't have UTF-8 awareness.
> 
> 
> Then you really shouldn't output with UTF-8.  :-)
> 
> The ability to add characters for special escaping is what we added 
> after beta 9.  In UTF-8 no chars need to be escaped since it represents 
> all of Unicode 2.0.  Our default escape strategy is only to escape what 
> can't be represented, but you can override that.

Yeah, that sounds good. I've resolved my problem for now, as there 
weren't actually all that many special characters in what I'm 
processing, and I've got a mapping to turn them all into named 
entities instead.
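
A minimal sketch of such a mapping, using only the standard library rather than JDOM. The class name EntityEscaper and the particular characters chosen are assumptions for illustration; the entity names themselves are standard XHTML 1.0 ones. Anything outside 7-bit ASCII that has no named entity in the table falls back to a numeric character reference, so the output stays 7-bit clean. Markup characters such as & and < are deliberately left alone here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityEscaper {
    // Hypothetical selection of characters; extend as the documents require.
    private static final Map<Character, String> NAMED = new LinkedHashMap<>();
    static {
        NAMED.put('\u00E9', "eacute");
        NAMED.put('\u2018', "lsquo");
        NAMED.put('\u2019', "rsquo");
        NAMED.put('\u201C', "ldquo");
        NAMED.put('\u201D', "rdquo");
        NAMED.put('\u2014', "mdash");
    }

    /** Replaces every non-ASCII character with a named or numeric reference. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            String name = cp <= 0xFFFF ? NAMED.get((char) cp) : null;
            if (name != null) {
                out.append('&').append(name).append(';');   // named entity
            } else if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');    // numeric fallback
            } else {
                out.append((char) cp);                      // plain ASCII
            }
        }
        return out.toString();
    }
}
```

Keeping the table in one place means a character can later be switched from a numeric reference to a named entity by adding a single entry.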

But I'm thinking I may end up reverting to letting it write them out in 
UTF-8 after all. Had a thought that non-HTML4-aware browsers (Netscape 
<=4, etc.) may be happier with that.
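
For the UTF-8 route, the point about creating the OutputStreamWriter with the right encoding can be illustrated with the standard library alone. This is a sketch, not JDOM code: the class name WriterEncodingDemo and its encode() helper are hypothetical. It shows that the same character comes out as two bytes through a UTF-8 writer, while a writer created for US-ASCII silently substitutes '?' for anything it cannot represent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class WriterEncodingDemo {
    /** Pushes text through an OutputStreamWriter created with an explicit charset. */
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With StandardCharsets.UTF_8, U+00E9 is written as the bytes C3 A9; with StandardCharsets.US_ASCII the writer emits a single '?', which is why the encoding declared in the document and the one used by the writer have to agree.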

The other issue I had by the way was that EntityRefs are being printed 
out with surrounding newlines if newlines is true on the XMLOutputter. 
This has the effect that they get surrounded by a visible space when 
displayed in a browser. When you're using entities for quote marks, 
apostrophes, and accented letters (most of the time in fact) this is 
obviously not wanted. However, setting the XMLOutputter to not generate 
newlines makes the source very unpleasant and difficult to look at 
manually (e.g. a 30,000 word story crushed to 66 logical lines, most of 
those in a <pre> block).

My current solution is to extend XMLOutputter with my own 
XHTMLOutputter, override printElement() and when a <p> element is 
encountered, turn off newlines, call the superclass method, and turn 
newlines on again. This is sufficient for the relatively simple 
documents I'm working on at the moment, but it might be nice if one 
could control this in a normal XMLOutputter, e.g. setting things so 
that entities don't get surrounded by newlines. That, or some kind of 
intelligence so it doesn't happen for elements either *when* they're 
inlined in text.
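
The save, switch off, delegate, restore pattern described above can be sketched without JDOM. Everything in this block is a hypothetical stand-in: Node models an element or a piece of text (an entity reference is modeled as text such as "&eacute;"), Printer plays the role of XMLOutputter with a newlines flag, and XhtmlPrinter plays the role of the subclass. The real XMLOutputter method signatures are not reproduced here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NewlineToggleSketch {
    /** Hypothetical tree node: an element (name set) or text (name null). */
    public static final class Node {
        final String name;
        final String text;
        final List<Node> children;

        private Node(String name, String text, List<Node> children) {
            this.name = name;
            this.text = text;
            this.children = children;
        }
        public static Node element(String name, Node... children) {
            return new Node(name, null, Arrays.asList(children));
        }
        public static Node text(String text) {
            return new Node(null, text, Collections.<Node>emptyList());
        }
    }

    /** Stand-in for the base outputter: puts a newline around every child. */
    public static class Printer {
        protected boolean newlines = true;

        public void printElement(Node e, StringBuilder out) {
            out.append('<').append(e.name).append('>');
            for (Node child : e.children) {
                if (newlines) out.append('\n');
                if (child.name == null) out.append(child.text);
                else printElement(child, out);
            }
            if (newlines) out.append('\n');
            out.append("</").append(e.name).append('>');
        }
    }

    /** Turns newlines off inside <p>, then restores the previous setting. */
    public static class XhtmlPrinter extends Printer {
        @Override
        public void printElement(Node e, StringBuilder out) {
            if (!"p".equals(e.name)) {
                super.printElement(e, out);
                return;
            }
            boolean saved = newlines;
            newlines = false;
            try {
                super.printElement(e, out);
            } finally {
                newlines = saved;   // restore even if printing fails
            }
        }
    }
}
```

Restoring the saved value, rather than unconditionally switching newlines back on, keeps the behaviour correct if a <p> is ever printed while newlines are already off.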

It might be worth producing a special standalone XHTMLOutputter 
that formats things nicely for web-development purposes, i.e. so the 
XHTML source is workable by hand but also renders properly in web 
browsers (presuming they actually compose the XHTML itself correctly of 
course).

-- 
Rachel



