[jdom-interest] SAXHandler / CDATA / entities

Malachi de AElfweald malachi at tremerechantry.com
Tue Nov 19 19:50:15 PST 2002


I have never really noticed, because I consistently use CDATA:
	<SomeNode><![CDATA[Here is some embedded HTML with a <br> in it.]]></SomeNode>

which, I would think, would be the fastest, since no character-based 
handling is required.
It is also much more human-readable :)
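
For anyone who wants to check what a parser hands back for the CDATA form, here is a minimal sketch. It uses the JDK's built-in DOM parser rather than JDOM, and the class and method names (CdataDemo, rootText) are made up for illustration, so treat it as an approximation of what JDOM would show, not as a statement about SAXHandler itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM.
class CdataDemo {
    /** Parses a small XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SomeNode><![CDATA[Here is some embedded HTML with a <br> in it.]]></SomeNode>";
        // The CDATA content comes back verbatim, including the literal <br>.
        System.out.println(rootText(xml));
    }
}
```

The text comes back with the <br> intact, and nothing inside the CDATA section had to be escaped by hand.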

Malachi

On Tue, 19 Nov 2002 23:11:50 +0100, Ingo Struck <ingo at ingostruck.de> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi...
>
>> I am confused by your statement....
>>
>> JDOM does cope with CDATA just fine. You can put all of those characters
>> in a CDATA now.
> Right... I erred regarding this point - it really works.
>
> What does *not* work properly is the decoding of characters.
> The basic problem here is that the decoding happens *before* parsing, i.e.
> if I want to avoid the CDATA section, I would just write something like:
>
> <SomeNode>Here is some embedded HTML with a &#60;br&#62; in it.</SomeNode>
>
> (The reason for using numeric encoding is that most chars can be encoded
> with a uniform length, a fact that could be used to significantly speed up
> the escaping process; if you want all ASCII chars with uniform length, then
> it is even better to use the hexadecimal form.)
> If you feed this into JDOM, what happens is that the chars are decoded to
> <SomeNode>Here is some embedded HTML with a <br> in it.</SomeNode>
>
> which, of course, is not valid XML. The solution provided here (to
> exclude the five "named" entities and, as I proposed as a fix, the
> respective numeric entities) is the wrong approach imho. It would be much
> cleaner to parse the document and decode the characters *afterwards*.
> Then you can be 100% sure that the parsed document really contains only
> the nodes of the serialized form and not some "embedded" stuff that has
> been decoded/parsed by mistake.
>
> Kind regards
>
> Ingo Struck
>
> - -- ingo at ingostruck.de
> Use PGP: http://ingostruck.de/ingostruck.gpg with fingerprint
> C700 9951 E759 1594 0807  5BBF 8508 AF92 19AA 3D24
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.0 (GNU/Linux)
>
> iD8DBQE92rcrhQivkhmqPSQRAuH0AJ9i0YvAs1r+n55uwrJdYVrI8Cr1MgCgpsI1
> gMZzGUA+A7umw1zJEWZOs8g=
> =ZAWf
> -----END PGP SIGNATURE-----
>
>
>
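
The uniform-length idea from the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: HexEscapeDemo, escape and parseRoot are hypothetical names, the set of escaped characters is my own choice, and the round trip goes through the JDK's built-in DOM parser, so it says nothing definite about how JDOM's SAXHandler or its outputter treats the same input.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM.
class HexEscapeDemo {
    /** Escapes markup-significant ASCII chars as hex references of fixed width (6 chars each). */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '>' || c == '&' || c == '"' || c == '\'') {
                // Two hex digits cover every ASCII char, so the width never varies.
                out.append(String.format("&#x%02X;", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Parses an XML string with the JDK parser and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String text = "Here is some embedded HTML with a <br> in it.";
        String xml = "<SomeNode>" + escape(text) + "</SomeNode>";
        Element root = parseRoot(xml);
        System.out.println(escape("<br>"));                            // &#x3C;br&#x3E;
        System.out.println(root.getTextContent().equals(text));        // true
        System.out.println(root.getElementsByTagName("br").getLength()); // 0
    }
}
```

With this particular parser the references come back as plain character data and no br element appears in the tree; whether a later serialization step re-escapes them is a separate question.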


