[jdom-interest] Simple xhtml/entity resolver?
olivier.jaquemet at jalios.com
Thu Mar 29 08:47:33 PDT 2012
JDOM is a great tool for parsing XML...
... but for XHTML fragments (which may not be completely XHTML compliant),
and especially for text extraction, I would strongly suggest JSoup:
String text = org.jsoup.Jsoup.parse(html).text();
Whatever your HTML is, it will work like a charm (even if it is an ugly
WYSIWYG copy-paste from Word or an ugly HTML export from some website).
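
On the numeric entity part of the question quoted below: as far as I know, numeric character references such as &#233; or &#xE9; are normally resolved by the XML parser itself, so they reach JDOM as ordinary characters in the text and not as entity references. If you ever have to handle them in a raw string yourself, the code point is simply the number after "&#" (decimal) or "&#x" (hexadecimal). Here is a minimal sketch using only the JDK; the class name NumericRefDecoder and its decode method are made up for illustration and are not part of JDOM or JSoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a JDOM or JSoup class.
final class NumericRefDecoder {

    // Matches decimal (&#233;) and hexadecimal (&#xE9;) character references.
    // Group 1 holds the hex digits, group 2 the decimal digits.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Replaces every numeric character reference with the character it names.
    // Named entities (&amp;, &nbsp;, ...) are deliberately left untouched.
    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(input, last, m.start());
            String hex = m.group(1);
            try {
                int codePoint = (hex != null)
                        ? Integer.parseInt(hex, 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    out.appendCodePoint(codePoint);
                } else {
                    out.append(m.group()); // out of Unicode range: keep as written
                }
            } catch (NumberFormatException e) {
                out.append(m.group()); // number too large for an int: keep as written
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("caf&#233; &#x41;&#66; &amp;"));
    }
}
```

This only covers numeric references. Named entities other than the five predefined XML ones need the XHTML DTD (or a lookup table), which is one more reason to let JSoup do the work.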
On 29/03/2012 15:23, Oliver Ruebenacker wrote:
> I need a simple way to convert some XHTML fragments, provided as a
> JDOM Element, into plain text. I am willing to ignore most HTML tags
> and consider only the most commonly used predefined entities.
> In JDOM, an entity reference has a name, a public id and a system
> id. I think I know what the name means, for named entities. But what
> about numeric entities, how do I get the code point? And what are
> public id and system id?
> Take care
Olivier Jaquemet <olivier.jaquemet at jalios.com>
Ingénieur R&D Jalios S.A. - http://www.jalios.com/