[jdom-interest] Resolving Entities...when no DTD is assigned (no DOCTYPE declaration) in XML

Vish D. vishpool at gmail.com
Wed Aug 31 15:10:42 PDT 2005


Hello all,

I am having some trouble figuring out how to resolve entities when an XML 
file has no DOCTYPE declaration (no DTD attached to it) but contains 
'non-standard' entities (such as &nbsp;). I need to do this without 
changing the XML file (i.e., without adding a DOCTYPE declaration).

Here is what I am doing:

SAXBuilder builder = new SAXBuilder();
....
fulltextXML = builder.build(new FileInputStream(filename));

-- this fails with the following exception --

C:\HTMLs\00063185_200_1_67\00063185_200_1_67_Document.xml is not 
well-formed.
org.jdom.input.JDOMParseException: Error on line 5: The entity "nbsp" was 
referenced, but not declared.
Error on line 5: The entity "nbsp" was referenced, but not declared.


Is there a way to resolve such entities without having to declare a 
DOCTYPE in the XML file?
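One workaround that leaves the file on disk untouched is to add the DOCTYPE only in memory: read the file as text, splice an internal DTD subset declaring the missing entities right after the XML declaration, and parse the patched string. The sketch below is an assumption-laden illustration, not a JDOM feature: `withInternalSubset` and `ENTITY_DECLS` are hypothetical names, it assumes the text has no byte-order mark and no existing DOCTYPE, and it declares only `nbsp`. It uses the JDK's built-in DOM parser so it runs without JDOM on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InjectDoctype {

    // Assumption: only &nbsp; is missing; add further <!ENTITY ...> declarations here as needed.
    // The character reference is expanded when the declaration is read, so nbsp maps to U+00A0.
    static final String ENTITY_DECLS = "<!ENTITY nbsp \"&#160;\">";

    /**
     * Hypothetical helper: returns the document text with an internal DTD subset spliced in
     * right after the XML declaration (or at the very start if there is none).
     * Assumes no byte-order mark and no existing DOCTYPE. The file itself is never modified.
     */
    static String withInternalSubset(String xml, String rootName) {
        String doctype = "<!DOCTYPE " + rootName + " [" + ENTITY_DECLS + "]>";
        int insertAt = 0;
        if (xml.startsWith("<?xml")) {
            insertAt = xml.indexOf("?>") + 2; // position just past the XML declaration
        }
        return xml.substring(0, insertAt) + doctype + xml.substring(insertAt);
    }

    /** Parses XML text with the JDK's built-in, non-validating DOM parser. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Reading from a character stream, so the encoding="..." declaration is ignored.
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<object_document><fulltext>&nbsp;Biol. Bull. 200: 77-86.</fulltext></object_document>";
        // Parsing xml directly would fail because "nbsp" is referenced but not declared.
        Document doc = parse(withInternalSubset(xml, "object_document"));
        String text = doc.getDocumentElement().getTextContent();
        System.out.println("first char is U+00A0: " + (text.charAt(0) == 0xA0));
    }
}
```

With JDOM, the patched string could be handed to `SAXBuilder` through its `build(Reader)` overload, e.g. `builder.build(new StringReader(withInternalSubset(xml, "object_document")))`, where `xml` holds the file contents (for instance `new String(Files.readAllBytes(Paths.get(filename)), StandardCharsets.UTF_8)`). Because the parser then reads characters rather than bytes, the file's own encoding must be applied when reading it into the string.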



Thanks in advance!

Vish


Sample XML file:

XML FILE 
--------------

<?xml version="1.0" encoding="UTF-8"?>
<object_document>
<art_title> Muscular Alteration of Gill Geometry in vitro: Implications for 
Bivalve Pumping Processes -- Medler and Silverman 200 (1): 77 -- The 
Biological Bulletin</art_title>
<converted_from type='HTML'>BiolBull V 200 I 1 P 77 Fulltext 00063185.htm
</converted_from>
<fulltext>&nbsp;Biol. Bull. 200: 77-86. (February 2001)&#169; 2001 Marine 
Biological LaboratoryMuscular Alteration of Gill Geometry in vitro: 
Implications for Bivalve Pumping ProcessesScott Medler* and Harold 
SilvermanLouisiana State University, Baton Rouge, Louisiana 70803* Author to 
whom correspondence should be addressed. Current address: Department of 
Biology, Colorado State University, Ft. Collins, CO 80523. E-mail: 
Skmedler{at}aol.com<!-- var u = "Skmedler", d = "aol.com"; 
document.getElementById("em0").innerHTML = "" + u + "@" + d + ""//-->
&nbsp;Received 23 March 2000; accepted 19 October 2000.
</fulltext>
<jrnl_title>BiolBull</jrnl_title>
<issn>00063185</issn>
<volume>200</volume>
<issue>1</issue>
<fpage>77</fpage>
</object_document>
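A different workaround, again only a sketch under stated assumptions, is to rewrite HTML-style named references such as `&nbsp;` into numeric character references before parsing, since `&#160;` is legal XML with no DTD at all. `HTML_ENTITIES` is a hypothetical, deliberately tiny table (`nbsp`, `copy`, `mdash`); any name it does not list, including the five predefined XML entities, is left as-is. It needs Java 9+ for `Map.of` and `Matcher.appendReplacement(StringBuilder)`.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReplaceHtmlEntities {

    // Hypothetical, deliberately small table of HTML entity names -> Unicode code points.
    static final Map<String, Integer> HTML_ENTITIES = Map.of(
            "nbsp", 0xA0,
            "copy", 0xA9,
            "mdash", 0x2014);

    // Matches named references such as &nbsp; (numeric ones like &#169; start with '#' and are skipped).
    private static final Pattern NAMED_REF = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Rewrites known named references as decimal character references; unknown names are kept. */
    static String toNumericReferences(String xml) {
        Matcher m = NAMED_REF.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = HTML_ENTITIES.get(m.group(1));
            String replacement = (codePoint != null) ? "&#" + codePoint + ";" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<object_document><fulltext>&nbsp;Received 23 March 2000.</fulltext></object_document>";
        String fixed = toNumericReferences(xml);
        System.out.println(fixed);
        // The rewritten text is now well-formed without any DOCTYPE.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(fixed)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

A plain regex pass does not distinguish comments or CDATA sections, so it would also rewrite matching text inside them; the comment in the sample above contains no entity references, so that makes no difference here. As with the DOCTYPE approach, the result can be passed to JDOM via `builder.build(new StringReader(fixed))`.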

