[jdom-interest] newbie question: parsing in xhtml containing mathml

Morten Andersen mortena at mip.sdu.dk
Tue Jul 6 04:08:05 PDT 2004


Well, the task is pretty simple, but I can't get anything working.

I want to parse an xhtml document containing mathml, with all the
entities such as alpha and beta defined. The result should then be
transformed using xslt into another xml document.
The test xhtml document is shown below:
----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
  "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd" [<!ENTITY mathml 
"http://www.w3.org/1998/Math/MathML">]>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow>
            <mi>&#950;</mi>
        </mrow>
    </math>
</body>
         </html>
----

Here is what I've tried:

Parsing the document with a SAXBuilder using the default settings:
-----
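         // file (java.io.File) and doc (org.jdom.Document) are declared elsewhere
         SAXBuilder builder = new SAXBuilder();
         builder.setValidation(false);
         if (file.exists()) {
             FileInputStream stream = null;
             try {
                 stream = new FileInputStream(file);
                 // note: no charset given, so the platform default is used
                 InputStreamReader reader = new InputStreamReader(stream);
                 doc = builder.build(reader);
             } catch (Exception e) {
                 e.printStackTrace();
             } finally {
                 if (stream != null) {
                     try { stream.close(); } catch (IOException ignored) { }
                 }
             }
         }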
---
This results in this error:
   "org.jdom.IllegalTargetException: The target "IS10744:arch" is not legal 
for JDOM/XML Processing Instructions: Processing instruction targets cannot 
contain colons."

Then I tried to trick the SAXBuilder into not using the DTDs by setting
its entityResolver to an entityResolver that doesn't do anything.
  ---
         SAXBuilder builder = new SAXBuilder();
         builder.setEntityResolver(new NoOpEntityResolver());
---
This results in some output to System.err:
         "[Fatal Error] :1:66: White spaces are required between publicId 
and systemId."
But the transformation seems to occur.
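
For reference, a minimal sketch of what a do-nothing resolver along these
lines might look like. The class body is an assumption for illustration
only (the class used above is not shown in this message); it relies only
on the org.xml.sax interfaces that ship with the JDK.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the NoOpEntityResolver mentioned above.
class NoOpEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Hand the parser an empty document for every external entity.
        // Returning null instead would make the parser open systemId
        // itself, i.e. fetch the DTD after all.
        return new InputSource(new StringReader(""));
    }
}
```

With a resolver like this the external DTD is effectively empty, so
numeric references such as &#950; still parse, but named entities like
&alpha; would be undefined.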

I tried writing the parsed document to a file. The character written as
&#950; in the input does not survive; a ? appears in its place:
---
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" 
"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body bgcolor="white">
     Hello world
    <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow>
            <mi>?</mi>
        </mrow>
    </math>
</body>
         </html>
---
That could be due to an encoding mistake somewhere.
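
A small sketch of how such a ? can come about, assuming (this is only a
guess) that the output is encoded with a single-byte charset such as
ISO-8859-1 even though the declaration says UTF-8. The class and method
names are made up for the illustration; only JDK classes are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ZetaEncodingDemo {
    // Encodes the text with the given charset and decodes it again,
    // mimicking a write/read round trip through a file.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String zeta = "\u03B6"; // the character that &#950; refers to

        // ISO-8859-1 cannot represent U+03B6, so the encoder substitutes '?'.
        System.out.println(roundTrip(zeta, StandardCharsets.ISO_8859_1).equals("?"));

        // UTF-8 can represent it, so the character comes back unchanged.
        System.out.println(roundTrip(zeta, StandardCharsets.UTF_8).equals(zeta));
    }
}
```

If this is what happens, writing through an OutputStream with an explicit
UTF-8 encoding, rather than through a Writer that uses the platform
default, should keep the character intact.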

So as you can tell, I've been struggling with this issue for quite some
time and getting nowhere. Is it really that difficult to parse an xhtml
document and transform it using xslt?

How can I transform an xhtml document containing mathml into another xml 
document using xslt?

Regards


Morten Andersen
Master of applied mathematics and computer science
Associate professor

The Maersk Institute of Production technology at Southern Danish University 
www.mip.sdu.dk
Campusvej 55
DK-5230 Odense M
Denmark
+45 65 50 36 54
+45 61 71 11 03
Jabber id: hat at jabber.dk