From klemens.waldhoer at heartsome.de Thu Dec 5 01:41:13 2013
From: klemens.waldhoer at heartsome.de (Dr. Klemens Waldhör)
Date: Thu, 5 Dec 2013 10:41:13 +0100
Subject: [jdom-interest] Question about CR/LF Handling
Message-ID: <003201cef19e$23754da0$6a5fe8e0$@heartsome.de>

I am using JDOM in the open source project openTMS, mainly to parse XLIFF
files. One problem I have run into is that I can find no way to keep
carriage returns (CR) and line feeds (LF) exactly as they are in the
original XML file when I write a copy of the possibly modified file with
XMLOutputter.

What I would like is for

"Text CR text LF text CR LF text"

to be copied to the output exactly as it is, and not changed to

"Text LF text LF text LF text"

depending on some setting. I know what the XML spec says about this, but I
still need the original characters.

Anything I can do about it?

best
---------------------------------------
Klemens

From mike at saxonica.com Thu Dec 5 02:07:56 2013
From: mike at saxonica.com (Michael Kay)
Date: Thu, 5 Dec 2013 10:07:56 +0000
Subject: [jdom-interest] Question about CR/LF Handling
In-Reply-To: <003201cef19e$23754da0$6a5fe8e0$@heartsome.de>
References: <003201cef19e$23754da0$6a5fe8e0$@heartsome.de>
Message-ID: <28AAF022-B100-4BB4-ADDF-F4732A327A7B@saxonica.com>

On 5 Dec 2013, at 09:41, Dr. Klemens Waldhör wrote:

> Anything I can do about it?

Not really. If XML says that a distinction is irrelevant (for example the
whitespace after a tag name in a start or end tag, or the choice of single
or double quotes) then the XML parser is going to normalize things so the
application doesn't know what was in the original. That's by design; you're
not supposed to write applications that are sensitive to such distinctions.

Michael Kay
Saxonica

From paul at hoplahup.net Thu Dec 5 07:30:25 2013
From: paul at hoplahup.net (Paul Libbrecht)
Date: Thu, 5 Dec 2013 16:30:25 +0100
Subject: [jdom-interest] Question about CR/LF Handling
In-Reply-To: <003201cef19e$23754da0$6a5fe8e0$@heartsome.de>
References: <003201cef19e$23754da0$6a5fe8e0$@heartsome.de>
Message-ID:

I think you have two strategies:

- sniff the content of the file to find out which kind of line ending it
  uses. This is pretty easy.
- work in close collaboration with the SAX locator so that you can map
  things back to the source when you are handed a piece of text (hard, but
  would be complete).

paul
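
The first strategy above (sniffing the line endings) could look roughly like the following. This is only a minimal sketch: LineEndingSniffer and its methods are hypothetical names, not part of JDOM, and it assumes the file uses one dominant line-ending style. It cannot reproduce a file that mixes CR, LF and CR LF, which is what the original question asks for; that would need the second, locator-based strategy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of JDOM): guesses a text's dominant line ending. */
final class LineEndingSniffer {

    // CRLF is listed first so it is matched as one unit,
    // not as a lone CR followed by a lone LF.
    private static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    /**
     * Counts CRLF, CR and LF in the raw (unparsed) text and returns the
     * most frequent one. LF is returned unless CRLF or CR is strictly
     * more frequent than LF; CRLF wins a tie with CR.
     */
    static String sniff(String rawText) {
        int crlf = 0, cr = 0, lf = 0;
        Matcher m = EOL.matcher(rawText);
        while (m.find()) {
            switch (m.group()) {
                case "\r\n": crlf++; break;
                case "\r":   cr++;   break;
                default:     lf++;
            }
        }
        if (crlf > lf && crlf >= cr) return "\r\n";
        if (cr > lf && cr > crlf) return "\r";
        return "\n";
    }

    /** Rewrites every line break in the text to the given ending. */
    static String apply(String text, String eol) {
        return EOL.matcher(text).replaceAll(Matcher.quoteReplacement(eol));
    }
}
```

The idea would be to call sniff() on the raw file content before it goes through the parser, serialize the document to a String, and then pass that String through apply() with the sniffed ending before writing it out.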