We're using XBis 9.2 under Windows; we receive a text string that contains embedded CR/LF characters. In looking at the XML, the CR/LF characters are there; however, when the text string is passed to Cobol, it seems the CR/LF characters have been stripped out of the text.
Is this normal behaviour, or something that can be changed via a setting, etc?
Thanks
Tony
Hi Tony,
Is this a web service (i.e. SOAP)? Can you post the XML?
There are a number of factors that may be in play here. If the CR/LF characters form their own text node in the XML, that is considered whitespace and whitespace is commonly stripped during an XSLT transform. It is something that can be fixed, but the nature of the incoming XML dictates the nature of the fix.
Hi, Tom;
Yes, this is a web service. The web page has a multi-line box, in which the user can press <enter> to indicate a new line.
Here's the pertinent part of the XML. The tag in question is "OrderComments"; there are CR/LF characters after "comment line 1", after "comment line 2", after "comment line 3", and after each of the digits 4, 5, 6, 7, 8 and 9.
<bis:request xmlns:bis="www.xcentrisity.com/.../request">
<bis:content>
<env:Envelope xmlns:xsd="www.w3.org/.../XMLSchema" xmlns:xsi="www.w3.org/.../XMLSchema-instance" xmlns:tns="localhost/.../" xmlns:env="schemas.xmlsoap.org/.../">
<env:Body>
<tns:ValidateOrder>
<patientname>KimComments14</patientname>
<orderdate>20141009</orderdate>
<wanteddate />
<PONumber />
<addresses />
<shippostalcode />
<types>SV</types>
<styles>SV</styles>
<materials>P</materials>
<treatments />
<colors />
<rightsphere>1.00</rightsphere>
<rightcylinder />
<rightdecent>2.00</rightdecent>
<rightvertdec />
<rightthickness />
<leftsphere>1.00</leftsphere>
<leftcylinder />
<leftdecent>2.00</leftdecent>
<leftvertdec />
<leftthickness />
<rightprism1 />
<leftprism1 />
<rightcribdiam>50</rightcribdiam>
<leftcribdiam>50</leftcribdiam>
<rightbasecurve />
<rightdiameter />
<leftbasecurve />
<leftdiameter />
<addon />
<lensselection />
<real-order-id />
<orderaddon />
<righttype>SV</righttype>
<lefttype>SV</lefttype>
<rightstyle>SV</rightstyle>
<leftstyle>SV</leftstyle>
<rightmaterial>P</rightmaterial>
<leftmaterial>P</leftmaterial>
<righttreatment />
<lefttreatment />
<rightcolor />
<leftcolor />
<shipaddress>
<shipaddressItems />
</shipaddress>
<OrderComments>comment line 1 comment line 2 comment line 3 4 5 6 7 8 9 final comment line 10</OrderComments>
<jobtype>u</jobtype>
<leftpresent>1</leftpresent>
<rightpresent>1</rightpresent>
<token>9843ad15-4fb0-11e4-bfd6-00016c705cae</token>
</tns:ValidateOrder>
</env:Body>
Do you have wrap="hard" on your <textarea> element?
Tony and I had a direct email exchange on this. The bottom line is that, using MSXML, the soap_to_cobol.xsl transform preserves the CR/LF as expected, so the issue is somewhere beyond that in the import process. Tony is probably going to take this to SupportLine.
XML Extensions strips control characters from the value before the value is imported into a COBOL data item. In an XML text document that means 0x09, 0x0A and 0x0D, since these are the only control characters allowed there.
My regrets; the prior post I made is incorrect and should be ignored. I mistook some tracing code, which does remove TAB, LF and CR from values when tracing them. This is only done for the trace output. I'm still not sure why Tony is not getting these characters in COBOL. Still reviewing ...
Along this track, Bruce, would setting the attribute xml:space="preserve" override the behavior? This goes back to Tony's original post, where he asks how this behavior can be defeated.
I have verified with a simple test case that TAB, CR and LF values are not preserved by XML Extensions, but I have not yet found where in the process this happens. The xml:space="preserve" attribute has no effect (I have long suspected this is a flaw in XML Extensions). Still looking for where this happens; I will probably write an RPI.
I found two answers to the whitespace issue. Answer one is that XML Extensions intentionally strips characters less than space before copying the XML text node value to the COBOL data item; there is no setting to prevent this. Answer two is that MSXML6 (the Windows parser) doesn't seem to return tab, line feed or carriage return for pChildNode->get_nodeValue() where pChildNode is a TEXT_NODE; MSXML6 doesn't replace them with a space but rather simply deletes them. Libxml2 (the UNIX parser) does return whitespace characters, but then as noted in answer one, XML Extensions removes characters less than space before the transfer to the COBOL data item.
I found that XML Extensions sets the MSXML6 preserveWhiteSpace property to TRUE when loading a stylesheet and to FALSE when loading any other document. Changing this to TRUE for all documents did not cause pChildNode->get_nodeValue() to keep tab, line feed and carriage return characters, even when the xml:space="preserve" attribute was specified in the XML. (There might be a flaw in my experiment, because setting the preserveWhiteSpace property to TRUE for stylesheets (FALSE is the default) was added specifically to fix a problem with transform output when a stylesheet was inserting whitespace, and that fix worked for the specific case it was intended for.)
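To make "answer one" concrete, here is a minimal Python sketch (not XML Extensions code) that mimics the import step as described: the parser returns the text node value, and every character below space is then dropped before the value reaches the COBOL data item. Python's built-in ElementTree (expat) stands in for MSXML6/libxml2, and `strip_below_space` and `simulated_import` are hypothetical names for the behavior described above.

```python
import xml.etree.ElementTree as ET


def strip_below_space(value: str) -> str:
    """Mimic the described import step: drop every character below U+0020."""
    return "".join(ch for ch in value if ch >= " ")


def simulated_import(xml_text: str) -> str:
    # The parser hands back the text node value (line endings already normalized).
    node_value = ET.fromstring(xml_text).text or ""
    # The import step then removes TAB, LF and CR before the COBOL move.
    return strip_below_space(node_value)


if __name__ == "__main__":
    doc = "<OrderComments>comment line 1\r\ncomment line 2\r\ncomment line 3</OrderComments>"
    print(repr(simulated_import(doc)))
```

The printed value has the comment lines run together with nothing between them, which matches what Tony saw arriving in COBOL.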
I determined that my MSXML6 experiment was indeed flawed. The RM/COBOL DISPLAY on Windows doesn't show tabs, line feeds and carriage returns, but does on UNIX. I further found that the whitespace characters are preserved regardless of either xml:space="preserve" or setting the preserveWhiteSpace property before loading a document. This indicates that there just needs to be a setting for XML Extensions to preserve characters below space, which for XML documents are tab, line feed and carriage return. Note that for Windows, bare line feeds and carriage returns are converted by MSXML6 to carriage return line feed pairs.
In reviewing what might need to be done to XML Extensions for better whitespace handling, I noted that the XML specification requires that CR/LF and a bare CR be replaced by a single LF in the XML processor. Thus, if your design depends on preserving CR characters, XML is not appropriate except in the case where replacing LF on output with CR/LF is the desired behavior.
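The end-of-line rule (section 2.11 of the XML specification) can be observed with any conforming parser. The sketch below uses Python's built-in ElementTree, which wraps expat rather than MSXML6 or libxml2, so treat it only as an illustration: a literal CR/LF pair and a literal bare CR both come back as a single LF, while a CR written as the character reference `&#13;` survives, because normalization applies to line breaks in the raw input and not to characters produced by character references. The last line shows the "replace LF with CR/LF on output" approach mentioned above.

```python
import xml.etree.ElementTree as ET


def text_of(xml_text: str) -> str:
    """Return the text content of the root element as the parser reports it."""
    return ET.fromstring(xml_text).text or ""


crlf = text_of("<c>line 1\r\nline 2</c>")         # literal CR/LF pair in the input
bare_cr = text_of("<c>line 1\rline 2</c>")        # literal bare CR in the input
char_ref = text_of("<c>line 1&#13;\nline 2</c>")  # CR supplied as a character reference

for label, value in (("CR/LF", crlf), ("bare CR", bare_cr), ("&#13;", char_ref)):
    print(f"{label:8} -> {value!r}")

# Re-expanding LF to CR/LF after parsing restores Windows-style line ends.
print(repr(crlf.replace("\n", "\r\n")))
```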
Dear Bruce,
I would respectfully request that this behavior be subjected to a new flag (XML SET FLAG value). In general, I agree with your approach. However, the reality of different XML processors is that some flexibility is very desirable here. (Note, for instance, that the BIS request that is stored for Tony's web service request has not had this 0x0D replacement applied.) Section 2.11 of the XML standard is rather arcane and can produce unexpected results for those not familiar with its requirements (and I am still not certain that all XML processors apply it to CDATA sections).
Tom Morrison
Hill Country Software
Yes, I planned (if this enhancement gets authorized) to have new flags. The XML SET/GET FLAGS statement already uses all but two of the bits available in a 32-bit word, so a new flag word is needed. It could be an XML SET WS-FLAGS (whitespace flags) if we restricted it to whitespace options, but it might be better to be more general, such as FLAGS-X (extended flags). I doubt we need enough new flags to fill another 32-bit word, so I lean towards the general approach to cover possible future flag needs. I think we need a way to enable/disable xml:space="preserve" handling, so that the XML document can control the flags if desired. In addition, we need new flags that give XML Extensions programs control of leading/trailing/embedded whitespace handling (complete removal or normalization to a single space). Leading/trailing spaces are currently handled, but leading/trailing TAB/LF/CR are not; they are currently removed. I think spaces need separate handling versus TAB/LF/CR. Defaults for the flags would maintain backward compatibility (for example, the default for xml:space="preserve" would be ignore and the default for TAB/LF/CR would be removal).
My prior post, to which you just responded, was simply to raise the issue that the parsers we use (MSXML6 and libxml2) might limit what can be done with CR if one or both strictly followed the XML specification regarding treatment of CR. During import, we can only get whatever value the parser has obtained for us. There is always the chance that the two parsers do not agree in the area of whitespace handling, especially in the treatment of CR.
Hi, I've run into the same issue that Tony originally described here. Has there been an enhancement made (as Bruce described) to extend XML SET FLAGS or similar, in order to preserve the CR/LF characters?
I'm using extend version 10.0.1.
If no such enhancement has made it into the product, does anyone have any other suggestions for preserving newlines? I've toyed around with xsl:strip-space and xsl:preserve-space in the soap-to-cobol.xslt file, but with no luck so far. I've also considered defining my own XSL template that would detect CRLFs and replace them with <BR> or similar. If that works, I'll have to post-process the string in my COBOL code, since I'm not looking to format my output as HTML; it's ultimately getting stored in a SQL Server database as a simple character string.
Any ideas?
Hi Chuck,
I believe this enhancement is part of 10.1.0.
XML SET WHITESPACE-FLAGS
This statement has the following parameter:
Parameter: Flags
Description: A numeric literal or identifier of a numeric data item with a value that represents one or more states of whitespace preservation. Valid flags and their numerical values are:
78 WHITESPACE-DEFAULT-FLAGS Value 0. *> keep all whitespace
78 WHITESPACE-STRIP-CONTROL Value 1. *> strip all (TAB, LF, CR) characters
78 WHITESPACE-PRESERVE-TAB Value 16. *> keep TAB characters when stripping
78 WHITESPACE-PRESERVE-LF Value 32. *> keep LF characters when stripping
78 WHITESPACE-PRESERVE-CR Value 64. *> keep CR characters when stripping
78 WHITESPACE-NORMALIZE Value 65536. *> collapse whitespace (space, LF, TAB, CR)
The values for flags 1 to 64 can be added together to achieve combined behaviors.
Description
The XML SET WHITESPACE-FLAGS statement controls the preservation of whitespace during an XML IMPORT statement. The default behavior is to preserve all whitespace; that is, all space characters, line feeds (LF), carriage returns (CR), and tab characters (TAB). Use this statement to determine which characters are stripped out. You can combine the flag values (see the flag list above) to control more than one character; for example, set Flags to 65 (1 + 64) to preserve CR characters but strip out all LF and TAB characters.
You can also normalize whitespace (replacing each run of consecutive whitespace characters with a single space character), but this action cannot be combined with any other action.
The XML TERMINATE statement will revert the whitespace setting back to its default behavior (all whitespace preserved). It does not revert back to its default behavior following an XML INITIALIZE statement.
The valid flags are defined in the COPY file lixmldef.cpy, and so this file must be in effect when using the statement.
Example
The following example uses the values 1 and 16 (1 + 16 = 17) so that only LF and CR characters are stripped out and TAB characters are preserved.
XML SET WHITESPACE-FLAGS 17
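To see how the documented flag values combine, here is a small Python model of the semantics described above. It is not the XML Extensions implementation: the constant names simply mirror the 78-level names in lixmldef.cpy, and anything the documentation does not state (for example, whether WHITESPACE-NORMALIZE also trims leading and trailing whitespace) is an assumption; this model does not trim.

```python
import re

# Values as documented for XML SET WHITESPACE-FLAGS (names mirror lixmldef.cpy).
WHITESPACE_DEFAULT_FLAGS = 0
WHITESPACE_STRIP_CONTROL = 1
WHITESPACE_PRESERVE_TAB = 16
WHITESPACE_PRESERVE_LF = 32
WHITESPACE_PRESERVE_CR = 64
WHITESPACE_NORMALIZE = 65536

# Each control character paired with the flag that keeps it when stripping.
_CONTROL = {"\t": WHITESPACE_PRESERVE_TAB, "\n": WHITESPACE_PRESERVE_LF, "\r": WHITESPACE_PRESERVE_CR}


def apply_whitespace_flags(value: str, flags: int) -> str:
    """Model (assumed, not the real implementation) of filtering an imported value."""
    if flags & WHITESPACE_NORMALIZE:
        if flags != WHITESPACE_NORMALIZE:
            raise ValueError("WHITESPACE-NORMALIZE cannot be combined with other flags")
        # Collapse each run of space/TAB/LF/CR into one space (no trimming assumed).
        return re.sub(r"[ \t\n\r]+", " ", value)
    if not flags & WHITESPACE_STRIP_CONTROL:
        return value  # default: all whitespace preserved
    # Strip TAB/LF/CR unless the matching PRESERVE bit is set.
    return "".join(ch for ch in value if ch not in _CONTROL or flags & _CONTROL[ch])


sample = "a\tb\r\nc"
print(repr(apply_whitespace_flags(sample, 17)))  # 1 + 16: keep TAB only
print(repr(apply_whitespace_flags(sample, 65)))  # 1 + 64: keep CR only
```

Because each flag occupies its own bit, adding the values and OR-ing them give the same result, which is why the documentation says they "can be added together". Keep in mind that on import a CR may already have been removed by the parser's end-of-line handling, so WHITESPACE-PRESERVE-CR can only keep what the parser delivers.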
Thanks, Steve. Just to clarify, in the description it says "the default behavior is to preserve all whitespace"; prior to this enhancement, it appears that the default behavior was to STRIP all whitespace (or at least TAB/CR/LF characters). If I'm understanding this, the default behavior has changed but setting the flags can revert back to the old behavior (or a variation of it). Is that correct?
Hi Chuck, the revision that added the whitespace flags restored the old, correct behavior of preserving whitespace, such as line feeds, by default on import. This was a change in default behavior; users who do not like the change can use the whitespace flags to have whitespace stripped, as was done by the earlier XML Extensions import behavior that is now considered incorrect.
ECN 4386 and RPI 1096587 for ACU discuss fixing XML Extensions to not strip whitespace by default. RPI 1096587 was written against ACU Xcentrisity BIS, but it was the use of XML Extensions in that product that caused the stripping of whitespace. The broken behavior was seen in ACU 9.2.0 and not fixed until 10.1.0 (it just barely missed being included in 10.0). The issue has been fixed in RM/COBOL XML Extensions (RPI 1098192) and is pending (as of 2017/06/27) a fix in Visual COBOL XML Extensions. The fix preserves whitespace by default and includes whitespace flags settings to eliminate or preserve individual whitespace characters (tab, line feed and carriage return) or to normalize whitespace (reduce each run of whitespace to a single space). Carriage return is a special case: it is under the control of the XML parser and thus might not be present in the actual import data from an XML document, because parsers are supposed to collapse CR/LF sequences into LF so that XML documents on Windows and UNIX are the same. The XML Extensions whitespace flags only affect import. Export has not had an identified stripping problem, but trailing space characters in COBOL data items have always been stripped by default on export; there are data conversion (aka CodeBridge) flags that can be set to not strip spaces (leading or trailing) from COBOL data items.
Thanks, Bruce. It sounds like this ECN is just what I need, but for the time being I've found a workaround. I managed to craft a custom XSLT stylesheet that I call (prior to the XML IMPORT) using XML TRANSFORM FILE, operating on the BIS Exchange File. It targets the field I'm interested in (using XPath) and replaces the CRLF pairs with a literal '%%' placeholder. Then, after I've imported the data, I use standard COBOL syntax to restore the newlines: INSPECT data-item REPLACING ALL "%%" BY x"0D0A"
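The placeholder workaround is easiest to reason about as a three-step round trip. The Python sketch below only models the data flow and is not the actual stylesheet or COBOL code: the hypothetical `encode_newlines` stands in for the custom XSLT run with XML TRANSFORM FILE, `simulated_import` for the stripping import, and `decode_newlines` for the INSPECT statement. With a conforming parser the stylesheet would see LF rather than CR/LF (MSXML6 may differ, per the earlier note about it producing CR/LF pairs on Windows), so the sketch treats every line-break form as a break.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "%%"  # two characters, the same length as the two-byte x"0D0A"


def encode_newlines(text: str) -> str:
    """Stand-in for the custom XSLT: turn every line break into the placeholder."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", PLACEHOLDER)


def simulated_import(text: str) -> str:
    """Stand-in for the stripping import: drop characters below space."""
    return "".join(ch for ch in text if ch >= " ")


def decode_newlines(text: str) -> str:
    """Stand-in for INSPECT data-item REPLACING ALL "%%" BY x"0D0A"."""
    return text.replace(PLACEHOLDER, "\r\n")


comments = ET.fromstring("<OrderComments>line 1\r\nline 2\r\nline 3</OrderComments>").text or ""
restored = decode_newlines(simulated_import(encode_newlines(comments)))
print(repr(restored))
```

The two-character marker is a deliberate fit on the COBOL side: INSPECT ... REPLACING requires the replacement to be the same size as the text it replaces, so "%%" pairs cleanly with the two-byte CR/LF. The main risk is user text that already contains "%%"; a less likely marker of the same length would be safer.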
Chuck, if you keep crafting XSLT stylesheets, you'll get a reputation! ;)
Actually your solution is similar to one that I had to use back when this issue was discovered on the RM version.
Best regards, Tom