
DOM-based and event-based XML parsers

  • February 15, 2013

Problem:

I plan to use the XML PARSE statement to read an XML instance document, but can I then use the READ statement to examine the data one record at a time?

Resolution:

No. MF COBOL provides two separate XML parsers: a DOM-based parser and an event-based parser. The DOM-based parser is invoked by a READ statement, whereas the event-based parser is invoked by the XML PARSE statement. The XML-enhanced READ statement works only with the DOM-based parser.

A DOM parser reads an entire XML instance document into memory all at once and holds it in the form of a tree. Once the document is in memory, a COBOL program can read through the data and modify it. At the end, the updated memory image can be written out to disk as an XML instance document.
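As a language-neutral illustration of the DOM idea (this is not MF COBOL syntax), the following minimal sketch uses Python's standard xml.dom.minidom module. The sample document and the function name load_titles are made up for the example. It shows that once the whole tree is in memory, the program can visit the data in any order.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for this illustration.
XML_TEXT = (
    "<library>"
    "<book><title>Dune</title></book>"
    "<book><title>Emma</title></book>"
    "</library>"
)

def load_titles(xml_text):
    """Parse the whole document into a tree, then walk it in reverse order."""
    doc = parseString(xml_text)  # the entire document is now held in memory
    books = doc.getElementsByTagName("book")
    # The tree allows random access, so reading backwards is no problem.
    return [
        book.getElementsByTagName("title")[0].firstChild.data
        for book in reversed(books)
    ]

if __name__ == "__main__":
    print(load_titles(XML_TEXT))
```

Reading the records in reverse is only possible because the parser kept the full tree; an event-based parser offers no such second look.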

An event-based parser, on the other hand, reads an XML instance sequentially from start to finish and presents it to a program as a series of events. For example, an event might be 'START-OF-ELEMENT', announcing that a new XML element has been encountered, or it might be 'CONTENT-CHARACTERS', meaning that actual data has been encountered.
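To make the event stream concrete, here is a small sketch using Python's standard xml.sax module rather than COBOL. The class name EventLogger and the function list_events are hypothetical, and the event labels are simply strings chosen to mirror the names quoted above ('END-OF-ELEMENT' is an assumed label for the closing event).

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records each parser callback as an (event label, text) pair."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("START-OF-ELEMENT", name))

    def characters(self, content):
        # Ignore whitespace-only text between elements.
        if content.strip():
            self.events.append(("CONTENT-CHARACTERS", content))

    def endElement(self, name):
        # Assumed label for the closing event; not quoted in the article.
        self.events.append(("END-OF-ELEMENT", name))

def list_events(xml_text):
    """Parse xml_text and return the events in the order they occurred."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

if __name__ == "__main__":
    for event in list_events("<a><b>1</b></a>"):
        print(event)
```

The handler sees each piece of the document exactly once, in document order, which is the defining property of this style of parser.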

This parser does not build a tree or keep any memory of the XML it has read; it merely presents the XML to the program one event at a time. It is up to the program to act on each event and to save or process the data as it sees fit. A program could, for example, build its own in-memory table by incrementing a counter for each 'START-OF-ELEMENT' event and using the counter as an index into a table where the elements are stored.
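The counter-and-table idea can be sketched as follows, again in Python with xml.sax rather than in COBOL. The names TableBuilder and build_table are hypothetical. The stack of open indexes is an addition for this example so that text is credited to the correct element; the paragraph above mentions only the counter.

```python
import xml.sax

class TableBuilder(xml.sax.ContentHandler):
    """Stores every element in a table indexed by a running counter."""

    def __init__(self):
        super().__init__()
        self.count = 0          # incremented on each start-of-element event
        self.table = {}         # index -> [element name, accumulated text]
        self.open_indexes = []  # indexes of elements that are still open

    def startElement(self, name, attrs):
        self.count += 1
        self.table[self.count] = [name, ""]
        self.open_indexes.append(self.count)

    def characters(self, content):
        # Attach text to the innermost element that is currently open.
        if self.open_indexes:
            self.table[self.open_indexes[-1]][1] += content

    def endElement(self, name):
        self.open_indexes.pop()

def build_table(xml_text):
    """Parse xml_text and return the counter-indexed table."""
    handler = TableBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.table

if __name__ == "__main__":
    print(build_table("<order><id>7</id><item>pen</item></order>"))
```

Whatever the program does not store in its own table during the parse is gone once the corresponding event has passed.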

Those are the two available types of XML parser. Below are more details of each:

To use the DOM-based parser in COBOL, use the READ statement (without the NEXT clause), specifying either a disk file that holds an XML instance or an address in memory that holds an XML instance. A single READ statement causes the COBOL system to read the entire XML instance into memory all at once.

Large XML instances cause the process memory size to grow, possibly beyond the operating system's limit on process size. In that case you can raise the limit in the operating system configuration. Large instances also reduce performance. Together, the memory and performance costs place a practical limit on the size of XML instance that the DOM parser can handle.

After the data is in memory, it can be processed by COBOL statements in a way reminiscent of processing indexed files: a START statement can be used to establish a key of reference, then READ NEXT statements can read along the key. DELETE, REWRITE, and WRITE KEY statements can be used to modify the data (for more information, see Knowledgebase article number 23419). None of these statements performs any physical I/O; they operate on the memory image instead. At the end, you have the option of writing the memory image out to disk as an XML instance by using a WRITE statement without the KEY clause.
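The pattern of modifying the memory image and only then producing an XML instance can be illustrated with a short Python sketch using xml.dom.minidom. This is an analogy, not MF COBOL syntax: the removal, in-place change, and addition below only loosely resemble what the paragraph above describes, and the function name update_and_serialize and the sample data are made up.

```python
from xml.dom.minidom import parseString

def update_and_serialize(xml_text):
    """Edit the in-memory tree, then turn it back into XML text once."""
    doc = parseString(xml_text)  # load the whole document a single time
    root = doc.documentElement
    books = root.getElementsByTagName("book")

    # Remove the first record from the tree (no file I/O happens here).
    root.removeChild(books[0])

    # Change the data of an existing record in place.
    books[1].firstChild.data = "Emma (revised)"

    # Add a new record to the tree.
    new_book = doc.createElement("book")
    new_book.appendChild(doc.createTextNode("Ulysses"))
    root.appendChild(new_book)

    # Only this final step produces an XML instance again.
    return root.toxml()

if __name__ == "__main__":
    sample = "<library><book>Dune</book><book>Emma</book></library>"
    print(update_and_serialize(sample))
```

All three changes touch only the tree in memory; the serialized document appears only when the final step is requested.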

To use the event-based parser, on the other hand, use the XML PARSE statement. The XML instance should already be in memory, in a data item in the Working-Storage or Linkage Section. The XML PARSE statement requires you to write an event handler, which is responsible for dealing with the data one event at a time. See Knowledgebase article 19621 for an example.

When you use the XML PARSE technique, you cannot use the XML-enhanced versions of the READ, START, DELETE, REWRITE, and WRITE statements, because these depend on a DOM-based memory image of the XML instance. Instead, the event handler is responsible for all processing and modification of the data.

Old KB# 2022
