Problem:
An XML instance document exists on disk. In COBOL, it is declared as "organization is xml". When a customer tries to open it for input and read it record by record, the READ NEXT statement does not behave as expected.
Resolution:
The READ statement for XML instance documents can be used two ways:
1) A single READ without the NEXT clause reads the entire XML instance document (i.e. the whole input file) into memory, all at once.
2) A READ with the NEXT clause assumes that you have already read in the whole document using a single READ without NEXT as above, and have already established a key-of-reference using a START statement. READ NEXT examines the in-memory image of the XML document, and returns the next record in the key-of-reference. READ NEXT does not actually perform any disk I/O.
Consider the demo program "customerdbxml.cbl". This is the last program in the XML tutorials described in the documentation. (Note: if you receive an error compiling this program, refer to Knowledgebase article 23418.) The program first uses a READ without NEXT to read the entire input document into memory:
    if xml-status = "0"
        read customer-file
        move "1" to xml-status
    end-if
After this single READ is performed, the whole input file is in memory. The rest of the program merely examines or modifies this in-memory image. Before each series of READ NEXT statements, the program uses a START statement to establish the key-of-reference. For example:
    file-read-customer-record section.
        start customer-file key is customer-record
        perform until exit
            read customer-file next key is customer-record
Statements such as DELETE, REWRITE, and WRITE KEY modify only the in-memory image of the XML document; they do not perform any disk I/O. When the program has finished its processing, it can write out the final XML document using a WRITE without the KEY clause. This final WRITE without KEY writes the entire modified in-memory image of the XML document to disk.
As an alternative to processing an XML file this way, you can use the XML PARSE syntax created by IBM, which Server Express fully supports. XML PARSE is an event-driven mechanism: it reads an XML instance document sequentially and passes fragments of the document, along with event codes, to a user-defined routine. Processing a document this way avoids having to load it all into memory at once.
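The event-driven approach can be sketched as follows. This is a minimal illustration, not code taken from the demo programs: the program name, the data names xml-doc and element-count, the paragraph names, and the tiny inline document are all hypothetical. To keep the sketch short the document is held in a working-storage item; a real program would supply the document from its own input. XML-EVENT, XML-TEXT, and XML-CODE are the special registers defined by the IBM XML PARSE syntax.

```cobol
       identification division.
       program-id. xmlparse-sketch.
       data division.
       working-storage section.
      * Hypothetical inline document, used only to keep the sketch
      * self-contained.
       01  xml-doc            pic x(49) value
           "<customers><customer>Smith</customer></customers>".
      * Hypothetical counter, incremented once per start tag.
       01  element-count      pic 9(4) value 0.
       procedure division.
       main-para.
      * The parser calls handle-event once for every XML event.
           xml parse xml-doc
               processing procedure handle-event
               on exception
                   display "XML error, XML-CODE = " xml-code
               not on exception
                   display "Elements seen: " element-count
           end-xml
           goback.
       handle-event.
      * XML-EVENT names the event; XML-TEXT holds the fragment
      * of the document associated with that event.
           evaluate xml-event
               when "START-OF-ELEMENT"
                   add 1 to element-count
                   display "Start: " xml-text
               when "CONTENT-CHARACTERS"
                   display "Text:  " xml-text
               when "END-OF-ELEMENT"
                   display "End:   " xml-text
               when other
                   continue
           end-evaluate.
```

Unlike the READ and READ NEXT approach, the handler sees each element once, in document order, so any data the program needs later must be saved as the events arrive.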