Problem:

The customer has questions about using large XML files with Net Express.

The attached sample is a demonstration that applies the recommendations of KB 19691:

http://supportline.microfocus.com/mf_kb_display.asp?kbnumber=19691

What does this sample show?

===========================

It demonstrates section 3 of KB 19691:

...

3. You can use the CBL_OPEN_FILE and CBL_READ_FILE byte stream routines to read the data directly from the XML document in manageable blocks of 100KB or more. You can then use INSPECT and STRING statements to parse the data read to find the first <childdocn> start tag and the last </childdocn> end tag in the data buffer and then extract this information into another buffer such as XML-BUFFER PIC X(100000).

In the SELECT statement for your XML enabled file you can have ASSIGN TO ADDRESS OF XML-BUFFER. Then when you open the file you can do a READ NEXT through each of the <childdocn> elements in the XML-BUFFER until you reach end of file. When you are done processing this buffer of data you would close the XML enabled file and reposition the starting offset for the next CBL_READ_FILE to point to the byte immediately following the last </childdocn> end tag that you extracted into XML-BUFFER. You can then perform the CBL_READ_FILE operation and repeat this process until the end of the XML document has been reached.

This method should be used when you do not have any control of the format of the XML input document.

...

===========================

The customer needs to read a 400 MB XML file.

The demonstration described below uses a much smaller file, but the method is the same for a very large file.

===========================

This demonstration was tested on Net Express 5.0 WebSync2 and on Server Express 5.0.

Resolution:

1) STEP 1 of the sample: in paragraph GenerateXmlFile of ioxml.cbl

========================================================

Step 1 is simply a quick way to generate an XML file, named FItest.xml.

An XML stream is generated with the XML GENERATE statement, which converts a data item (here a group) to XML format.

The stream is generated from the COBOL group ROOTNODEANSI and written to an ordinary line sequential file, so that the file can be reused later in an XML context.

The group ROOTNODEANSI contains several subgroups:

       78 Lg value 50.

       01 ROOTNODEANSI.
           02 B-XMLNOTUSEFULL.
              03 B-XMLNOTUSEFULL03-1     PIC X(LG).
              03 B-XMLNOTUSEFULL03-2     PIC X(LG).
              03 B-XMLNOTUSEFULL03.
                 04 B-XMLNOTUSEFULL04-1  PIC X(LG).
                 04 B-XMLNOTUSEFULL04-2  PIC X(LG).
           02 COORDONNEESANSI.
              03 NOM                     PIC X(LG).
              03 PRENOM                  PIC X(LG).
              03 ADDRESSE.
                 04 ADR1                 PIC X(LG).
                 04 ADR2                 PIC X(LG).
              03 CODEPOSTAL              PIC X(LG).
              03 VILLE                   PIC X(LG).
              03 PAYS                    PIC X(LG).
           02 E-XMLNOTUSEFULL.
              03 E-XMLNOTUSEFULL03-1     PIC X(LG).
              03 E-XMLNOTUSEFULL03-2     PIC X(LG).
              03 E-XMLNOTUSEFULL03.
                 04 E-XMLNOTUSEFULL04-1  PIC X(LG).
                 04 E-XMLNOTUSEFULL04-2  PIC X(LG).

Later in the sample, only the subgroup COORDONNEESANSI of ROOTNODEANSI is read.
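As a rough illustration of what Step 1 does, the sketch below runs XML GENERATE on a reduced version of the group. It is not the code from ioxml.cbl: the program name and the items xml-out and xml-len are hypothetical, the group is cut down to two fields, and the stream is only displayed rather than written to the line sequential file.

```cobol
       identification division.
       program-id. genxml.
       data division.
       working-storage section.
       78  Lg                   value 50.
      *> Reduced group, for illustration only
       01  COORDONNEESANSI.
           03 NOM               pic x(Lg) value "Dupont".
           03 PRENOM            pic x(Lg) value "Charles".
      *> Hypothetical output buffer and length counter
       01  xml-out              pic x(1000).
       01  xml-len              pic 9(9) comp-5.
       procedure division.
      *> Convert the group to an XML stream held in xml-out;
      *> xml-len receives the number of characters generated
           xml generate xml-out from COORDONNEESANSI
               count in xml-len
               on exception
                   display "XML GENERATE failed, XML-CODE = "
                           xml-code
               not on exception
                   display xml-out(1:xml-len)
           end-xml
           stop run.
```

In the real sample the generated stream would then be written as a record of the line sequential file FItest.xml.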

2) STEP 2 of the sample: an XML OPEN and READ of the line sequential file generated in step 1

========================================================

The file is accessed through the XML file XMLFILE (select XMLfile      ASSIGN            FItest).

The XD used for this file was generated with the cbl2xml tool.

The whole XML file is read in this step.

3) STEP 3 of the sample: read only the XML node COORDONNEESANSI and its sub-nodes

========================================================

31)

Allocate memory dynamically with the Micro Focus routine CBL_ALLOC_MEM.

The demonstration allocates 240 bytes:

(78 78memsize    value 240.  *> unit = bytes)

This value is deliberately small so that reading the file with the byte-stream routines requires a loop; there is no need to allocate a huge amount of memory.
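A minimal sketch of this allocation, together with the matching release, might look as follows. The item names mem-size, mem-flags, status-code and mem-buffer are assumptions made for the illustration (the sample's own constant is called 78memsize), and a flags value of 0 is assumed to be sufficient here.

```cobol
       identification division.
       program-id. allocmem.
       data division.
       working-storage section.
       78  memsize              value 240.
       01  mem-pointer          usage pointer.
       01  mem-size             pic x(4) comp-5 value memsize.
       01  mem-flags            pic x(2) comp-5 value 0.
       01  status-code          pic x(2) comp-5.
       linkage section.
      *> Template used to address the allocated block
       01  mem-buffer           pic x(memsize).
       procedure division.
      *> Ask the run-time system for memsize bytes
           call "CBL_ALLOC_MEM" using mem-pointer
                                      by value mem-size
                                      by value mem-flags
                                returning status-code
           if status-code not = 0
               display "CBL_ALLOC_MEM failed " status-code
               stop run
           end-if
      *> Map the linkage item onto the block before using it
           set address of mem-buffer to mem-pointer
           move spaces to mem-buffer
      *> Release the block once it is no longer needed
           call "CBL_FREE_MEM" using by value mem-pointer
                               returning status-code
           stop run.
```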

32)

Use the byte-stream routines CBL_OPEN_FILE and CBL_READ_FILE to open the file generated in step 1 and obtain its total size.

33)

LOOP

..Read a block with the byte-stream routine CBL_READ_FILE

...Search the block for the constant BEGIN-XMLstream

......  (    value "<COORDONNEESANSI>".   )

...When found, save its offset (from the beginning of the file) in a variable

...Search the block for the constant END-XMLstream

...... (    value "</COORDONNEESANSI>".   )

...When found, save its offset (from the beginning of the file) in a variable

END-LOOP
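The loop above, together with the open and size query of 32), can be sketched as the program below. It is an illustration under assumptions rather than the code of the sample: apart from BEGIN-XMLstream and END-XMLstream, every data name is hypothetical, the buffer is a working-storage item instead of the allocated memory, and the physical file is assumed to be called FItest.xml. Consecutive blocks overlap by 17 bytes (the end-tag length minus one) so that a tag split across two reads is still found.

```cobol
       identification division.
       program-id. scanxml.
       data division.
       working-storage section.
       78  chunk-size           value 240.
       78  BEGIN-XMLstream      value "<COORDONNEESANSI>".
       78  END-XMLstream        value "</COORDONNEESANSI>".
       78  begin-tag-len        value 17.
       78  tag-overlap          value 17.
       01  file-name            pic x(64) value "FItest.xml".
       01  access-mode          pic x comp-x value 1.
       01  deny-mode            pic x comp-x value 3.
       01  device-code          pic x comp-x value 0.
       01  file-handle          pic x(4).
       01  file-offset          pic x(8) comp-x.
       01  file-size            pic x(8) comp-x.
       01  byte-count           pic x(4) comp-x.
       01  read-flags           pic x comp-x.
       01  status-code          pic x(2) comp-5.
       01  chunk-buf            pic x(chunk-size).
       01  tag-pos              pic 9(9) comp-5.
       01  begin-offset         pic x(8) comp-x.
       01  end-offset           pic x(8) comp-x.
       01  scan-state           pic 9 value 0.
           88 want-begin        value 0.
           88 want-end          value 1.
           88 all-found         value 2.
       procedure division.
           call "CBL_OPEN_FILE" using file-name access-mode
                   deny-mode device-code file-handle
               returning status-code
           if status-code not = 0
               display "Open failed " status-code
               stop run
           end-if
      *> flags = 128: the file size is returned in file-offset
           move 0   to file-offset byte-count
           move 128 to read-flags
           call "CBL_READ_FILE" using file-handle file-offset
                   byte-count read-flags chunk-buf
               returning status-code
           move file-offset to file-size
           move 0 to file-offset read-flags
           perform until all-found or file-offset >= file-size
               compute byte-count =
                   function min(chunk-size, file-size - file-offset)
               move spaces to chunk-buf
               call "CBL_READ_FILE" using file-handle file-offset
                       byte-count read-flags chunk-buf
                   returning status-code
               if status-code not = 0
                   exit perform
               end-if
      *> tag-pos = number of bytes that precede the tag, or
      *> byte-count when the tag is not in this block
               move 0 to tag-pos
               if want-begin
                   inspect chunk-buf(1:byte-count) tallying tag-pos
                       for characters before initial BEGIN-XMLstream
               else
                   inspect chunk-buf(1:byte-count) tallying tag-pos
                       for characters before initial END-XMLstream
               end-if
               evaluate true
                   when tag-pos >= byte-count
                       if file-offset + byte-count >= file-size
                           move file-size to file-offset
                       else
                           compute file-offset =
                               file-offset + byte-count - tag-overlap
                       end-if
                   when want-begin
                       compute begin-offset = file-offset + tag-pos
      *> Resume the search just after the start tag
                       compute file-offset =
                           begin-offset + begin-tag-len
                       set want-end to true
                   when other
                       compute end-offset = file-offset + tag-pos
                       set all-found to true
               end-evaluate
           end-perform
           call "CBL_CLOSE_FILE" using file-handle
               returning status-code
           if all-found
               display "Start tag offset: " begin-offset
               display "End tag offset:   " end-offset
           else
               display "Element not found"
           end-if
           stop run.
```

In this sketch end-offset is the offset of the first byte of the end tag, so the fragment length is end-offset plus 18 minus begin-offset.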

34)

Call the byte-stream routine CBL_READ_FILE again, using the offset of the beginning of the XML stream of interest as the file offset.

You have now isolated the data of interest from the original huge XML stream:

(<COORDONNEESANSI><NOM>Dupont</NOM><PRENOM>Charles</PRENOM><ADDRESSE><ADR1>35 Rues de Alouettes</ADR1><ADR2>Quartier des affaires</ADR2></ADDRESSE><CODEPOSTAL>75000</CODEPOSTAL><VILLE>Paname</VILLE><PAYS>France</PAYS></COORDONNEESANSI>)
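One way to sketch this read is a small subprogram that receives the file handle, the two saved offsets and the allocated memory, then reads exactly the fragment into that memory. The interface is hypothetical: the parameter names and types, and the convention that end-offset points at the first byte of the end tag, are assumptions of this illustration and not taken from the sample.

```cobol
       identification division.
       program-id. readfrag.
       data division.
       working-storage section.
      *> Length of "</COORDONNEESANSI>"
       78  end-tag-len          value 18.
       01  read-flags           pic x comp-x value 0.
       01  byte-count           pic x(4) comp-x.
       01  status-code          pic x(2) comp-5.
       linkage section.
       01  file-handle          pic x(4).
       01  begin-offset         pic x(8) comp-x.
       01  end-offset           pic x(8) comp-x.
       01  mem-pointer          usage pointer.
       01  mem-size             pic x(4) comp-5.
       01  ptr-len              pic x(4) comp-5.
      *> Template only; the real size is given by mem-size
       01  mem-buffer           pic x(65535).
       procedure division using file-handle begin-offset end-offset
                                mem-pointer mem-size ptr-len.
      *> The fragment runs from the start tag through the end tag
           compute ptr-len = end-offset + end-tag-len - begin-offset
           if ptr-len > mem-size
               display "Fragment larger than allocated memory"
               move 1 to return-code
               goback
           end-if
           set address of mem-buffer to mem-pointer
           move ptr-len to byte-count
      *> Standard read (flags = 0) starting at the start tag
           call "CBL_READ_FILE" using file-handle begin-offset
                   byte-count read-flags mem-buffer
               returning status-code
           move status-code to return-code
           goback.
```

On return, ptr-len holds the fragment length, which is the value the XML file's SELECT statement needs for its LENGTH IS item.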

35)

Now open an XML file whose SELECT statement uses the pointer originally returned by CBL_ALLOC_MEM, and read it:

(select XMLfilePart  ASSIGN mem-pointer length is ptr-len

...)

       ReadPartOfXMLstream.
      *> ptr-len is the LENGTH IS item of the XMLfilePart SELECT
           move file-offset-wrk to ptr-len
           initialize pXML-coordonneesansi
           open input XMLfilePart
           if xml-status not = 0
               display "Pb Open  XMLfilepart " xml-status  stop run
           end-if
           read XMLfilePart
           if xml-status < 0
               display "Pb Read  XMLfilepart " xml-status  stop run
           end-if
           display pXML-coordonneesansi
      *> Close the in-memory XML file opened above
           close XMLfilePart
           if xml-status < 0
               display "Pb Close XMLfilepart " xml-status  stop run
           end-if
           .

36)

When finished, close the file opened with the byte-stream routine CBL_OPEN_FILE by calling CBL_CLOSE_FILE, and free the memory allocated with CBL_ALLOC_MEM by calling CBL_FREE_MEM.

Attachments:

KB23989.zip

Old KB# 2108