[Migrated content. Thread originally posted on 23 February 2004]

Hi all,
I need to process some HTML files and
extract some information from them.
The files are the response to a query, so they keep
more or less the same structure.
The information I need to extract is the content of some (not all) tags, not their attributes.
I would just like to know whether there is a good tool or approach
that lets me do this in a standard, safe way.
Otherwise I will have to "parse" the HTML in my
COBOL program, but that involves a lot of work
(which I would like to avoid if possible).
Interfacing with other languages/components
is not a problem.

Many thanks for your attention
Luca

HTML is really just a special case of XML, so if your HTML files have a somewhat predictable structure, you could use AcuXML for this. Use the "xml2fd" utility with a representative HTML file to generate SELECT and FD copybooks, add them to your COBOL program, and voila, you can OPEN INPUT, READ, then CLOSE the file like any sequential file!

Hi cedgin,
have you successfully processed HTML files this way?
I tried to run xml2fd on one of my HTML files, but I was not
able to get the FD and SL files. XML2FD 6.0 reports "Invalid XML file", while the same command on an XML file gives me an FD and
SL.

Anyway, for now I have created a utility, HTML2TXT, that extracts all
the data from an HTML file and filters out all the tags; this makes it easier to find the information I need.

regards,
Luca
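
For readers who want to try the tag-filtering idea outside COBOL, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not the HTML2TXT utility mentioned above; the class name `TagTextExtractor` and the helper `extract_values` are hypothetical names chosen for illustration. It collects the text inside a chosen set of tags and ignores attributes.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside selected tags; attributes are ignored."""

    def __init__(self, wanted_tags):
        super().__init__(convert_charrefs=True)
        self.wanted = {t.lower() for t in wanted_tags}
        self.open_tags = []  # stack of wanted tags that are currently open
        self.buffers = []    # one text buffer per open wanted tag
        self.values = []     # (tag, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.open_tags.append(tag)
            self.buffers.append([])

    def handle_endtag(self, tag):
        # Only close a wanted tag if it is the innermost open one.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            raw = "".join(self.buffers.pop())
            self.values.append((tag, " ".join(raw.split())))

    def handle_data(self, data):
        # Text belongs to every wanted tag that is still open.
        for buf in self.buffers:
            buf.append(data)


def extract_values(html_text, wanted_tags):
    """Return (tag, text) pairs for the wanted tags in html_text."""
    parser = TagTextExtractor(wanted_tags)
    parser.feed(html_text)
    parser.close()
    return parser.values


if __name__ == "__main__":
    page = "<html><body><h1 class='x'>Result</h1><td>42 <b>units</b></td><br></body></html>"
    print(extract_values(page, ["h1", "td"]))
```

Because `html.parser` does not require well-formed input, this works on the kind of HTML that xml2fd rejects, at the cost of leaving the result outside the COBOL file system interface.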

My apologies for not verifying that this would work before I posted!

The truth is that HTML is NOT an exact subset of XML, but stems from a common ancestry. The syntax rules for HTML are more permissive than XML regarding empty elements, nesting of elements, use of closing tags on all elements, and so forth.
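
To make the difference concrete, here is a small sketch in Python (standard library only). The sample markup and the helper names `is_well_formed_xml` and `html_start_tags` are made up for illustration. The same text that a strict XML parser rejects is accepted by a lenient HTML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical permissive HTML: <br> is never closed and the <p> tags have no end tags.
SAMPLE_HTML = "<html><body><p>First<br><p>Second</body></html>"
# The same content written as well-formed XML.
SAMPLE_XML = "<html><body><p>First<br/></p><p>Second</p></body></html>"


def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class StartTagRecorder(HTMLParser):
    """Record every start tag seen by the lenient HTML parser."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(text):
    """Return the start tags the HTML parser finds, in document order."""
    recorder = StartTagRecorder()
    recorder.feed(text)
    recorder.close()
    return recorder.start_tags


if __name__ == "__main__":
    print(is_well_formed_xml(SAMPLE_HTML))  # strict parser rejects the HTML form
    print(is_well_formed_xml(SAMPLE_XML))   # and accepts the XML form
    print(html_start_tags(SAMPLE_HTML))     # lenient parser reads the HTML form anyway
```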

I found a nifty utility called HTMLTidy (http://tidy.sourceforge.net/) that claims to be able to convert an HTML file to a well-formed XML file. I tried this and then used xml2fd on the resulting file, and got my FD/Select copybooks. The bad news is that the XML was still not in the form that xml2fd expects, so I only got a level 01 definition for the first portion of the file, but nothing for the rest. In other words, it looks like xml2fd will not generate multiple record definitions (01 levels). It just uses the first one.
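
If xml2fd cannot digest the tidied file, one alternative is to read the well-formed output with a generic XML parser and pick out the tag values directly. The sketch below is in Python and assumes the file was already converted by Tidy, for example with something like `tidy -asxml -numeric` (check the options against your Tidy version; `-numeric` is suggested here so that named entities such as `&nbsp;` do not trip the XML parser). The function names `local_name` and `tag_values` are hypothetical, and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def tag_values(xhtml_text, wanted_tags):
    """Return (tag, text) pairs for the wanted tags, in document order.

    The text of an element includes the text of its children, with
    runs of whitespace collapsed to single spaces.
    """
    wanted = set(wanted_tags)
    root = ET.fromstring(xhtml_text)
    result = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in wanted:
            text = " ".join("".join(elem.itertext()).split())
            result.append((name, text))
    return result


if __name__ == "__main__":
    # Invented sample resembling tidied XHTML output.
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>Query result</title></head>"
        "<body><table><tr><td>42 <b>units</b></td></tr></table></body>"
        "</html>"
    )
    print(tag_values(sample, ["title", "td"]))
```

Unlike the generated FD, this walks the whole document tree, so it is not limited to a single record layout.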