[archive] HTML handling

Dominique Sacre
Author
Rocketeer
Forum|Forum|21 years ago
February 25, 2004

[Migrated content. Thread originally posted on 23 February 2004]

Hi all,
I need to process some HTML files and
grab some info from them.
The files are the response to a query, so they
keep more or less the same structure.
The info I need to extract is the values of some (not all) tags, not their attributes.
I would just like to know if there is a good way or tool that
lets me work in a standard, safe way.
Otherwise I have to "parse" the HTML in my
COBOL program, but this involves a lot of work
(which I wish to avoid if possible).
Interfacing with other languages/components
is not a problem.

Many thanks for your attention
Luca

HTML is really just a special case of XML, so if your HTML files have a somewhat predictable structure, you could use AcuXML for this. Use the "xml2fd" utility with a representative HTML file to generate SELECT and FD copybooks, add them to your COBOL program, and voila, you can OPEN INPUT, READ, then CLOSE the file like any sequential file!

Dominique Sacre
February 25, 2004

Hi cedgin,
did you successfully process HTML files this way?
I tried to run xml2fd on one of my HTML files, but I was not
able to get the FD and SL files. XML2FD 6.0 reports "Invalid XML file", while the same command on an XML file gives me an FD and
SL.

Anyway, for now I have created a utility, HTML2TXT, that extracts all
the data from an HTML file and filters out all the tags; this way it is easier to find the info I need.

regards,
Luca
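
For readers who can call out to another language, as the original question allows, here is a minimal sketch of the same idea in Python using only the standard library's html.parser module. It is not the HTML2TXT utility mentioned above; the names TagTextExtractor, extract_tag_text, html_to_text and the SAMPLE document are hypothetical and exist only for illustration. The first function collects the text of selected tags (not their attributes), the second drops every tag and keeps only the text.

```python
from html.parser import HTMLParser

# Hypothetical sample standing in for a query response page.
SAMPLE = (
    "<html><body><h1>Result</h1>"
    "<table><tr><td>Name</td><td>Luca &amp; Co.</td></tr></table>"
    "<p>ignored<br>text</p></body></html>"
)


class TagTextExtractor(HTMLParser):
    """Collect the text found inside a chosen set of tags."""

    def __init__(self, wanted):
        super().__init__(convert_charrefs=True)
        self.wanted = {name.lower() for name in wanted}
        self._open = []    # (tag, text parts) for wanted tags still open
        self.values = []   # (tag, text) pairs, in the order the tags close

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag in self.wanted:
            self._open.append((tag, []))

    def handle_endtag(self, tag):
        # Close the innermost wanted tag when its end tag arrives.
        if self._open and self._open[-1][0] == tag:
            name, parts = self._open.pop()
            self.values.append((name, " ".join("".join(parts).split())))

    def handle_data(self, data):
        # Text belongs to every wanted tag that is currently open.
        for _, parts in self._open:
            parts.append(data)


class _TextOnly(HTMLParser):
    """Keep all text and ignore every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_tag_text(html, wanted):
    parser = TagTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.values


def html_to_text(html):
    # Note: text inside <script> or <style> would be kept as well.
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    print(extract_tag_text(SAMPLE, ["h1", "td"]))
    print(html_to_text(SAMPLE))
```

Because html.parser does not require well-formed input, the unclosed br in the sample is not a problem. The extracted values could then be written to a plain sequential file for the COBOL program to read.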

Dominique Sacre
February 26, 2004

My apologies for not verifying that this would work before I posted!

The truth is that HTML is NOT an exact subset of XML; the two share a common ancestry (SGML). The syntax rules for HTML are more permissive than those of XML regarding empty elements, nesting of elements, closing tags on all elements, and so forth.
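
To make the difference concrete, here is a small Python sketch (standard library only) that feeds the same kind of markup to a strict XML parser and to a lenient HTML parser. The sample strings and the helper names is_well_formed_xml and html_start_tags are made up for illustration; this does not show what xml2fd does internally, only why a strict XML parser can be expected to reject ordinary HTML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Ordinary HTML: <br> has no end tag and the paragraphs are never closed.
HTML_SAMPLE = (
    "<html><body><p>First line<br>Second line"
    "<p>Unclosed paragraph</body></html>"
)

# The same content rewritten so that every element is explicitly closed.
XHTML_SAMPLE = (
    "<html><body><p>First line<br/>Second line</p>"
    "<p>Closed paragraph</p></body></html>"
)


def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class _StartTags(HTMLParser):
    """Record every start tag the lenient HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(text):
    parser = _StartTags()
    parser.feed(text)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    print(is_well_formed_xml(HTML_SAMPLE))    # strict parser rejects it
    print(is_well_formed_xml(XHTML_SAMPLE))   # closed-up version is accepted
    print(html_start_tags(HTML_SAMPLE))       # lenient parser reads it anyway
```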

I found a nifty utility called HTMLTidy (http://tidy.sourceforge.net/) that claims to be able to convert an HTML file to a well-formed XML file. I tried this and then ran xml2fd on the resulting file, and got my FD/SELECT copybooks. The bad news is that the XML was still not in the form that xml2fd expects, so I only got a level 01 definition for the first portion of the file, but nothing for what follows it. In other words, it looks like xml2fd will not generate multiple record definitions (01 levels). It just uses the first one.