[archive] Get HTML source from webbrowser component

  • November 22, 2006
  • 4 replies

[Migrated content. Thread originally posted on 21 November 2006]

I need to get the HTML source of a web browser component so I can parse it and use the parsed data in my program, but the standard AcuGT web browser control doesn't let me view the HTML source of the displayed page.

For C# I found the following code:
object empty = System.Reflection.Missing.Value;
axWebBrowser1.Navigate("about:blank", ref empty, ref empty, ref empty, ref empty);

//Now we can get a reference to the document's DOM via the IHTMLDocument interface and write to it.

// create an IHTMLDocument2
mshtml.IHTMLDocument2 doc = axWebBrowser1.Document as mshtml.IHTMLDocument2;

// write to the doc
doc.clear();
doc.writeln("This is my text...");

doc.close();


Does anyone know how I can use this solution in AcuGT?

4 replies

If you're running on a Linux or UNIX platform you could use "wget" to download the webpage to your server and then open it as a line sequential text file.

Ian

Well, the program is running on a Linux server, but the site requires me to log in through a login form. After that, a session is created which only works with the browser I used to log in. If I paste the URL into another browser, I get redirected to the login form.

I kept trying to find the solution myself and have put a WebBrowser ActiveX control on my screen. When I inquire the 'document' property, I get a pointer to the automation object of the active document.

The MSDN Library tells me what properties this document object has, but I don't know how to access them from my COBOL program.

I tried to use the axdefgen utility to generate mshtml.def from the 'Microsoft HTML Object Library' (c:\windows\system32\mshtml.tlb), but when I include mshtml.def in my COBOL program I get strange compilation errors.

Rather than using the browser control, try using XMLHttp. I have not done this from COBOL, so I do not have a specific example, but here is how it can be done from VBScript:

Set xmlhttp= CreateObject("Microsoft.XMLHTTP")
xmlhttp.Open "GET", "http://www.msn.com", False
xmlhttp.Send
wtext = xmlhttp.ResponseText
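
For the login problem raised earlier in the thread, a plain GET like the one above would probably be redirected to the login form, because the session cookie is missing. Below is a minimal sketch in Python (not AcuGT, and not tested against the site in question) of fetching the page outside the browser control while keeping the session cookie between requests. It uses only the standard library. The function names, the URLs and the form field names in the usage comment are assumptions for illustration; the real login form will have its own field names.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login(session, login_url, fields):
    """POST the login form; a session cookie in the response is kept by the opener."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    with session.open(login_url, data=body) as response:
        return response.status


def fetch_html(session, page_url, encoding="utf-8"):
    """GET a page with the stored cookies and return its HTML source as text."""
    with session.open(page_url) as response:
        return response.read().decode(encoding, errors="replace")


def save_html(session, page_url, path, encoding="utf-8"):
    """Write the page source to a text file that another program can read."""
    with open(path, "w", encoding=encoding) as out:
        out.write(fetch_html(session, page_url, encoding))


# Hypothetical usage; the URLs and field names are placeholders:
#   session = make_session()
#   login(session, "http://example.com/login", {"user": "me", "password": "secret"})
#   save_html(session, "http://example.com/report", "report.html")
```

The saved file could then be opened from the COBOL program as a line sequential file, as suggested earlier in the thread. This only works if the site tracks the session with ordinary cookies; a site that relies on JavaScript or other browser state would need a different approach.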