Is it possible to obtain the source of a webpage that is currently open in IE or Chrome, either from the command line or from Java code? I believe there should be a way. If so, given that Chrome and IE can each have many tabs open, how can we make sure we read the right one?
I am trying to process the content of hundreds of webpages, some of which do nothing but refresh automatically at a regular 15-second interval.
Yes, I could get a webpage's source using sockets or an instance of the URLConnection class. However, that does not give me the browser's built-in refresh behaviour. My only option would be to hit the URL repeatedly myself, which I could avoid if I could rely on the browser's own refresh.
In addition, it would be great if the program could fill in a text box and submit a request from the browser. Thank you.
There are several "scraping" frameworks in Java.
I personally like jsoup because it is lightweight and makes for compact code.
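import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Get the source of a website in just one line of code
Document doc = Jsoup.connect("http://www.google.com").get();
// Print the href of every hyperlink whose target ends in ".html"
Elements links = doc.select("a[href$=.html]");
for (Element lnk : links) System.out.println(lnk.attr("href"));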
It does not render JavaScript or anything like that, though. It's simple, fast, and dumb.
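Since a plain fetch only returns the page once, the 15-second refresh mentioned in the question has to be reproduced by re-fetching on a schedule. Here is a minimal sketch of that idea using only the JDK (Java 11+). PagePoller, fetch and poll are hypothetical names made up for illustration, not part of jsoup or any library, and the fetch step could just as well be a jsoup call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of any library.
class PagePoller {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Downloads the raw HTML of a URL, wrapping checked exceptions. */
    static String fetch(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching " + url, e);
        }
    }

    /**
     * Calls fetcher `times` times, starting one call every intervalMillis,
     * passes each result to handler, and blocks until all calls are done.
     */
    static void poll(Supplier<String> fetcher, int times, long intervalMillis,
                     Consumer<String> handler) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(times);
        scheduler.scheduleAtFixedRate(() -> {
            if (remaining.getCount() == 0) return; // all requested runs are done
            try {
                handler.accept(fetcher.get());
            } catch (RuntimeException e) {
                // Swallow the failure so the scheduler keeps running later attempts.
                System.err.println("Fetch failed: " + e);
            } finally {
                remaining.countDown();
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            remaining.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

For example, PagePoller.poll(() -> PagePoller.fetch("http://www.google.com"), 4, 15_000, html -> System.out.println(html.length())) would fetch the page four times, 15 seconds apart. For hundreds of pages you would probably want a larger thread pool than the single thread assumed here.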
I think you might prefer HtmlUnit instead, which is more like an invisible (headless) web browser. It lets you click buttons, fire events, execute JavaScript, and so on. You can have it emulate Internet Explorer or Firefox.
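If the form in question works without JavaScript, "filling in a text box and submitting" boils down to sending the field values in an HTTP request, which needs no browser at all. Below is a rough sketch using only the JDK (Java 11+). It assumes the form submits via POST with the default application/x-www-form-urlencoded encoding; FormPoster, encodeForm and submit are hypothetical names, and the action URL and field names have to be read from the page's own form markup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of HtmlUnit or jsoup.
class FormPoster {
    /** Encodes fields as name=value pairs joined by '&', URL-encoding both parts. */
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** POSTs the encoded fields to the form's action URL and returns the response body. */
    static String submit(String actionUrl, Map<String, String> fields)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

When the page builds or validates the form with JavaScript, this shortcut will not be enough and a headless browser such as HtmlUnit is the better fit.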