Introduction
The WebClient represents the browser when you work with HtmlUnit. To start using HtmlUnit, you instantiate a new WebClient, just as you would start a browser in the real world.
WebClient implements AutoCloseable, so you should always use it in a try-with-resources statement. After a WebClient has been closed (see WebClient.close()), any further use is not supported and might lead to exceptions or incorrect behaviour.
try (final WebClient webClient = new WebClient()) {
    // now you have a running browser, and you can start doing real things
    // like going to a web page
    final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
}
Imitating a specific browser
Often you will want to simulate a specific browser. This is done by passing a org.htmlunit.BrowserVersion into the WebClient constructor. Constants have been provided for some common browsers.
@Test
public void homePage_Firefox() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
        Assert.assertEquals("HtmlUnit – Welcome to HtmlUnit", page.getTitleText());
    }
}
Specifying this BrowserVersion will change:
- the user agent HTTP header,
- the values and the order of many other HTTP headers,
- the list of supported MIME types,
- the behavior of the web client,
- the supported JavaScript methods and the behavior of some JavaScript functions, and
- the default values for various CSS properties.
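To see one of these differences directly, the following sketch prints the user agent string of two predefined browser versions. It assumes HtmlUnit 3.x or later (package org.htmlunit) on the classpath; the class and method names are illustrative only.

```java
import org.htmlunit.BrowserVersion;

/** Illustrative helper: compares the user agent of two predefined BrowserVersion constants. */
final class BrowserVersionUserAgents {

    private BrowserVersionUserAgents() {
    }

    /** The user agent string HtmlUnit sends as HTTP header and exposes as navigator.userAgent. */
    static String userAgentOf(final BrowserVersion browserVersion) {
        return browserVersion.getUserAgent();
    }

    public static void main(final String[] args) {
        System.out.println("FIREFOX: " + userAgentOf(BrowserVersion.FIREFOX));
        System.out.println("CHROME:  " + userAgentOf(BrowserVersion.CHROME));
    }
}
```

The exact strings depend on the HtmlUnit release, because each release tracks the current browser versions.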
In most cases, it should be sufficient to use the predefined BrowserVersion constants.
Using the options to adjust the browser
There are various options available to make fine-grained adjustments to the browser; they are accessed through webClient.getOptions().
@Test
public void homePage_Firefox() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // disable javascript
        webClient.getOptions().setJavaScriptEnabled(false);
        // disable css support
        webClient.getOptions().setCssEnabled(false);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
        Assert.assertEquals("HtmlUnit – Welcome to HtmlUnit", page.getTitleText());
    }
}
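The effect of setJavaScriptEnabled(false) can be checked without network access. This is a minimal sketch, assuming HtmlUnit 3.x or later: MockWebConnection serves a hypothetical in-memory page whose inline script would overwrite the title, and the class name, method name and URL are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

/** Illustrative helper: loads the same page with JavaScript enabled or disabled. */
final class JavaScriptOptionCheck {

    // Hypothetical test page: the inline script replaces the static title.
    private static final String HTML = "<html><head><title>static</title></head>"
            + "<body><script>document.title = 'from script';</script></body></html>";

    private JavaScriptOptionCheck() {
    }

    /** Returns the page title after loading HTML with the given JavaScript setting. */
    static String titleWithJavaScript(final boolean javaScriptEnabled) {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            webClient.getOptions().setJavaScriptEnabled(javaScriptEnabled);

            // Serve every URL from memory instead of the network.
            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(HTML);
            webClient.setWebConnection(connection);

            final HtmlPage page = webClient.getPage("http://www.example.com/");
            return page.getTitleText();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println("enabled:  " + titleWithJavaScript(true));
        System.out.println("disabled: " + titleWithJavaScript(false));
    }
}
```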
The default values for most options are similar to the default values of real browsers, but (as always) there is one important exception: HtmlUnit stops at the first unhandled JavaScript exception and throws it to your code, whereas real browsers do not stop. You can change this by setting the throwExceptionOnScriptError option to false.
@Test
public void homePage_Firefox() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // proceed with the js execution on unhandled js errors
        webClient.getOptions().setThrowExceptionOnScriptError(false);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
        Assert.assertEquals("HtmlUnit – Welcome to HtmlUnit", page.getTitleText());
    }
}
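The difference between the two settings can be made visible with a page that contains a failing script followed by a working one. This sketch assumes HtmlUnit 3.x or later and uses MockWebConnection to serve a hypothetical in-memory page; it catches RuntimeException broadly because the concrete exception type (typically org.htmlunit.ScriptException) is not the point here. Class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

/** Illustrative helper: shows the effect of the throwExceptionOnScriptError option. */
final class ScriptErrorCheck {

    // Hypothetical page: the first script fails, the second one sets the title.
    private static final String HTML = "<html><head><title>before</title></head><body>"
            + "<script>notDefinedAnywhere();</script>"
            + "<script>document.title = 'after';</script>"
            + "</body></html>";

    private ScriptErrorCheck() {
    }

    /** Returns the final title, or "exception" if HtmlUnit aborted the page load. */
    static String load(final boolean throwExceptionOnScriptError) {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            webClient.getOptions().setThrowExceptionOnScriptError(throwExceptionOnScriptError);

            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(HTML);
            webClient.setWebConnection(connection);

            final HtmlPage page = webClient.getPage("http://www.example.com/");
            return page.getTitleText();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        } catch (final RuntimeException e) {
            // With the default (true) the script error surfaces here as an exception.
            return "exception";
        }
    }

    public static void main(final String[] args) {
        System.out.println("throw on error:    " + load(true));
        System.out.println("continue on error: " + load(false));
    }
}
```

With the option set to false, only the failing script block is abandoned; the following script still runs, as it would in a real browser.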
For a complete list and more details please have a look at the WebClientOptions API.
Change the browser language / time zone
Changing the language or time zone cannot be done through the options; it must be done before the WebClient is created.
All BrowserVersions are shipped with 'en-US' as language and 'America/New_York' as time zone.
To change these defaults, create a customised copy of the corresponding BrowserVersion using the BrowserVersionBuilder, and use this new BrowserVersion to create the WebClient.
final BrowserVersion.BrowserVersionBuilder builder =
        new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX);
builder.setSystemTimezone(TimeZone.getTimeZone("Europe/Berlin"));
builder.setBrowserLanguage("de-DE");
builder.setAcceptLanguageHeader("de-DE,de");
final BrowserVersion germanFirefox = builder.build();

try (final WebClient webClient = new WebClient(germanFirefox)) {
    ....
}
There is no support for changing the language/timezone after the WebClient has been created.
For more details please have a look at the BrowserVersion.BrowserVersionBuilder API.
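To check what a page actually sees, the following sketch repeats the builder calls from the German Firefox snippet and evaluates navigator.language in a hypothetical in-memory page served by MockWebConnection. It assumes HtmlUnit 3.x or later; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.TimeZone;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

/** Illustrative helper: builds a German Firefox and checks what a page sees. */
final class GermanFirefoxCheck {

    private GermanFirefoxCheck() {
    }

    /** Customised copy of FIREFOX with German language and Berlin time zone. */
    static BrowserVersion germanFirefox() {
        final BrowserVersion.BrowserVersionBuilder builder =
                new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX);
        builder.setSystemTimezone(TimeZone.getTimeZone("Europe/Berlin"));
        builder.setBrowserLanguage("de-DE");
        builder.setAcceptLanguageHeader("de-DE,de");
        return builder.build();
    }

    /** Loads an empty in-memory page and evaluates navigator.language in it. */
    static String navigatorLanguage(final BrowserVersion browserVersion) {
        try (WebClient webClient = new WebClient(browserVersion)) {
            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse("<html><head></head><body></body></html>");
            webClient.setWebConnection(connection);

            final HtmlPage page = webClient.getPage("http://www.example.com/");
            return String.valueOf(page.executeJavaScript("navigator.language").getJavaScriptResult());
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final BrowserVersion germanFirefox = germanFirefox();
        System.out.println(germanFirefox.getBrowserLanguage());
        System.out.println(germanFirefox.getSystemTimezone().getID());
        System.out.println(navigatorLanguage(germanFirefox));
        System.out.println(navigatorLanguage(BrowserVersion.FIREFOX));
    }
}
```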
Change the browser user agent
Changing the user agent is similar to changing the language/time zone (see above).
You have to create a customised copy of the corresponding BrowserVersion using the BrowserVersionBuilder. This adapted BrowserVersion can then be used to create a WebClient.
final BrowserVersion.BrowserVersionBuilder builder =
        new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX);
builder.setUserAgent("Mozilla/5.0 (iPhone; CPU iPhone OS 14_5 like Mac OS X) "
        + "AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/128.0 Mobile/15E148 Safari/605.1.15");
final BrowserVersion iosFirefox = builder.build();

try (final WebClient webClient = new WebClient(iosFirefox)) {
    ....
}
For more details please have a look at the BrowserVersion.BrowserVersionBuilder API.
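A quick way to confirm that the customised user agent is in effect is to read navigator.userAgent from a loaded page. This sketch assumes HtmlUnit 3.x or later, reuses the iOS Firefox string from the snippet above, and serves a hypothetical in-memory page through MockWebConnection; the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

/** Illustrative helper: builds a BrowserVersion with a custom user agent and reads it back. */
final class UserAgentCheck {

    // User agent string from the iOS Firefox example.
    static final String IOS_FIREFOX_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_5 like Mac OS X) "
            + "AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/128.0 Mobile/15E148 Safari/605.1.15";

    private UserAgentCheck() {
    }

    static BrowserVersion iosFirefox() {
        final BrowserVersion.BrowserVersionBuilder builder =
                new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX);
        builder.setUserAgent(IOS_FIREFOX_UA);
        return builder.build();
    }

    /** Loads an empty in-memory page and evaluates navigator.userAgent in it. */
    static String navigatorUserAgent(final BrowserVersion browserVersion) {
        try (WebClient webClient = new WebClient(browserVersion)) {
            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse("<html><head></head><body></body></html>");
            webClient.setWebConnection(connection);

            final HtmlPage page = webClient.getPage("http://www.example.com/");
            return String.valueOf(page.executeJavaScript("navigator.userAgent").getJavaScriptResult());
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(navigatorUserAgent(iosFirefox()));
    }
}
```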
Using HtmlUnit behind a proxy
Using an HTTP proxy
There is a special WebClient constructor that allows you to specify proxy server information in those cases where you need to connect through one.
@Test
public void homePage_proxy() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX, PROXY_HOST, PROXY_PORT)) {
        //set proxy username and password
        final DefaultCredentialsProvider credentialsProvider =
                (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addCredentials("username", "password");

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org");
        Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
}
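The proxy passed to this constructor ends up in the WebClientOptions as a ProxyConfig, so the same setup can also be done through webClient.getOptions().setProxyConfig(...). The sketch below, assuming HtmlUnit 3.x or later, compares both ways; the proxy host and port are hypothetical placeholders and no connection is made.

```java
import org.htmlunit.BrowserVersion;
import org.htmlunit.ProxyConfig;
import org.htmlunit.WebClient;

/** Illustrative helper: compares the constructor shortcut with an explicit ProxyConfig. */
final class ProxyConfigCheck {

    // Hypothetical proxy coordinates; nothing is contacted in this sketch.
    static final String PROXY_HOST = "proxy.example.com";
    static final int PROXY_PORT = 8080;

    private ProxyConfigCheck() {
    }

    /** Proxy configured through the WebClient constructor. */
    static String viaConstructor() {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX, PROXY_HOST, PROXY_PORT)) {
            return describe(webClient.getOptions().getProxyConfig());
        }
    }

    /** Proxy configured through the options (last argument false: not a SOCKS proxy). */
    static String viaOptions() {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            webClient.getOptions().setProxyConfig(new ProxyConfig(PROXY_HOST, PROXY_PORT, null, false));
            return describe(webClient.getOptions().getProxyConfig());
        }
    }

    private static String describe(final ProxyConfig config) {
        return config.getProxyHost() + ":" + config.getProxyPort() + " socks=" + config.isSocksProxy();
    }

    public static void main(final String[] args) {
        System.out.println(viaConstructor());
        System.out.println(viaOptions());
    }
}
```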
In case the proxy server requires credentials, you can define them on the DefaultCredentialsProvider of the webClient.
@Test
public void homePage_proxy() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX, PROXY_HOST, PROXY_PORT)) {
        //set proxy username and password
        final DefaultCredentialsProvider credentialsProvider =
                (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addCredentials("username", "password", PROXY_HOST, PROXY_PORT);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org");
        Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
}
SOCKS proxy sample
Setting up a SOCKS proxy is a bit trickier, but it generally follows the same pattern.
@Test
public void homePage_proxy() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // SOCKS proxy: the last argument (true) marks this as a SOCKS proxy
        webClient.getOptions().setProxyConfig(
                new ProxyConfig(SOCKS_PROXY_HOST, SOCKS_PROXY_PORT, null, true));

        // set proxy username and password if required
        final DefaultCredentialsProvider credentialsProvider =
                (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addSocksCredentials("username", "password", SOCKS_PROXY_HOST, SOCKS_PROXY_PORT);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org");
        Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
}
WebWindowListener / WebWindowEvents
If you wish to be notified when windows are created or pages are loaded, you need to register a WebWindowListener with the WebClient using WebClient.addWebWindowListener(WebWindowListener).
When a window is opened either by JavaScript or through the WebClient, a WebWindowEvent will be fired and passed into the WebWindowListener.webWindowOpened(WebWindowEvent) method. Note that both the new and old pages in the event will be null as the window does not have any content loaded at this point. If a URL was specified during creation of the window then the page will be loaded and another event will be fired as described below.
When a new page is loaded into a specific window, a WebWindowEvent will be fired and passed into the WebWindowListener.webWindowContentChanged(WebWindowEvent) method.
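The following sketch registers a listener that simply records the events it receives. It assumes HtmlUnit 3.x or later, serves hypothetical in-memory pages through MockWebConnection, and opens the second window with WebClient.openWindow(URL, String); the class name, method name and URLs are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.Page;
import org.htmlunit.WebClient;
import org.htmlunit.WebWindowEvent;
import org.htmlunit.WebWindowListener;

/** Illustrative helper: records the window events fired while loading pages. */
final class WindowEventsCheck {

    private WindowEventsCheck() {
    }

    static List<String> recordEvents() {
        final List<String> events = new ArrayList<>();
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // Serve every URL from memory (hypothetical page content).
            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse("<html><head><title>demo</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            webClient.addWebWindowListener(new WebWindowListener() {
                @Override
                public void webWindowOpened(final WebWindowEvent event) {
                    // Nothing is loaded yet, so both pages are expected to be null here.
                    events.add("opened newPage=" + event.getNewPage() + " oldPage=" + event.getOldPage());
                }

                @Override
                public void webWindowContentChanged(final WebWindowEvent event) {
                    final Page newPage = event.getNewPage();
                    events.add("contentChanged " + (newPage == null ? null : newPage.getUrl()));
                }

                @Override
                public void webWindowClosed(final WebWindowEvent event) {
                    events.add("closed");
                }
            });

            // Loading a page into the current window fires webWindowContentChanged.
            webClient.getPage("http://www.example.com/first.html");
            // Opening a new window with a URL fires webWindowOpened, then webWindowContentChanged.
            webClient.openWindow(URI.create("http://www.example.com/second.html").toURL(), "second");
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the WebClient closes its windows, which may add "closed" entries.
        return events;
    }

    public static void main(final String[] args) {
        recordEvents().forEach(System.out::println);
    }
}
```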
Using handlers
The WebClient uses many handlers for special purposes. Each handler implements a specific interface, so you can replace it with your own implementation. Default implementations are also available.
AlertHandler
The handler for JavaScript alerts, called whenever the JavaScript method window.alert() is invoked.
ConfirmHandler
The handler for the JavaScript function window.confirm().
PromptHandler
The handler for the JavaScript function window.prompt().
StatusHandler
A handler for changes to window.status.
AttachmentHandler
A handler for attachments, i.e. pages received from the server with a Content-Disposition: attachment header.
ClipboardHandler
A handler for clipboard access.
PrintHandler
A handler for providing Window.print() implementations.
WebStartHandler
A handler for Java Web Start support.
FrameContentHandler
A handler that decides whether the content of a frame should be loaded or not.
CSSErrorHandler
A handler for CSS parser errors.
OnbeforeunloadHandler
A handler for onbeforeunload events.
RefreshHandler
A handler for page refreshes.
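As an example of replacing the defaults, the following sketch installs an AlertHandler and a ConfirmHandler as lambdas that log what the page asked for; the value returned by the ConfirmHandler becomes the result of window.confirm(). It assumes HtmlUnit 3.x or later and serves a hypothetical in-memory page through MockWebConnection; class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

/** Illustrative helper: custom alert and confirm handlers. */
final class HandlerCheck {

    // Hypothetical page: shows an alert, then stores the confirm() answer in the title.
    private static final String HTML = "<html><head><title>before</title></head><body><script>"
            + "alert('Hello from the page');"
            + "document.title = confirm('Proceed?') ? 'confirmed' : 'cancelled';"
            + "</script></body></html>";

    private HandlerCheck() {
    }

    /** Loads the page and returns the handler calls followed by the final title. */
    static List<String> run(final boolean confirmAnswer) {
        final List<String> log = new ArrayList<>();
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // AlertHandler: called for every window.alert(message)
            webClient.setAlertHandler((page, message) -> log.add("alert: " + message));
            // ConfirmHandler: the returned boolean is the result of window.confirm(message)
            webClient.setConfirmHandler((page, message) -> {
                log.add("confirm: " + message);
                return confirmAnswer;
            });

            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(HTML);
            webClient.setWebConnection(connection);

            final HtmlPage page = webClient.getPage("http://www.example.com/");
            log.add("title: " + page.getTitleText());
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(final String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```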
Polyfills
The number of JavaScript APIs supported by browsers seems to increase every day. Because of the limited development resources of the HtmlUnit project, keeping up with this is really hard.
But many polyfills (which add API support to older browsers) are already available. The idea is to use some of these polyfills to add the missing APIs.
Starting with version 2.59.0, HtmlUnit supports the integration of polyfills: there is a dedicated option for every supported polyfill (disabled by default), and if it is enabled, the polyfill is loaded automatically.
@Test
public void fetchSupport() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // enable fetch api polyfill
        webClient.getOptions().setFetchPolyfillEnabled(true);

        final HtmlPage page = webClient.getPage(....);
    }
}
Fetch API Polyfill
webClient.getOptions().setFetchPolyfillEnabled(true);
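To check that the polyfill is active, a page can simply ask whether fetch is a function. This sketch assumes an HtmlUnit version that provides setFetchPolyfillEnabled (as described above) and serves a hypothetical in-memory page through MockWebConnection; the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

/** Illustrative helper: evaluates "typeof fetch" with the fetch polyfill enabled. */
final class FetchPolyfillCheck {

    private FetchPolyfillCheck() {
    }

    static String typeofFetch() {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // The polyfill is loaded automatically into every page once this option is set.
            webClient.getOptions().setFetchPolyfillEnabled(true);

            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse("<html><head></head><body></body></html>");
            webClient.setWebConnection(connection);

            final HtmlPage page = webClient.getPage("http://www.example.com/");
            return String.valueOf(page.executeJavaScript("typeof fetch").getJavaScriptResult());
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println("typeof fetch: " + typeofFetch());
    }
}
```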