Request headers
HtmlUnit mimics the browser as closely as possible; this includes the request headers it sends.
If needed, you can change them at three levels: the request level, the client level, and the BrowserVersion level.
BrowserVersion level
To change the request header at the BrowserVersion level you have to create your own customized browser version using the BrowserVersionBuilder.
    final BrowserVersion browser =
            new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX)
                .setAcceptLanguageHeader("de-CH")
                .build();
    final WebClient webClient = new WebClient(browser);
    ....
There are many methods available to customize basic browser behavior, such as:
- setApplicationCodeName(String)
- setApplicationMinorVersion(String)
- setApplicationName(String)
- setApplicationVersion(String)
- setBuildId(String)
- setPlatform(String)
- setSystemLanguage(String)
- setSystemTimezone(TimeZone)
- setUserAgent(String)
- setVendor(String)
- setUserLanguage(String)
- setBrowserLanguage(String)
- setAcceptEncodingHeader(String)
- setAcceptLanguageHeader(String)
- setCssAcceptHeader(String)
- setHtmlAcceptHeader(String)
- setImgAcceptHeader(String)
- setScriptAcceptHeader(String)
- setXmlHttpRequestAcceptHeader(String)
WebClient level
To change the request header at the client level use WebClient.addRequestHeader(). You can add additional headers to every request made by this client or overwrite the default ones.
Example: add an additional header to every client request
client.addRequestHeader("from htmlunit", "yes");
Example: replace the default accept-language header for all requests made by this client.
client.addRequestHeader(HttpHeader.ACCEPT_LANGUAGE, "fr");
Example: replace the default accept-language header for all requests made by this client, using a value held in a variable.
client.addRequestHeader(HttpHeader.ACCEPT_LANGUAGE, fromClient);
Request level
It is also possible to add or overwrite a request header for a single request. Example:
    WebRequest wr = new WebRequest(URL_FIRST);
    wr.setAdditionalHeader("from htmlunit", "yes");
    ....
    client.getPage(wr);
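The way the three levels combine can be pictured with a small model. This is only a conceptual sketch in plain Java, not HtmlUnit's internal implementation: the class HeaderLayers and its resolve method are hypothetical, and the sketch assumes that the more specific level wins, which is how the text above reads.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of layered request headers; not part of HtmlUnit. */
final class HeaderLayers {
    private HeaderLayers() { }

    /**
     * Merges the three levels into the headers that would be sent.
     * Assumption: request level overrides client level, which overrides
     * the BrowserVersion defaults.
     */
    static Map<String, String> resolve(final Map<String, String> browserDefaults,
                                       final Map<String, String> clientHeaders,
                                       final Map<String, String> requestHeaders) {
        // HTTP header names are case-insensitive, so compare them that way.
        final Map<String, String> effective = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        effective.putAll(browserDefaults); // most general level
        effective.putAll(clientHeaders);   // overrides the defaults
        effective.putAll(requestHeaders);  // most specific level wins
        return effective;
    }
}
```

In this model a client-level "accept-language" entry replaces the browser default "Accept-Language", while headers set only at the request level are simply added.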
Animations based on Window.requestAnimationFrame()
All browsers supported by HtmlUnit are able to do animations based on the Window.requestAnimationFrame() API. A typical example of this is Chart.js. Because HtmlUnit is headless, this kind of animation is not triggered automatically. The JavaScript part of the API is implemented, but the user of the HtmlUnit library has to force the triggering of the callback(s).
Example:
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        HtmlPage page = webClient.getPage(uri);
        webClient.waitForBackgroundJavaScript(1_000);

        // page is loaded and async js done
        // we are now processing the animation
        Window window = page.getEnclosingWindow().getScriptableObject();

        int i = 0; // to be able to limit the animation cycles
        int pendingFrames; // declared outside the loop so the while condition can read it
        do {
            i++;

            // force one animation cycle
            // this invokes all the animation callbacks registered for this
            // window (by calling requestAnimationFrame(Object)) once.
            pendingFrames = window.animateAnimationsFrames();
        } while (pendingFrames > 0 && i < 200);
    }
This gives you full control over the animation: you can skip it entirely, or you can check the current page state after each single animation step.
File download and Attachments
Normally pages are loaded inline: clicking on a link, for example, loads the linked page in the current window. Attached pages are different in that they are intended to be loaded outside of this flow: clicking on a link prompts the user to either save the linked page, or open it outside the current window, but does not load the page in the current window.
HtmlUnit complies with the semantics described above when an AttachmentHandler has been registered with the org.htmlunit.WebClient via org.htmlunit.WebClient#setAttachmentHandler(AttachmentHandler). When no attachment handler has been registered with the WebClient, the semantics described above do not apply, and attachments are loaded inline. By default, no AttachmentHandler is registered with new WebClient instances.
Clipboard
Clipboard interaction is disabled by default for the WebClient. This avoids side effects during testing and removes the need for a running graphical subsystem (windows/X/xvfb).
To enable clipboard support, set a clipboard handler for the WebClient. HtmlUnit provides the AwtClipboardHandler, which implements the interaction with your system/desktop clipboard. The AwtClipboardHandler works only if you are running on top of a graphical subsystem (windows/X/xvfb).
    final ClipboardHandler clipboardHandler = new AwtClipboardHandler();
    webClient.setClipboardHandler(clipboardHandler);
Then you can control the clipboard content from your program like this.
clipboardHandler.setClipboardContent("HtmlUnit");
Of course you can also implement your own ClipboardHandler to get full control and avoid interaction with the underlying operating system. Writing your own ClipboardHandler is also required if you are working in headless mode.
Content blocking
Out of the box, HtmlUnit does not include any content blocking mechanism, but there are several options to include your own.
Blocking based on the request (URL)
This simple form of content blocking works based on the requested url. Usually you have to use a list of blocked urls or some url patterns to detect the blocked url. In case the url is blocked the request is not sent to the server; instead a simple page is returned.
With HtmlUnit you can implement this using a WebConnectionWrapper.
    try (WebClient webClient = new WebClient()) {
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // set more options

        // create a WebConnectionWrapper with a (subclassed) getResponse() impl
        new WebConnectionWrapper(webClient) {

            @Override
            public WebResponse getResponse(final WebRequest request) throws IOException {
                final URL url = request.getUrl();

                // check the request url
                // if it is allowed, simply call super.
                if (!isBlocked(url)) {
                    return super.getResponse(request);
                }

                // construct alternative response
                final String content = "<html></html>";
                final WebResponseData data = new WebResponseData(
                        content.getBytes(StandardCharsets.UTF_8),
                        200, "blocked", Collections.emptyList());
                final WebResponse blocked = new WebResponse(data, request, 0L);

                // if you like to check later on for blocked responses
                blocked.markAsBlocked("Blocked URL: '" + url.toExternalForm() + "'");
                return blocked;
            }

            // placeholder: replace with your own URL list or pattern check
            private boolean isBlocked(final URL url) {
                return true;
            }
        };

        // use the client as usual
        final HtmlPage page = webClient.getPage(url);
    }
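The isBlocked() placeholder in the sample above always returns true. A minimal sketch of a real check, following the "list of blocked urls or some url patterns" idea, might look like the following. The class UrlBlockList is hypothetical and uses only the JDK; the host names and patterns you pass in are up to you.

```java
import java.net.URL;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper that decides whether a URL should be blocked. */
final class UrlBlockList {
    // Hosts blocked outright, together with all of their subdomains.
    private final Set<String> blockedHosts;
    // Regular expressions searched for in the full external form of the URL.
    private final List<Pattern> blockedPatterns;

    UrlBlockList(final Set<String> blockedHosts, final List<Pattern> blockedPatterns) {
        this.blockedHosts = blockedHosts;
        this.blockedPatterns = blockedPatterns;
    }

    boolean isBlocked(final URL url) {
        final String rawHost = url.getHost();
        final String host = rawHost == null ? "" : rawHost.toLowerCase(Locale.ROOT);

        // Exact host match or subdomain match (e.g. "cdn.ads.example.com").
        for (final String blocked : blockedHosts) {
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return true;
            }
        }

        // Pattern match against the complete URL string.
        final String external = url.toExternalForm();
        for (final Pattern pattern : blockedPatterns) {
            if (pattern.matcher(external).find()) {
                return true;
            }
        }
        return false;
    }
}
```

An instance of such a helper could be held in a field next to the wrapper and called from getResponse() in place of the placeholder. Matching on the parsed host rather than on a substring of the whole URL avoids blocking unrelated URLs that merely contain the host name in their path or query.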
Blocking based on the response (headers)
Requires HtmlUnit 3.4.0 or later.
For blocking based on the response a more sophisticated approach is needed. The following sample code shows blocking based on the content length header. This lets you check the headers of the response and stop the download before the whole response body is transferred, which might help to improve the speed of some test cases.
For the implementation we have to deal with the real web connection to be able to access the headers before the whole content is downloaded and also to abort the download of the body itself. Therefore we have to replace the WebConnection with our own subclass of HttpWebConnection.
    try (WebClient webClient = new WebClient()) {
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // set more options

        // use our own subclass of HttpWebConnection
        webClient.setWebConnection(new HttpWebConnection(webClient) {

            @Override
            protected WebResponse downloadResponse(final HttpUriRequest httpMethod,
                    final WebRequest webRequest, final HttpResponse httpResponse,
                    final long startTime) throws IOException {

                // check content length header
                final int contentLength = Integer.parseInt(
                        httpResponse.getFirstHeader(HttpHeader.CONTENT_LENGTH).getValue());

                // if not too big - done
                if (contentLength < 1_000) {
                    return super.downloadResponse(httpMethod, webRequest, httpResponse, startTime);
                }

                // abort downloading of the content
                httpMethod.abort();

                // construct alternative response
                final String content = "<html></html>";
                final WebResponseData data = new WebResponseData(
                        content.getBytes(StandardCharsets.UTF_8),
                        200, "blocked", Collections.emptyList());
                final WebResponse blocked = new WebResponse(data, webRequest, 0L);

                // if you like to check later on for blocked responses
                blocked.markAsBlocked("Blocked URL: '" + webRequest.getUrl().toExternalForm()
                        + "' content length: " + contentLength);
                return blocked;
            }
        });

        // use the client as usual
        final HtmlPage page = webClient.getPage(url);
    }
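The sample above assumes that the content length header is always present and numeric. Responses may omit it (for example with chunked transfer encoding), so a more defensive check can be useful. The following is a sketch using only the JDK; the class ContentLengthCheck is hypothetical, and treating an unknown length as "not too large" is a design assumption you may want to reverse.

```java
import java.util.OptionalLong;

/** Hypothetical helper for evaluating a Content-Length header value. */
final class ContentLengthCheck {
    private ContentLengthCheck() { }

    /** Returns the parsed length, or empty if the value is missing, negative or not a number. */
    static OptionalLong parse(final String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        try {
            final long length = Long.parseLong(headerValue.trim());
            return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
        } catch (final NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Mirrors the rule of the sample above: anything below the limit is downloaded.
     * Assumption: a response of unknown length is not treated as too large.
     */
    static boolean isTooLarge(final String headerValue, final long limit) {
        final OptionalLong length = parse(headerValue);
        return length.isPresent() && length.getAsLong() >= limit;
    }
}
```

Inside downloadResponse() you would first check whether getFirstHeader() returned a header at all, pass its value (or null) to isTooLarge(), and abort only when the method returns true.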
Blocking loading of frame content
By setting the FrameContentHandler of the WebClient, you can implement your own rules to decide whether the content of a frame should be loaded or not.
    try (WebClient webClient = new WebClient()) {
        // use our own FrameContentHandler
        webClient.setFrameContentHandler(new FrameContentHandler() {

            @Override
            public boolean loadFrameDocument(final BaseFrameElement baseFrameElement) {
                final String src = baseFrameElement.getSrcAttribute();

                // don't load the content from google
                return !src.contains("google");
            }
        });

        // use the client as usual
        final HtmlPage page = webClient.getPage(url);
    }
Multithreading/Threads Pooling
HtmlUnit uses an Executor backed by a CachedThreadPool for thread handling. This should work fine for common cases. The CachedThreadPool has been in use since 2.54.0 to support scenarios that need many threads, e.g. because of many WebSockets.
Starting with 2.45.0 you can change this by using WebClient.setExecutor(ExecutorService). It might be a good idea to also implement some thread naming to distinguish the threads used by HtmlUnit from the rest.
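One way to get such thread naming is a custom ThreadFactory. The following sketch uses only java.util.concurrent; the class NamedThreadFactory and the name prefix are hypothetical, and marking the threads as daemon threads is an assumption made here so that idle workers do not keep the JVM alive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ThreadFactory that gives every worker thread a recognizable name. */
final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    NamedThreadFactory(final String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(final Runnable task) {
        // Names look like "<prefix>1", "<prefix>2", ...
        final Thread thread = new Thread(task, prefix + counter.incrementAndGet());
        // Assumption: workers should not prevent JVM shutdown.
        thread.setDaemon(true);
        return thread;
    }

    /** Creates a cached thread pool whose threads are named with the given prefix. */
    static ExecutorService newCachedPool(final String prefix) {
        return Executors.newCachedThreadPool(new NamedThreadFactory(prefix));
    }
}
```

The resulting pool can then be handed to the client, for example with webClient.setExecutor(NamedThreadFactory.newCachedPool("htmlunit-worker-")). The threads show up under that prefix in thread dumps and log output.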
Local/Session Storage
HtmlUnit supports the Web Storage API.
For testing purposes it can be useful to add entries to the storage before running a test and to read the stored content afterwards. Therefore the storage is accessible from the web client through the StorageHolder implementation.
SessionStorage example
    try (WebClient webClient = new WebClient()) {
        // get the session storage for the current window
        final Map<String, String> sessionStorage =
                webClient.getStorageHolder().getSessionStorage(webClient.getCurrentWindow());

        // place some data in the session storage
        sessionStorage.put("myKey", "myData");

        // load the page that consumes the session storage data
        webClient.getPage(url);

        // make sure the data written by the page is there
        assertEquals("myNewData", sessionStorage.get("myNewKey"));
    }
LocalStorage example
    try (WebClient webClient = new WebClient()) {
        // get the local storage for the url
        // the url has to match the page url you will load later
        final Map<String, String> localStorage =
                webClient.getStorageHolder().getLocalStorage(url);

        // place some data in the local storage
        localStorage.put("myKey", "myData");

        // load the page that consumes the local storage data
        webClient.getPage(url);

        // make sure the data written by the page is there
        assertEquals("myNewData", localStorage.get("myNewKey"));
    }
Client side certificates
HtmlUnit optionally supports client side certificates. There are several ways to provide the certificates:
- WebClientOptions.setSSLClientCertificateKeyStore(InputStream keyStoreInputStream, String keyStorePassword, String keyStoreType)
- WebClientOptions.setSSLClientCertificateKeyStore(URL keyStoreUrl, String keyStorePassword, String keyStoreType)
- WebClientOptions.setSSLClientCertificateKeyStore(KeyStore keyStore, char[] keyStorePassword)
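For the KeyStore variant, the key store has to be loaded first. A minimal sketch with the standard java.security API could look like this. The helper class ClientKeyStores is hypothetical, and the store type (for example "PKCS12") and the password depend on how your certificate was exported.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

/** Hypothetical helper that loads a key store holding a client certificate. */
final class ClientKeyStores {
    private ClientKeyStores() { }

    /**
     * Loads a key store of the given type (e.g. "PKCS12" or "JKS") from the stream.
     * The caller is responsible for closing the stream.
     */
    static KeyStore load(final InputStream in, final char[] password, final String type)
            throws IOException, GeneralSecurityException {
        final KeyStore keyStore = KeyStore.getInstance(type);
        keyStore.load(in, password);
        return keyStore;
    }
}

// Usage sketch (file name and password are placeholders):
//
//     try (InputStream in = Files.newInputStream(Path.of("client.p12"))) {
//         KeyStore keyStore = ClientKeyStores.load(in, password, "PKCS12");
//         webClient.getOptions().setSSLClientCertificateKeyStore(keyStore, password);
//     }
```

Loading the store yourself surfaces a wrong password or store type as an exception from your own code, before any request is made.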