Content

Request headers

HtmlUnit mimics the browser as closely as possible, and this includes the request headers it sends. If needed, you can change them at three levels: the request level, the client level, and the BrowserVersion level.

BrowserVersion level

To change the request header at the BrowserVersion level you have to create your own customized browser version using the BrowserVersionBuilder.

final BrowserVersion browser =
    new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX)
          .setAcceptLanguageHeader("de-CH")
          .build();

final WebClient webClient = new WebClient(browser);
....

There are many methods available to customize basic browser behavior, such as:

  • setApplicationCodeName(String)
  • setApplicationMinorVersion(String)
  • setApplicationName(String)
  • setApplicationVersion(String)
  • setBuildId(String)
  • setPlatform(String)
  • setSystemLanguage(String)
  • setSystemTimezone(TimeZone)
  • setUserAgent(String)
  • setVendor(String)
  • setUserLanguage(String)
  • setBrowserLanguage(String)
  • setAcceptEncodingHeader(String)
  • setAcceptLanguageHeader(String)
  • setCssAcceptHeader(String)
  • setHtmlAcceptHeader(String)
  • setImgAcceptHeader(String)
  • setScriptAcceptHeader(String)
  • setXmlHttpRequestAcceptHeader(String)

WebClient level

To change the request header at the client level, use WebClient.addRequestHeader(). You can add additional headers to every request made by this client or overwrite the default ones.
Example: add an additional header to every client request

client.addRequestHeader("from-htmlunit", "yes");

Example: replace the default accept-language header for all requests made by this client.

client.addRequestHeader(HttpHeader.ACCEPT_LANGUAGE, "fr");

Request level

It is also possible to add/overwrite a request header for a dedicated request. Example:

WebRequest wr = new WebRequest(URL_FIRST);
wr.setAdditionalHeader("from-htmlunit", "yes");
....
client.getPage(wr);
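
The interplay of the three levels can be pictured with a small model. The sketch below is not HtmlUnit code; HeaderLevels and effectiveHeaders are hypothetical names, and it assumes that the more specific level wins (request over client over BrowserVersion) and that header names are compared case-insensitively.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not part of HtmlUnit: resolves the effective request
// headers from the three levels, assuming the most specific level wins.
final class HeaderLevels {

    private HeaderLevels() {
    }

    static Map<String, String> effectiveHeaders(
            final Map<String, String> browserDefaults,
            final Map<String, String> clientHeaders,
            final Map<String, String> requestHeaders) {
        // header names are case-insensitive in HTTP
        final Map<String, String> result = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        result.putAll(browserDefaults); // BrowserVersion level
        result.putAll(clientHeaders);   // WebClient level overwrites the defaults
        result.putAll(requestHeaders);  // request level overwrites both
        return result;
    }
}
```

In this model a client-level "accept-language" entry replaces the browser default, while untouched defaults such as the accept-encoding header pass through unchanged.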

Animations based on Window.requestAnimationFrame()

All browsers supported by HtmlUnit can run animations based on the Window.requestAnimationFrame() API. A typical example is Chart.js. Because HtmlUnit is headless, these animations are not triggered automatically. The JavaScript part of the API is implemented, but the user of the HtmlUnit library has to force the triggering of the callback(s).

Example:

try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
    HtmlPage page = webClient.getPage(uri);
    webClient.waitForBackgroundJavaScript(1_000);

    // page is loaded and async js done

    // we are now processing the animation
    Window window = page.getEnclosingWindow().getScriptableObject();
    int i = 0; // to be able to limit the animation cycles
    int pendingFrames; // declared outside the loop so the while condition can see it
    do {
        i++;

        // force one animation cycle
        // this invokes all the animation callbacks registered for this
        // window (by calling requestAnimationFrame(Object)) once.
        pendingFrames = window.animateAnimationsFrames();
    } while (pendingFrames > 0 && i < 200);
}

This gives you full control over the animation: you can skip it entirely, or you can check the current page state after each single animation step.

File download and Attachments

Normally pages are loaded inline: clicking on a link, for example, loads the linked page in the current window. Attached pages are different in that they are intended to be loaded outside of this flow: clicking on a link prompts the user to either save the linked page, or open it outside the current window, but does not load the page in the current window.

HtmlUnit complies with the semantics described above when an AttachmentHandler has been registered with the org.htmlunit.WebClient via org.htmlunit.WebClient#setAttachmentHandler(AttachmentHandler). When no attachment handler has been registered with the WebClient, the semantics described above do not apply, and attachments are loaded inline. By default, no AttachmentHandler is registered with new WebClient instances.

Please find more details and samples in the File download section.

Clipboard

Clipboard interaction is disabled by default for the WebClient. This avoids side effects during testing and removes the need for a running graphical subsystem (windows/X/xvfb).

To enable clipboard support, set a clipboard handler for the WebClient. HtmlUnit provides the AwtClipboardHandler, which implements the interaction with your system/desktop clipboard. The AwtClipboardHandler works only if you are running on top of a graphical subsystem (windows/X/xvfb).

  final ClipboardHandler clipboardHandler = new AwtClipboardHandler();
  webClient.setClipboardHandler(clipboardHandler);

Then you can control the clipboard content from your program like this.

  clipboardHandler.setClipboardContent("HtmlUnit");

Of course you can also implement your own ClipboardHandler to get full control and avoid interaction with the underlying operating system. Writing your own ClipboardHandler is also required if you are working in headless mode.
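
A custom handler for headless runs can be as simple as an in-memory holder. The following is a minimal sketch, not HtmlUnit code. It assumes the ClipboardHandler interface consists of a getClipboardContent()/setClipboardContent(String) pair (only the setter appears in the text above), and the class name InMemoryClipboardHandler is hypothetical.

```java
// Sketch only: in a real project this class would additionally be declared as
// "implements ClipboardHandler" (assumed interface shape, see the lead-in).
// The clause is left out so the example compiles without HtmlUnit on the classpath.
final class InMemoryClipboardHandler {

    // volatile because the page's JavaScript and the test may run on different threads
    private volatile String content = "";

    public String getClipboardContent() {
        return content;
    }

    public void setClipboardContent(final String string) {
        // treat null as an empty clipboard
        content = string == null ? "" : string;
    }
}
```

Because nothing leaves the JVM, such a handler needs no graphical subsystem, and each WebClient can get its own isolated clipboard.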

Content blocking

Out of the box HtmlUnit does not include any content blocking mechanism. But there are several options to include your own.

Blocking based on the request (URL)

This simple form of content blocking works based on the requested URL. Usually you use a list of blocked URLs or some URL patterns to detect blocked requests. If the URL is blocked, the request is not sent to the server; instead a simple page is returned.

With HtmlUnit you can implement this using a WebConnectionWrapper.

    try (WebClient webClient = new WebClient()) {
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // set more options

        // create a WebConnectionWrapper with a (subclassed) getResponse() impl
        new WebConnectionWrapper(webClient) {

            @Override
            public WebResponse getResponse(final WebRequest request) throws IOException {
                final URL url = request.getUrl();
                // check the request url
                // if it is allowed, simply call super.

                if (!isBlocked(url)) {
                    return super.getResponse(request);
                }

                // construct alternative response
                final String content = "<html></html>";
                final WebResponseData data = new WebResponseData(content.getBytes(StandardCharsets.UTF_8),
                        200, "blocked", Collections.emptyList());
                final WebResponse blocked = new WebResponse(data, request, 0L);
                // if you like to check later on for blocked responses
                blocked.markAsBlocked("Blocked URL: '" + url.toExternalForm() + "'");
                return blocked;
            }

            private boolean isBlocked(final URL url) {
                // placeholder: blocks every request; replace with your own URL check
                return true;
            }
        };
        };

        // use the client as usual
        final HtmlPage page = webClient.getPage(url);
    }
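
The isBlocked() check in the sample is only a placeholder. One possible way to fill it in, using a list of blocked hosts plus some URL patterns as described above, is sketched here. The class UrlBlockList is hypothetical and not part of HtmlUnit; any host names and patterns passed to it are up to the caller.

```java
import java.net.URL;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit.
final class UrlBlockList {

    private final Set<String> blockedHosts;       // expected in lower case
    private final List<Pattern> blockedPatterns;  // matched against the full URL

    UrlBlockList(final Set<String> blockedHosts, final List<Pattern> blockedPatterns) {
        this.blockedHosts = blockedHosts;
        this.blockedPatterns = blockedPatterns;
    }

    boolean isBlocked(final URL url) {
        final String host = url.getHost().toLowerCase(Locale.ROOT);
        for (final String blocked : blockedHosts) {
            // match the host itself and any of its subdomains
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return true;
            }
        }

        final String external = url.toExternalForm();
        for (final Pattern pattern : blockedPatterns) {
            if (pattern.matcher(external).find()) {
                return true;
            }
        }
        return false;
    }
}
```

The wrapper's isBlocked(url) could then delegate to an instance of this class. Comparing whole host names instead of searching for a substring avoids blocking unrelated URLs that merely contain the blocked name.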
            

Blocking based on the response (headers)

requires HtmlUnit 3.4.0 or later

For blocking based on the response, a more sophisticated approach is needed. The following sample code shows blocking based on the content length header. With this approach you can check the headers of the response and stop the download before the whole response has been received. This might be helpful to improve the speed of some test cases.

For the implementation we have to deal with the real web connection to be able to access the headers before the whole content is downloaded and also to abort the download of the body itself. Therefore we have to replace the WebConnection with our own subclass of HttpWebConnection.

    try (WebClient webClient = new WebClient()) {
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // set more options
  
        // use our own subclass of HttpWebConnection
        webClient.setWebConnection(new HttpWebConnection(webClient) {
            @Override
            protected WebResponse downloadResponse(final HttpUriRequest httpMethod,
                    final WebRequest webRequest, final HttpResponse httpResponse,
                    final long startTime) throws IOException {
  
                // check content length header; it may be missing, e.g. for chunked responses
                final Header header = httpResponse.getFirstHeader(HttpHeader.CONTENT_LENGTH);
                final long contentLength = header == null ? -1L : Long.parseLong(header.getValue());
  
                // if not too big (or unknown) - done
                if (contentLength < 1_000) {
                    return super.downloadResponse(httpMethod, webRequest, httpResponse, startTime);
                }
  
                // abort downloading of the content
                httpMethod.abort();
  
                // construct alternative response
                final String content = "<html></html>";
                final WebResponseData data = new WebResponseData(content.getBytes(StandardCharsets.UTF_8),
                        200, "blocked", Collections.emptyList());
                final WebResponse blocked = new WebResponse(data, webRequest, 0L);
                // if you like to check later on for blocked responses
                blocked.markAsBlocked("Blocked URL: '" + webRequest.getUrl().toExternalForm()
                            + "' content length: " + contentLength);
                return blocked;
            }
        });
  
        // use the client as usual
        final HtmlPage page = webClient.getPage(url);
    }
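
A response is not guaranteed to carry a usable Content-Length header (for example with chunked transfer encoding), so the decision can be isolated in a small helper that tolerates missing or malformed values. This is a sketch under that assumption; ContentLengthCheck and its methods are hypothetical names, and the limit is whatever threshold your tests need.

```java
// Hypothetical helper, not part of HtmlUnit.
final class ContentLengthCheck {

    private ContentLengthCheck() {
    }

    /** Returns the parsed length, or -1 if the value is missing or not a non-negative number. */
    static long parseContentLength(final String headerValue) {
        if (headerValue == null) {
            return -1L;
        }
        try {
            final long length = Long.parseLong(headerValue.trim());
            return length >= 0 ? length : -1L;
        } catch (final NumberFormatException e) {
            return -1L;
        }
    }

    /** Blocks only if the length is known and reaches the given limit. */
    static boolean shouldBlock(final String headerValue, final long limit) {
        final long length = parseContentLength(headerValue);
        return length >= 0 && length >= limit;
    }
}
```

With this helper, responses of unknown size are downloaded as usual. Whether unknown sizes should be blocked instead is a policy decision for the individual test suite.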
            

Blocking loading of frame content

By setting the FrameContentHandler of the WebClient, you can implement your own rules to decide whether the content of a frame should be loaded or not.

    try (WebClient webClient = new WebClient()) {
        // use our own FrameContentHandler
        webClient.setFrameContentHandler(new FrameContentHandler() {

            @Override
            public boolean loadFrameDocument(final BaseFrameElement baseFrameElement) {
                final String src = baseFrameElement.getSrcAttribute();
                // don't load the content from google
                return !src.contains("google");
            }

        });

        // use the client as usual
        final HtmlPage page = webClient.getPage(url);
    }
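
The substring test in the sample also matches URLs that merely mention the word somewhere in their path or query. A stricter rule could compare the host of the src attribute, as in the following sketch. FrameSourceFilter and isFromDomain are hypothetical names, and the sketch assumes the src value is either a relative reference or something java.net.URI can parse.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of HtmlUnit.
final class FrameSourceFilter {

    private FrameSourceFilter() {
    }

    /** Returns true if src points to the given domain or one of its subdomains. */
    static boolean isFromDomain(final String src, final String domain) {
        if (src == null || src.isEmpty()) {
            return false;
        }

        final String host;
        try {
            host = URI.create(src.trim()).getHost();
        } catch (final IllegalArgumentException e) {
            return false; // not parseable as a URI
        }
        if (host == null) {
            return false; // relative reference, i.e. same site as the page
        }

        final String h = host.toLowerCase(Locale.ROOT);
        final String d = domain.toLowerCase(Locale.ROOT);
        return h.equals(d) || h.endsWith("." + d);
    }
}
```

Inside loadFrameDocument() the return statement could then read return !FrameSourceFilter.isFromDomain(src, "google.com");.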
            

Multithreading/Threads Pooling

HtmlUnit uses an Executor backed by a CachedThreadPool for thread handling. This should work fine for common cases. The CachedThreadPool has been in use since 2.54.0 to support scenarios that use many threads, e.g. because of many WebSockets.

Starting with 2.45.0 you can change this by using WebClient.setExecutor(ExecutorService). It might be a good idea to also implement some thread naming to distinguish Threads used by HtmlUnit from the rest.
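
Thread naming can be added with a plain java.util.concurrent.ThreadFactory. The sketch below uses only JDK classes; NamedThreadFactory, the prefix, and the choice of daemon threads are assumptions made for illustration, not HtmlUnit defaults.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: names every thread "<prefix>-<n>".
final class NamedThreadFactory implements ThreadFactory {

    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    NamedThreadFactory(final String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(final Runnable runnable) {
        final Thread thread = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
        // assumption: daemon threads, so idle workers do not keep the JVM alive
        thread.setDaemon(true);
        return thread;
    }

    /** Creates a cached thread pool whose threads carry the given name prefix. */
    static ExecutorService newCachedPool(final String prefix) {
        return Executors.newCachedThreadPool(new NamedThreadFactory(prefix));
    }
}
```

The pool can then be handed to the client with webClient.setExecutor(NamedThreadFactory.newCachedPool("htmlunit")), which makes the threads easy to spot in thread dumps and log output.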

Local/Session Storage

HtmlUnit supports the Web Storage API.

For testing purposes, however, it can be useful to add entries to the storage before running a test and to access the stored content afterwards. Therefore the storage is accessible from the web client using the StorageHolder implementation.

SessionStorage example

    try (WebClient webClient = new WebClient()) {

        // get the session storage for the current window
        final Map<String, String> sessionStorage =
                webClient.getStorageHolder().getSessionStorage(webClient.getCurrentWindow());

        // place some data in the session storage
        sessionStorage.put("myKey", "myData");

        // load the page that consumes the session storage data
        webClient.getPage(url);

        // make sure the new data are in
        assertEquals("myNewData", sessionStorage.get("myNewKey"));
    }
                

LocalStorage example

    try (WebClient webClient = new WebClient()) {

        // get the local storage for the url
        // the url has to match the page url you will load later
        final Map<String, String> localStorage = webClient.getStorageHolder().getLocalStorage(url);

        // place some data in the local storage
        localStorage.put("myKey", "myData");

        // load the page that consumes the local storage data
        webClient.getPage(url);

        // make sure the new data are in
        assertEquals("myNewData", localStorage.get("myNewKey"));
    }
                

Client side certificates

HtmlUnit optionally supports client side certificates. You can provide the certificates in different ways.

Please have a look at the javadoc for more details.