Module org.htmlunit

Class UrlUtils

java.lang.Object
org.htmlunit.util.UrlUtils

public final class UrlUtils extends Object
URL utility class that makes it easy to create new URLs based on existing URLs without having to assemble or parse them yourself.
Author:
Daniel Gredler, Martin Tamme, Sudhan Moghe, Marc Guillemot, Ahmed Ashour, Ronald Brill, Joerg Werner, Hartmut Arlt
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    "about".
    static final String
    "about:blank".
    static final String
    "about:".
    static final URL
    URL for "about:blank".
  • Method Summary

    Modifier and Type
    Method
    Description
    static String
    decode(String escaped)
    Unescapes and decodes the specified string.
    static byte[]
    decodeDataUrl(byte[] bytes, boolean removeWhitespace)
    Decodes an array of URL safe 7-bit characters into an array of original bytes.
    static byte[]
    decodeUrl(byte[] bytes)
    Decodes an array of URL safe 7-bit characters into an array of original bytes.
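    static String
    encodeAnchor(String anchor)
    Encodes and escapes the specified URI anchor string.
    static String
    encodeHash(String hash)
    Encodes and escapes the specified URI hash string.
    static String
    encodeQuery(String query)
    Encodes and escapes the specified URI query string.
    static String
    encodeQueryPart(String part)
    Encodes the specified query part.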
    static URL
    encodeUrl(URL url, Charset charset)
    Encodes illegal characters in the specified URL's path, query string and anchor according to the URL encoding rules observed in real browsers.
    static byte[]
    encodeUrl(BitSet urlsafe, byte[] bytes)
    Encodes an array of bytes into an array of URL safe 7-bit characters.
    static URL
    getUrlWithNewHost(URL u, String newHost)
    Creates and returns a new URL identical to the specified URL, except using the specified host.
    static URL
    getUrlWithNewHostAndPort(URL u, String newHost, int newPort)
    Creates and returns a new URL identical to the specified URL, except using the specified host and port.
    static URL
    getUrlWithNewPath(URL u, String newPath)
    Creates and returns a new URL identical to the specified URL, except using the specified path.
    static URL
    getUrlWithNewPort(URL u, int newPort)
    Creates and returns a new URL identical to the specified URL, except using the specified port.
    static URL
    getUrlWithNewProtocol(URL u, String newProtocol)
    Creates and returns a new URL identical to the specified URL, except using the specified protocol.
    static URL
    getUrlWithNewQuery(URL u, String newQuery)
    Creates and returns a new URL identical to the specified URL, except using the specified query string.
    static URL
    getUrlWithNewRef(URL u, String newRef)
    Creates and returns a new URL identical to the specified URL, except using the specified reference.
    static URL
    getUrlWithNewUserName(URL u, String newUserName)
    Creates and returns a new URL identical to the specified URL but with a changed user name.
    static URL
    getUrlWithNewUserPassword(URL u, String newUserPassword)
    Creates and returns a new URL identical to the specified URL but with a changed user password.
    static URL
    getUrlWithoutPathRefQuery(URL u)
    Creates and returns a new URL using only the protocol and authority from the given one.
    static URL
    getUrlWithoutRef(URL u)
    Creates and returns a new URL using only the protocol, authority and path from the given one.
    static URL
    getUrlWithProtocolAndAuthority(URL u)
    Creates and returns a new URL with the same protocol and authority as the specified URL, ignoring its path and query.
    static boolean
    isSameOrigin(URL originUrl, URL newUrl)
    Determines whether two URLs share the same origin according to the Same-Origin Policy.
    static boolean
    isSpecialScheme(String scheme)
    Returns true if the specified string is a special scheme.
    static boolean
    isValidScheme(String scheme)
    Returns true if the specified string is a valid scheme name.
    static String
    normalize(URL url)
    Constructs a normalized URL string that can be used as a cache key.
    static URL
    removeRedundantPort(URL url)
    Removes the port if it is the well-known default port for the URL's protocol.
    static String
    resolveUrl(String baseUrl, String relativeUrl)
    Resolves a given relative URL against a base URL.
    static String
    resolveUrl(URL baseUrl, String relativeUrl)
    Resolves a given relative URL against a base URL.
    static boolean
    sameFile(URL u1, URL u2)
    More or less the same as URLStreamHandler.sameFile(URL, URL), but without resolving the host to an IP address for comparison.
    static URI
    toURI(URL url, String query)
    Constructs a URI using the specified URL.
    static URL
    toUrlSafe(String url)
    Constructs a URL instance based on the specified URL string, taking into account the fact that the specified URL string may represent an "about:...", "javascript:..." or "data:..." URL.
    static URL
    toUrlUnsafe(String url)
    Constructs a URL instance based on the specified URL string, taking into account the fact that the specified URL string may represent an "about:...", "javascript:..." or "data:..." URL.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Method Details

    • toUrlSafe

      public static URL toUrlSafe(String url)

      Constructs a URL instance based on the specified URL string, taking into account the fact that the specified URL string may represent an "about:..." URL, a "javascript:..." URL, or a "data:..." URL.

      The caller must make sure that URL strings passed to this method parse correctly as URLs, because this method does not expect to handle a MalformedURLException.

      Parameters:
      url - the URL string to convert into a URL instance
      Returns:
      the constructed URL instance
    • toUrlUnsafe

      public static URL toUrlUnsafe(String url) throws MalformedURLException

      Constructs a URL instance based on the specified URL string, taking into account the fact that the specified URL string may represent an "about:..." URL, a "javascript:..." URL, or a "data:..." URL.

      Unlike toUrlSafe(String), the caller need not be sure that URL strings passed to this method will parse correctly as URLs.

      Parameters:
      url - the URL string to convert into a URL instance
      Returns:
      the constructed URL instance
      Throws:
      MalformedURLException - if the URL string cannot be converted to a URL instance
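      The JDK ships no URLStreamHandler for "about", "javascript" or "data", so a plain `new URL("about:blank")` fails with a MalformedURLException; this illustrates the problem these two factory methods address. The JDK-only sketch below is not HtmlUnit's implementation: it assumes a parse-only handler is acceptable, and the class `SchemeAwareUrls` and its methods are hypothetical names that mirror toUrlSafe and toUrlUnsafe.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Locale;

// Hypothetical sketch, not HtmlUnit's implementation.
final class SchemeAwareUrls {
    // Parse-only handler: java.net.URL can be constructed, but connecting is unsupported.
    private static final URLStreamHandler PARSE_ONLY_HANDLER = new URLStreamHandler() {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("opening " + u + " is not supported by this sketch");
        }
    };

    private SchemeAwareUrls() {
    }

    /** Mirrors toUrlUnsafe: malformed input surfaces as a MalformedURLException. */
    static URL toUrlUnsafe(String url) throws MalformedURLException {
        String lower = url.trim().toLowerCase(Locale.ROOT);
        if (lower.startsWith("about:") || lower.startsWith("javascript:") || lower.startsWith("data:")) {
            // The JDK has no built-in handler for these schemes, so pass one explicitly.
            return new URL((URL) null, url, PARSE_ONLY_HANDLER);
        }
        return new URL(url);
    }

    /** Mirrors toUrlSafe: the caller guarantees well-formed input. */
    static URL toUrlSafe(String url) {
        try {
            return toUrlUnsafe(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + url, e);
        }
    }

    /** Returns true if the plain JDK parser accepts the string without a custom handler. */
    static boolean jdkAccepts(String url) {
        try {
            new URL(url);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

      Opening a connection on the "about:", "javascript:" or "data:" URLs returned here fails by design; a complete implementation would supply handlers that can actually serve those schemes.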
    • encodeUrl

      public static URL encodeUrl(URL url, Charset charset)

      Encodes illegal characters in the specified URL's path, query string and anchor according to the URL encoding rules observed in real browsers.

      For example, this method changes "http://first/?a=b c" to "http://first/?a=b%20c".

      Parameters:
      url - the URL to encode
      charset - the charset
      Returns:
      the encoded URL
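      Browsers percent-encode a query using the page's character encoding, which is presumably why encodeUrl takes a Charset. The sketch below covers only the query component and encodes exactly the WHATWG URL Standard's query percent-encode set (C0 controls, space, `"`, `#`, `<`, `>` and bytes above 0x7E). HtmlUnit's own rules may differ, and `QueryEncoder` is a hypothetical name.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of charset-aware query encoding, not HtmlUnit's implementation.
final class QueryEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private QueryEncoder() {
    }

    /**
     * Percent-encodes the bytes of the query (in the given charset) that fall in the
     * WHATWG query percent-encode set. Existing "%XX" escapes pass through unchanged.
     */
    static String encodeQuery(String query, Charset charset) {
        StringBuilder out = new StringBuilder(query.length());
        for (byte b : query.getBytes(charset)) {
            int c = b & 0xFF; // unsigned byte value
            boolean mustEncode = c <= 0x20 // C0 controls and space
                    || c > 0x7E           // DEL and all non-ASCII bytes
                    || c == '"' || c == '#' || c == '<' || c == '>';
            if (mustEncode) {
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }
}
```

      With UTF-8, "q=é" becomes "q=%C3%A9"; with ISO-8859-1 it becomes "q=%E9", so the same URL string can produce different bytes on the wire depending on the charset.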
    • encodeAnchor

      public static String encodeAnchor(String anchor)
      Encodes and escapes the specified URI anchor string.
      Parameters:
      anchor - the anchor string to encode and escape
      Returns:
      the encoded and escaped anchor string
    • encodeHash

      public static String encodeHash(String hash)
      Encodes and escapes the specified URI hash string.
      Parameters:
      hash - the hash string to encode and escape
      Returns:
      the encoded and escaped hash string
    • encodeQuery

      public static String encodeQuery(String query)
      Encodes and escapes the specified URI query string.
      Parameters:
      query - the query string to encode and escape
      Returns:
      the encoded and escaped query string
    • decode

      public static String decode(String escaped)
      Unescapes and decodes the specified string.
      Parameters:
      escaped - the string to be unescaped and decoded
      Returns:
      the unescaped and decoded string
    • getUrlWithoutPathRefQuery

      public static URL getUrlWithoutPathRefQuery(URL u) throws MalformedURLException
      Creates and returns a new URL using only the protocol and authority from the given one.
      Parameters:
      u - the URL on which to base the returned URL
      Returns:
      a new URL using only the protocol and authority from the given one
      Throws:
      MalformedURLException - if there is a problem creating the new URL
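      Keeping only the protocol and authority can be sketched with JDK classes, assuming a hierarchical URL (such as http or https) whose components are already legally encoded. This is not HtmlUnit's code; `ProtocolAndAuthority` and its methods are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch, not HtmlUnit's implementation.
final class ProtocolAndAuthority {
    private ProtocolAndAuthority() {
    }

    /** Parses an absolute, legally encoded URL string, rethrowing failures unchecked. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    /** Keeps protocol and authority (user info, host, port); drops path, query and fragment. */
    static URL withoutPathRefQuery(URL u) {
        String authority = u.getAuthority() == null ? "" : u.getAuthority();
        return parse(u.getProtocol() + "://" + authority);
    }
}
```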
    • getUrlWithoutRef

      public static URL getUrlWithoutRef(URL u) throws MalformedURLException
      Creates and returns a new URL using only the protocol, authority and path from the given one.
      Parameters:
      u - the URL on which to base the returned URL
      Returns:
      a new URL using only the protocol, authority and path from the given one
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewProtocol

      public static URL getUrlWithNewProtocol(URL u, String newProtocol) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL, except using the specified protocol.
      Parameters:
      u - the URL on which to base the returned URL
      newProtocol - the new protocol to use in the returned URL
      Returns:
      a new URL identical to the specified URL, except using the specified protocol
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewHost

      public static URL getUrlWithNewHost(URL u, String newHost) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL, except using the specified host.
      Parameters:
      u - the URL on which to base the returned URL
      newHost - the new host to use in the returned URL
      Returns:
      a new URL identical to the specified URL, except using the specified host
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewHostAndPort

      public static URL getUrlWithNewHostAndPort(URL u, String newHost, int newPort) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL, except using the specified host and port.
      Parameters:
      u - the URL on which to base the returned URL
      newHost - the new host to use in the returned URL
      newPort - the new port to use in the returned URL
      Returns:
      a new URL identical to the specified URL, except using the specified host and port
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewPort

      public static URL getUrlWithNewPort(URL u, int newPort) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL, except using the specified port.
      Parameters:
      u - the URL on which to base the returned URL
      newPort - the new port to use in the returned URL or -1 to remove it
      Returns:
      a new URL identical to the specified URL, except using the specified port
      Throws:
      MalformedURLException - if there is a problem creating the new URL
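      Rebuilding a URL with a different host or port can be sketched with JDK classes alone. The sketch assumes hierarchical URLs whose components are already legally encoded and follows the convention documented for getUrlWithNewPort that -1 removes the port. `UrlHostPort` and its methods are hypothetical, not HtmlUnit's API.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch, not HtmlUnit's implementation.
final class UrlHostPort {
    private UrlHostPort() {
    }

    /** Parses an absolute, legally encoded URL string, rethrowing failures unchecked. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    /** Same URL with a new host and port; a port of -1 means "no explicit port". */
    static URL withNewHostAndPort(URL u, String newHost, int newPort) {
        StringBuilder sb = new StringBuilder(u.getProtocol()).append("://");
        if (u.getUserInfo() != null) {
            sb.append(u.getUserInfo()).append('@');
        }
        sb.append(newHost);
        if (newPort != -1) {
            sb.append(':').append(newPort);
        }
        sb.append(u.getFile()); // path plus "?query", if any
        if (u.getRef() != null) {
            sb.append('#').append(u.getRef());
        }
        return parse(sb.toString());
    }

    /** Same URL with only the port replaced. */
    static URL withNewPort(URL u, int newPort) {
        return withNewHostAndPort(u, u.getHost(), newPort);
    }
}
```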
    • getUrlWithNewPath

      public static URL getUrlWithNewPath(URL u, String newPath) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL, except using the specified path.
      Parameters:
      u - the URL on which to base the returned URL
      newPath - the new path to use in the returned URL
      Returns:
      a new URL identical to the specified URL, except using the specified path
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewRef

      public static URL getUrlWithNewRef(URL u, String newRef) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL, except using the specified reference.
      Parameters:
      u - the URL on which to base the returned URL
      newRef - the new reference to use in the returned URL or null to remove it
      Returns:
      a new URL identical to the specified URL, except using the specified reference
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewQuery

      public static URL getUrlWithNewQuery(URL u, String newQuery) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL, except using the specified query string.
      Parameters:
      u - the URL on which to base the returned URL
      newQuery - the new query string to use in the returned URL
      Returns:
      a new URL identical to the specified URL, except using the specified query string
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithProtocolAndAuthority

      public static URL getUrlWithProtocolAndAuthority(URL u) throws MalformedURLException
      Creates and returns a new URL with the same protocol and authority as the specified URL, ignoring its path and query.
      Parameters:
      u - the URL on which to base the returned URL
      Returns:
      a new URL with the same protocol and authority as the specified URL, ignoring its path and query
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewUserName

      public static URL getUrlWithNewUserName(URL u, String newUserName) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL but with a changed user name.
      Parameters:
      u - the URL on which to base the returned URL
      newUserName - the new user name or null to remove it
      Returns:
      a new URL identical to the specified URL; only user name updated
      Throws:
      MalformedURLException - if there is a problem creating the new URL
    • getUrlWithNewUserPassword

      public static URL getUrlWithNewUserPassword(URL u, String newUserPassword) throws MalformedURLException
      Creates and returns a new URL identical to the specified URL but with a changed user password.
      Parameters:
      u - the URL on which to base the returned URL
      newUserPassword - the new user password or null to remove it
      Returns:
      a new URL identical to the specified URL; only user password updated
      Throws:
      MalformedURLException - if there is a problem creating the new URL
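      java.net.URL exposes the user name and password only together, through getUserInfo() as "name:password". The sketch below splits and reassembles that value. Treating a null name or password as absent, and omitting the "@" when both are absent, are assumptions of this sketch; `UserInfoEdits` is a hypothetical class, not HtmlUnit's code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch, not HtmlUnit's implementation.
final class UserInfoEdits {
    private UserInfoEdits() {
    }

    /** Parses an absolute, legally encoded URL string, rethrowing failures unchecked. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    static URL withNewUserName(URL u, String newUserName) {
        return withUserInfo(u, newUserName, passwordOf(u));
    }

    static URL withNewUserPassword(URL u, String newUserPassword) {
        return withUserInfo(u, userNameOf(u), newUserPassword);
    }

    private static String userNameOf(URL u) {
        String info = u.getUserInfo();
        if (info == null) {
            return null;
        }
        int colon = info.indexOf(':');
        return colon < 0 ? info : info.substring(0, colon);
    }

    private static String passwordOf(URL u) {
        String info = u.getUserInfo();
        if (info == null) {
            return null;
        }
        int colon = info.indexOf(':');
        return colon < 0 ? null : info.substring(colon + 1);
    }

    /** Rebuilds the URL; user info is omitted entirely when name and password are both null. */
    private static URL withUserInfo(URL u, String name, String password) {
        String info = (name == null ? "" : name) + (password == null ? "" : ":" + password);
        StringBuilder sb = new StringBuilder(u.getProtocol()).append("://");
        if (!info.isEmpty()) {
            sb.append(info).append('@');
        }
        sb.append(u.getHost());
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        sb.append(u.getFile()); // path plus "?query", if any
        if (u.getRef() != null) {
            sb.append('#').append(u.getRef());
        }
        return parse(sb.toString());
    }
}
```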
    • resolveUrl

      public static String resolveUrl(String baseUrl, String relativeUrl)
      Resolves a given relative URL against a base URL. See RFC1808 Section 4 for more details.
      Parameters:
      baseUrl - The base URL in which to resolve the specification.
      relativeUrl - The relative URL to resolve against the base URL.
      Returns:
      the resolved specification.
    • resolveUrl

      public static String resolveUrl(URL baseUrl, String relativeUrl)
      Resolves a given relative URL against a base URL. See RFC1808 Section 4 for more details.
      Parameters:
      baseUrl - The base URL in which to resolve the specification.
      relativeUrl - The relative URL to resolve against the base URL.
      Returns:
      the resolved specification.
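      For comparison, java.net.URI.resolve implements the resolution algorithm of RFC 2396, which updated RFC 1808, so its results may differ from resolveUrl in edge cases (for example an empty relative reference). `RelativeResolution` is a hypothetical wrapper, and the base URL in the checks is the example base used in RFC 2396.

```java
import java.net.URI;

// Hypothetical wrapper around the JDK resolver, not HtmlUnit's implementation.
final class RelativeResolution {
    private RelativeResolution() {
    }

    /** Resolves relativeUrl against baseUrl using java.net.URI (RFC 2396 rules). */
    static String resolve(String baseUrl, String relativeUrl) {
        return URI.create(baseUrl).resolve(relativeUrl).toString();
    }
}
```

      Against the base "http://a/b/c/d;p?q", "g" resolves to "http://a/b/c/g" and "../g" to "http://a/b/g".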
    • isValidScheme

      public static boolean isValidScheme(String scheme)
      Returns true if the specified string is a valid scheme name.

      https://tools.ietf.org/html/rfc1738

      Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").

      Parameters:
      scheme - the scheme string to check
      Returns:
      true if valid
    • isSpecialScheme

      public static boolean isSpecialScheme(String scheme)
      Returns true if the specified string is a special scheme. See https://url.spec.whatwg.org/#special-scheme
      Parameters:
      scheme - the scheme string to check
      Returns:
      true if special
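      Both scheme checks are small enough to sketch directly. isValidScheme below follows the RFC 1738 rule quoted for isValidScheme literally (letters in either case, digits, "+", ".", "-") and does not add RFC 3986's requirement that a scheme start with a letter; rejecting null or empty input is an assumption. The special schemes listed by the WHATWG URL Standard are ftp, file, http, https, ws and wss; lower-casing the input first is also an assumption. `SchemeChecks` is hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not HtmlUnit's implementation.
final class SchemeChecks {
    // Special schemes as listed by the WHATWG URL Standard.
    private static final Set<String> SPECIAL_SCHEMES = Set.of("ftp", "file", "http", "https", "ws", "wss");

    private SchemeChecks() {
    }

    /** RFC 1738 rule: ASCII letters (either case), digits, '+', '.', '-'. */
    static boolean isValidScheme(String scheme) {
        if (scheme == null || scheme.isEmpty()) {
            return false;
        }
        for (int i = 0; i < scheme.length(); i++) {
            char c = scheme.charAt(i);
            boolean allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '+' || c == '.' || c == '-';
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    /** True for the WHATWG special schemes, compared case-insensitively. */
    static boolean isSpecialScheme(String scheme) {
        return scheme != null && SPECIAL_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```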
    • sameFile

      public static boolean sameFile(URL u1, URL u2)
      More or less the same as URLStreamHandler.sameFile(URL, URL), but without resolving the host to an IP address for comparison. Additionally, some path normalization is performed.
      Parameters:
      u1 - a URL object
      u2 - a URL object
      Returns:
      true if u1 and u2 refer to the same file
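      The JDK's URL.equals and URL.sameFile may resolve host names, so comparing URLs that way can trigger DNS lookups. The sketch below avoids that by comparing host names textually (case-insensitively), ports after substituting the protocol's default, paths after URI.normalize(), and queries; like URL.sameFile it ignores the fragment. The exact normalization HtmlUnit performs is not documented here, and `NoDnsSameFile` is hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Objects;

// Hypothetical sketch, not HtmlUnit's implementation.
final class NoDnsSameFile {
    private NoDnsSameFile() {
    }

    /** Parses an absolute, legally encoded URL string, rethrowing failures unchecked. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    /** Compares two URLs without any DNS lookup; the fragment is ignored. */
    static boolean sameFile(URL u1, URL u2) {
        return u1.getProtocol().equalsIgnoreCase(u2.getProtocol())
                && Objects.toString(u1.getHost(), "").equalsIgnoreCase(Objects.toString(u2.getHost(), ""))
                && effectivePort(u1) == effectivePort(u2)
                && normalizedPath(u1).equals(normalizedPath(u2))
                && Objects.equals(u1.getQuery(), u2.getQuery());
    }

    /** Explicit port, or the protocol's default port when none was given. */
    private static int effectivePort(URL u) {
        return u.getPort() == -1 ? u.getDefaultPort() : u.getPort();
    }

    /** Treats an empty path as "/" and removes "." and ".." segments. */
    private static String normalizedPath(URL u) {
        String path = u.getPath().isEmpty() ? "/" : u.getPath();
        try {
            return new URI(null, null, path, null).normalize().getPath();
        } catch (URISyntaxException e) {
            return path; // fall back to the raw path
        }
    }
}
```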
    • normalize

      public static String normalize(URL url)
      Constructs a normalized URL string that can be used as a cache key.
      Parameters:
      url - a URL object
      Returns:
      the normalized string
    • toURI

      public static URI toURI(URL url, String query) throws URISyntaxException
      Constructs a URI using the specified URL.
      Parameters:
      url - the URL
      query - the query
      Returns:
      the URI
      Throws:
      URISyntaxException - If both a scheme and a path are given but the path is relative, if the URI string constructed from the given components violates RFC 2396, or if the authority component of the string is present but cannot be parsed as a server-based authority
    • encodeQueryPart

      public static String encodeQueryPart(String part)
      Encodes the specified query part.
      Parameters:
      part - the part to encode
      Returns:
      the encoded string
    • removeRedundantPort

      public static URL removeRedundantPort(URL url) throws MalformedURLException
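      Removes the port if it is the well-known default port for the URL's protocol, since it can then be deduced from the protocol.
      Parameters:
      url - the URL to clean up
      Returns:
      a new URL without the port, or the given URL if there is no redundant port to remove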
      Throws:
      MalformedURLException - if the URL string cannot be converted to a URL instance
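      Removing a redundant port can be sketched as follows, assuming "redundant" means equal to URL.getDefaultPort() for the protocol and that the input's components are already legally encoded. `RedundantPort` is a hypothetical class; when there is nothing to remove, it returns the same URL instance.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch, not HtmlUnit's implementation.
final class RedundantPort {
    private RedundantPort() {
    }

    /** Parses an absolute, legally encoded URL string, rethrowing failures unchecked. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    static URL removeRedundantPort(URL url) {
        if (url.getPort() == -1 || url.getPort() != url.getDefaultPort()) {
            return url; // no explicit port, or a non-default one: keep as is
        }
        StringBuilder sb = new StringBuilder(url.getProtocol()).append("://");
        if (url.getUserInfo() != null) {
            sb.append(url.getUserInfo()).append('@');
        }
        sb.append(url.getHost()).append(url.getFile()); // host, then path plus "?query"
        if (url.getRef() != null) {
            sb.append('#').append(url.getRef());
        }
        return parse(sb.toString());
    }
}
```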
    • decodeDataUrl

      public static byte[] decodeDataUrl(byte[] bytes, boolean removeWhitespace) throws IllegalArgumentException
      Decodes an array of URL safe 7-bit characters into an array of original bytes. Escaped characters are converted back to their original representation.
      Parameters:
      bytes - array of URL safe characters
      removeWhitespace - if true, whitespace characters are not copied to the output
      Returns:
      array of original bytes
      Throws:
      IllegalArgumentException - in case of error
    • decodeUrl

      public static byte[] decodeUrl(byte[] bytes) throws IllegalArgumentException
      Decodes an array of URL safe 7-bit characters into an array of original bytes. Escaped characters are converted back to their original representation.
      Parameters:
      bytes - array of URL safe characters
      Returns:
      array of original bytes
      Throws:
      IllegalArgumentException - in case of error
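      Both decoding methods reverse %XX escapes at the byte level. The sketch below shows that technique under stated assumptions: only ASCII space, tab, LF, FF and CR count as whitespace for removeWhitespace, whitespace produced by decoding an escape (such as %20) is kept, and "+" is left unchanged (form-style decoding would turn it into a space). `PercentDecoding` is hypothetical, not HtmlUnit's code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not HtmlUnit's implementation.
final class PercentDecoding {
    private PercentDecoding() {
    }

    /** Decodes %XX escapes; optionally drops literal ASCII whitespace bytes. */
    static byte[] decode(byte[] bytes, boolean removeWhitespace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i];
            if (b == '%') {
                if (i + 2 >= bytes.length) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit(bytes[i + 1], 16);
                int lo = Character.digit(bytes[i + 2], 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits
            } else if (removeWhitespace && isAsciiWhitespace(b)) {
                continue; // drop literal whitespace
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    private static boolean isAsciiWhitespace(int b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\f' || b == '\r';
    }
}
```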
    • encodeUrl

      public static byte[] encodeUrl(BitSet urlsafe, byte[] bytes)
      Encodes an array of bytes into an array of URL safe 7-bit characters. Unsafe characters are escaped.
      Parameters:
      urlsafe - bitset of characters deemed URL safe
      bytes - array of bytes to convert to URL safe characters
      Returns:
      array of bytes containing URL safe characters
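      A BitSet-driven encoder can be sketched as follows: every byte whose unsigned value is set in the BitSet is copied, and every other byte becomes %XX with uppercase hex digits. Using the RFC 3986 unreserved characters as the safe set, and encoding space as %20 rather than "+", are assumptions for the example. `BitSetEncoding` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch, not HtmlUnit's implementation.
final class BitSetEncoding {
    private static final byte[] HEX = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);

    private BitSetEncoding() {
    }

    /** Example safe set: the RFC 3986 unreserved characters. */
    static BitSet unreserved() {
        BitSet safe = new BitSet(128);
        safe.set('a', 'z' + 1);
        safe.set('A', 'Z' + 1);
        safe.set('0', '9' + 1);
        safe.set('-');
        safe.set('.');
        safe.set('_');
        safe.set('~');
        return safe;
    }

    /** Copies safe bytes and percent-encodes all others. */
    static byte[] encode(BitSet urlsafe, byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(bytes.length);
        for (byte value : bytes) {
            int b = value & 0xFF; // unsigned byte value
            if (urlsafe.get(b)) {
                out.write(b);
            } else {
                out.write('%');
                out.write(HEX[b >> 4]);
                out.write(HEX[b & 0x0F]);
            }
        }
        return out.toByteArray();
    }
}
```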
    • isSameOrigin

      public static boolean isSameOrigin(URL originUrl, URL newUrl)
      Determines whether two URLs share the same origin according to the Same-Origin Policy. Two URLs are considered to have the same origin if they have the same protocol (scheme), host, and port.

      Default ports are taken into account: when a URL's explicit port is -1 (no port specified), the default port for its protocol is used instead.

      Parameters:
      originUrl - the first URL to compare (must not be null)
      newUrl - the second URL to compare (must not be null)
      Returns:
      true if both URLs have the same protocol, host, and effective port; false otherwise
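      The same-origin rule can be sketched directly with java.net.URL. Comparing protocol and host case-insensitively is an assumption of this sketch, and `SameOrigin` is a hypothetical class rather than HtmlUnit's code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch, not HtmlUnit's implementation.
final class SameOrigin {
    private SameOrigin() {
    }

    /** Parses an absolute, legally encoded URL string, rethrowing failures unchecked. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    /** Same protocol, same host, same effective port. */
    static boolean isSameOrigin(URL originUrl, URL newUrl) {
        return originUrl.getProtocol().equalsIgnoreCase(newUrl.getProtocol())
                && originUrl.getHost().equalsIgnoreCase(newUrl.getHost())
                && effectivePort(originUrl) == effectivePort(newUrl);
    }

    /** Explicit port, or the protocol's default port when none was given (-1). */
    private static int effectivePort(URL u) {
        return u.getPort() == -1 ? u.getDefaultPort() : u.getPort();
    }
}
```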