java.lang.Object
org.htmlunit.util.EncodingSniffer
Sniffs encoding settings from HTML, XML or other content. The HTML encoding sniffing algorithm is based on the HTML5 encoding sniffing algorithm.
- Author:
- Daniel Gredler, Ahmed Ashour, Ronald Brill, Lai Quang Duong
-
Method Summary
Modifier and Type, Method, Description
- static Charset extractEncodingFromContentType: Extracts an encoding from the specified Content-Type value using the IETF algorithm; if no encoding is found, this method returns null.
- static Charset sniffEncodingFromCssDeclaration: Parses and returns the charset declaration at the start of a CSS file if any, otherwise returns null.
- static Charset sniffEncodingFromMetaTag: Attempts to sniff an encoding from an HTML meta tag in the specified content stream.
- static Charset sniffEncodingFromXmlDeclaration: Searches the specified XML content for an XML declaration and returns the encoding if found, otherwise returns null.
- static Charset toCharset: Returns Charset if the specified charset name is supported on this platform.
- static String translateEncodingLabel(String encodingLabel): Translates the given encoding label into a normalized form according to Reference.
-
Method Details
-
sniffEncodingFromMetaTag
Attempts to sniff an encoding from an HTML meta tag in the specified content stream.
- Parameters:
- is - the content stream to check for an HTML meta tag
- Returns:
- the encoding sniffed from the specified content stream, or null if the encoding could not be determined
- Throws:
- IOException - if an IO error occurs
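To illustrate the general idea of meta tag sniffing, here is a minimal sketch that uses only the JDK. It is not HtmlUnit's implementation and it does not follow the full HTML5 prescan algorithm. The class name MetaCharsetSketch, the 1024-byte prefix size and the regular expression are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of HtmlUnit.
final class MetaCharsetSketch {
    // Assumption: only the first 1024 bytes are inspected.
    private static final int PREFIX_SIZE = 1024;

    // Finds charset=NAME inside a <meta ...> tag, quoted or unquoted.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\b[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._:-]*)",
            Pattern.CASE_INSENSITIVE);

    static Charset sniff(InputStream is) throws IOException {
        byte[] prefix = is.readNBytes(PREFIX_SIZE); // requires Java 11+
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives decoding.
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1);
        return Charset.isSupported(name) ? Charset.forName(name) : null;
    }

    // Convenience wrapper for in-memory content.
    static Charset sniff(byte[] content) {
        try {
            return sniff(new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<html><head><meta charset=\"utf-8\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(html));
    }
}
```

The sketch resolves the name through java.nio.charset.Charset directly; it performs no label translation, so its results may differ from what the library returns for legacy labels.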
-
extractEncodingFromContentType
Extracts an encoding from the specified Content-Type value using the IETF algorithm; if no encoding is found, this method returns null.
- Parameters:
- s - the Content-Type value to search for an encoding
- Returns:
- the encoding found in the specified Content-Type value, or null if no encoding was found
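The following sketch shows one simple way to pull a charset parameter out of a Content-Type value with a regular expression. It is a simplified stand-in, not the IETF algorithm the method refers to and not HtmlUnit's code. The class name ContentTypeCharsetSketch and the method name charsetFromContentType are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of HtmlUnit.
final class ContentTypeCharsetSketch {
    // Matches charset=value, charset="value" or charset='value', ignoring case.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            "charset\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s;\"']+))",
            Pattern.CASE_INSENSITIVE);

    static Charset charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the three alternatives captured the value.
        String name = m.group(1) != null ? m.group(1)
                : m.group(2) != null ? m.group(2) : m.group(3);
        try {
            return Charset.isSupported(name.trim()) ? Charset.forName(name.trim()) : null;
        } catch (IllegalArgumentException e) {
            // Thrown for empty or syntactically illegal charset names.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=utf-8"));
        System.out.println(charsetFromContentType("text/html"));
    }
}
```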
-
sniffEncodingFromXmlDeclaration
Searches the specified XML content for an XML declaration and returns the encoding if found, otherwise returns null.
- Parameters:
- is - the content stream to check for the charset declaration
- Returns:
- the encoding of the specified XML content, or null if it could not be determined
- Throws:
- IOException - if an IO error occurs
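As a rough illustration of the lookup described above, the sketch below reads a short prefix of a stream and searches it for the encoding pseudo-attribute of an XML declaration. It is an independent example, not HtmlUnit's implementation. The class name XmlDeclarationSketch and the 200-byte prefix size are assumptions, and byte order marks are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of HtmlUnit.
final class XmlDeclarationSketch {
    // Assumption: the declaration fits in the first 200 bytes.
    private static final int PREFIX_SIZE = 200;

    // The declaration must be the very first thing in the document.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9][A-Za-z0-9._:-]*)[\"']");

    static Charset sniff(InputStream is) throws IOException {
        byte[] prefix = is.readNBytes(PREFIX_SIZE); // requires Java 11+
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1);
        return Charset.isSupported(name) ? Charset.forName(name) : null;
    }

    // Convenience wrapper for in-memory content.
    static Charset sniff(String xml) {
        try {
            return sniff(new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sniff("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"));
    }
}
```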
-
sniffEncodingFromCssDeclaration
Parses and returns the charset declaration at the start of a CSS file if any, otherwise returns null. For example:
@charset "UTF-8"
- Parameters:
- is - the input stream to parse
- Returns:
- the charset declaration at the start of a CSS file if any, otherwise null
- Throws:
- IOException - if an IO error occurs
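A minimal sketch of this kind of check, assuming the declaration must appear as the very first bytes of the stylesheet, might look as follows. The class name CssCharsetSketch and the 100-byte prefix size are hypothetical, and the code is not taken from HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of HtmlUnit.
final class CssCharsetSketch {
    // Assumption: the declaration fits in the first 100 bytes.
    private static final int PREFIX_SIZE = 100;

    // Matches @charset "NAME" only at the very start of the content.
    private static final Pattern RULE = Pattern.compile("^@charset \"([^\"]+)\"");

    static Charset sniff(InputStream is) throws IOException {
        byte[] prefix = is.readNBytes(PREFIX_SIZE); // requires Java 11+
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        Matcher m = RULE.matcher(head);
        if (!m.find()) {
            return null;
        }
        try {
            String name = m.group(1);
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return null;
        }
    }

    // Convenience wrapper for in-memory content.
    static Charset sniff(String css) {
        try {
            return sniff(new ByteArrayInputStream(css.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sniff("@charset \"UTF-8\";\nbody { margin: 0; }"));
    }
}
```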
-
toCharset
Returns Charset if the specified charset name is supported on this platform.
- Parameters:
- charsetName - the charset name to check
- Returns:
- Charset if the specified charset name is supported on this platform
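The platform check can be sketched with the standard java.nio.charset.Charset API. The description above does not state what is returned for an unsupported name; the sketch assumes null, and the names CharsetLookupSketch and toCharsetOrNull are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical sketch; not part of HtmlUnit.
final class CharsetLookupSketch {
    // Assumption: null signals "not supported" or "not a legal name".
    static Charset toCharsetOrNull(String charsetName) {
        if (charsetName == null) {
            return null;
        }
        try {
            return Charset.isSupported(charsetName) ? Charset.forName(charsetName) : null;
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that no charset name may contain.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toCharsetOrNull("UTF-8"));
        System.out.println(toCharsetOrNull("no-such-charset-xyz"));
    }
}
```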
-
translateEncodingLabel
Translates the given encoding label into a normalized form according to Reference.
- Parameters:
- encodingLabel - the label to translate
- Returns:
- the normalized encoding name or null if not found
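Label translation of this kind is typically a table lookup after normalizing the label. The sketch below assumes the reference is a label table in the style of the WHATWG Encoding Standard and includes only a handful of its entries. The class name EncodingLabelSketch is hypothetical, and the exact names HtmlUnit returns are not stated in this document.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not part of HtmlUnit.
final class EncodingLabelSketch {
    // A small excerpt of label-to-name mappings (assumed reference table).
    private static final Map<String, String> LABELS = Map.of(
            "utf-8", "UTF-8",
            "utf8", "UTF-8",
            "unicode-1-1-utf-8", "UTF-8",
            "iso-8859-1", "windows-1252",
            "latin1", "windows-1252",
            "ascii", "windows-1252",
            "us-ascii", "windows-1252",
            "windows-1252", "windows-1252");

    static String translate(String label) {
        if (label == null) {
            return null;
        }
        // Labels are compared after trimming whitespace and lowercasing.
        String key = label.trim().toLowerCase(Locale.ROOT);
        return LABELS.get(key);
    }

    public static void main(String[] args) {
        System.out.println(translate(" UTF8 "));
        System.out.println(translate("Latin1"));
    }
}
```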
-