public class PercentEscaper extends UnicodeEscaper
A UnicodeEscaper that escapes some set of Java characters using the URI percent encoding scheme. The set of safe characters (those which remain unescaped) can be specified on construction.
For details on escaping URIs for use in web pages, see RFC 3986, section 2.4 and appendix A.
When encoding a String, the following rules apply:
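- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
- Any additionally specified safe characters remain the same.
- If plusForSpace was specified, the space character " " is converted into a plus sign "+".
- All other characters are converted into one or more bytes using UTF-8 encoding, and each byte is then represented by the 3-character string "%XY", where "XY" is the two-digit, uppercase, hexadecimal representation of the byte value.

RFC 2396 specifies the set of unreserved characters as "-", "_", ".", "!", "~", "*", "'", "(" and ")". It goes on to state: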
Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.
For performance reasons the only currently supported character encoding of this class is UTF-8.
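The following is a minimal, standard-library-only sketch of the encoding behaviour described here: ASCII alphanumerics and caller-supplied safe characters pass through, a space optionally becomes "+", and everything else is emitted as uppercase %XY sequences of its UTF-8 bytes. The class name `PercentEncodingSketch` and the method `encode` are hypothetical names used for illustration; this is not the library's implementation, and unlike the real class it does not reject malformed surrogates.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; names are hypothetical, not part of the library. */
final class PercentEncodingSketch {
    private static final char[] UPPER_HEX = "0123456789ABCDEF".toCharArray();

    /** Percent-encodes input, keeping ASCII alphanumerics and safeChars as-is. */
    static String encode(String input, String safeChars, boolean plusForSpace) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            int len = Character.charCount(cp);
            if (isAsciiAlphanumeric(cp) || safeChars.indexOf(cp) >= 0) {
                out.appendCodePoint(cp);               // safe: copy unchanged
            } else if (cp == ' ' && plusForSpace) {
                out.append('+');                       // optional form-style space
            } else {
                // Assumption: input is well-formed UTF-16. An unpaired surrogate
                // is silently replaced by getBytes here rather than rejected.
                byte[] utf8 = input.substring(i, i + len).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%')
                       .append(UPPER_HEX[(b >> 4) & 0xF])
                       .append(UPPER_HEX[b & 0xF]);
                }
            }
            i += len;
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= '0' && cp <= '9') || (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    }
}
```

Calling `PercentEncodingSketch.encode("a b", "-_.*", true)` in this sketch yields `a+b`, while passing `false` for the last argument yields `a%20b`.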
Note: This escaper produces uppercase hexadecimal sequences. From RFC 3986:
"URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
| Modifier and Type | Field and Description |
|---|---|
| static String | SAFE_PLUS_RESERVED_CHARS_URLENCODER: Contains the safe characters plus all reserved characters. |
| static String | SAFECHARS_URLENCODER: A string of safe characters that mimics the behavior of URLEncoder. |
| static String | SAFEPATHCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986. |
| static String | SAFEQUERYSTRINGCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986. |
| static String | SAFEUSERINFOCHARS_URLENCODER: A string of characters that do not need to be encoded when used in the URI user info part, as specified in RFC 3986. |
| Constructor and Description |
|---|
| PercentEscaper(String safeChars, boolean plusForSpace): Constructs a URI escaper with the specified safe characters and optional handling of the space character. |
| Modifier and Type | Method and Description |
|---|---|
| protected char[] | escape(int cp): Escapes the given Unicode code point in UTF-8. |
| String | escape(String s): Returns the escaped form of a given literal string. |
| protected int | nextEscapeIndex(CharSequence csq, int index, int end): Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping. |
Methods inherited from class UnicodeEscaper: codePointAt, escapeSlow
public static final String SAFECHARS_URLENCODER
A string of safe characters that mimics the behavior of URLEncoder.
public static final String SAFEPATHCHARS_URLENCODER
public static final String SAFE_PLUS_RESERVED_CHARS_URLENCODER
public static final String SAFEUSERINFOCHARS_URLENCODER
public static final String SAFEQUERYSTRINGCHARS_URLENCODER
public PercentEscaper(String safeChars, boolean plusForSpace)
safeChars - a non-null string specifying additional safe characters for this escaper (the ranges 0..9, a..z and A..Z are always safe and should not be specified here)
plusForSpace - true if ASCII space should be escaped to + rather than %20
IllegalArgumentException - if any of the parameters were invalid
protected int nextEscapeIndex(CharSequence csq, int index, int end)
Description copied from class: UnicodeEscaper
Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.
Note: When implementing an escaper, it is a good idea to override this method for efficiency. The base class implementation determines successive Unicode code points and invokes UnicodeEscaper.escape(int) for each of them. If the semantics of your escaper are such that code points in the supplementary range are either all escaped or all unescaped, this method can be implemented more efficiently using CharSequence.charAt(int).
Note however that if your escaper does not escape characters in the supplementary range, you should either continue to validate the correctness of any surrogate characters encountered or provide a clear warning to users that your escaper does not validate its input.
See PercentEscaper for an example.
Overrides: nextEscapeIndex in class UnicodeEscaper
csq - a sequence of characters
index - the index of the first character to be scanned
end - the index immediately after the last character to be scanned
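As a rough illustration of the charAt-based scan suggested in the note above, the sketch below precomputes a boolean lookup table of safe characters and walks the sequence one char at a time. The class name `SafeCharScanner` and its field are hypothetical, and the sketch assumes every safe character is a non-surrogate BMP character, so any surrogate stops the scan and is left for the escaping step to validate.

```java
/** Illustrative sketch only; the class and field names are hypothetical. */
final class SafeCharScanner {
    private final boolean[] safe;

    SafeCharScanner(String safeChars) {
        // Size the table to cover 'z' and the largest extra safe character.
        int max = 'z';
        for (int i = 0; i < safeChars.length(); i++) {
            max = Math.max(max, safeChars.charAt(i));
        }
        safe = new boolean[max + 1];
        // ASCII alphanumerics are always safe.
        for (char c = '0'; c <= '9'; c++) safe[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) safe[c] = true;
        for (char c = 'a'; c <= 'z'; c++) safe[c] = true;
        // Assumption: safeChars holds only non-surrogate BMP characters.
        for (int i = 0; i < safeChars.length(); i++) {
            safe[safeChars.charAt(i)] = true;
        }
    }

    /** Returns the index of the first char in [index, end) that needs escaping, or end. */
    int nextEscapeIndex(CharSequence csq, int index, int end) {
        for (; index < end; index++) {
            char c = csq.charAt(index);
            if (c >= safe.length || !safe[c]) {
                break; // not in the table: requires escaping
            }
        }
        return index;
    }
}
```

The table lookup avoids decoding code points on the common path where most input is already safe; only the characters at and after the returned index need the slower per-code-point treatment.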
public String escape(String s)
Description copied from class: UnicodeEscaper
Returns the escaped form of a given literal string.
If you are escaping input in arbitrary successive chunks, then it is not generally safe to use this method. If an input string ends with an unmatched high surrogate character, then this method will throw IllegalArgumentException. You should ensure your input is valid UTF-16 before calling this method.
Overrides: escape in class UnicodeEscaper
s - the literal string to be escaped
Returns: the escaped form of the string
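One way to act on the surrogate warning above is to check the input before escaping and, when processing chunks, to hold back a trailing high surrogate until the next chunk arrives. The sketch below uses only java.lang.Character; the class `SurrogateCheckSketch` and both method names are hypothetical helpers, not part of this API.

```java
/** Illustrative sketch only; the class and method names are hypothetical. */
final class SurrogateCheckSketch {
    /** Returns true if every surrogate in s belongs to a well-formed high/low pair. */
    static boolean isWellFormedUtf16(CharSequence s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i += 2;
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            } else {
                i++;
            }
        }
        return true;
    }

    /**
     * Returns how many leading chars of chunk can be escaped now. A trailing
     * high surrogate is held back so it can be joined with the next chunk.
     */
    static int safePrefixLength(CharSequence chunk) {
        int n = chunk.length();
        return (n > 0 && Character.isHighSurrogate(chunk.charAt(n - 1))) ? n - 1 : n;
    }
}
```

A caller streaming input might escape `chunk.subSequence(0, safePrefixLength(chunk))` and prepend the held-back char to the following chunk.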
protected char[] escape(int cp)
Overrides: escape in class UnicodeEscaper
cp - the Unicode code point to escape if necessary
Returns: the replacement characters, or null if no escaping was needed
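To make the contract of this method concrete (replacement characters for a code point, or null when it is safe), here is a hedged sketch that builds the UTF-8 bytes with bit arithmetic and renders each one as an uppercase %XY triple. The class `CodePointEscapeSketch` and its extra parameters are hypothetical; the real method takes only the code point and reads its configuration from the escaper instance. Rejecting surrogate code points is an assumption of this sketch.

```java
/** Illustrative sketch only; the class name and extra parameters are hypothetical. */
final class CodePointEscapeSketch {
    private static final char[] UPPER_HEX = "0123456789ABCDEF".toCharArray();

    /** Returns the escaped form of cp, or null if cp needs no escaping. */
    static char[] escape(int cp, String safeChars, boolean plusForSpace) {
        // Assumption: surrogates and out-of-range values are rejected outright.
        if (cp < 0 || cp > Character.MAX_CODE_POINT || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Invalid code point: " + cp);
        }
        boolean alnum = (cp >= '0' && cp <= '9') || (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
        if (alnum || safeChars.indexOf(cp) >= 0) {
            return null; // safe: caller keeps the original character
        }
        if (cp == ' ' && plusForSpace) {
            return new char[] {'+'};
        }
        int[] utf8;
        if (cp <= 0x7F) {
            utf8 = new int[] {cp};
        } else if (cp <= 0x7FF) {
            utf8 = new int[] {0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)};
        } else if (cp <= 0xFFFF) {
            utf8 = new int[] {0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)};
        } else {
            utf8 = new int[] {0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                              0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)};
        }
        // Three output chars per byte: '%', high nibble, low nibble.
        char[] out = new char[utf8.length * 3];
        for (int i = 0; i < utf8.length; i++) {
            out[3 * i] = '%';
            out[3 * i + 1] = UPPER_HEX[utf8[i] >> 4];
            out[3 * i + 2] = UPPER_HEX[utf8[i] & 0xF];
        }
        return out;
    }
}
```

Returning null for safe code points lets the caller skip allocation entirely on the common path, which appears to be the reason the contract distinguishes "no escaping needed" from an identity replacement.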