How to implement Python's urllib.quote(str, safe='/') in Java
I recently had to port some Python code to Java and ran into this URL-encoding call:
urllib.quote(str, safe='/')
In Java, however, URLEncoder.encode(arg, Constant.UTF_8) converts '/' to %2F.
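To make the difference concrete, here is a minimal sketch of the stock behavior. The class name UrlEncoderSlashDemo and the helper encodeUtf8 are made up for illustration; only java.net.URLEncoder.encode(String, String) is the real JDK API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the original post.
final class UrlEncoderSlashDemo {

    // Wraps URLEncoder.encode so callers do not have to handle the checked exception.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The slash is percent-encoded and the space becomes '+'.
        System.out.println(encodeUtf8("a/b c")); // prints a%2Fb+c
    }
}
```

For the same input, Python 2's urllib.quote('a/b c', safe='/') returns 'a/b%20c', so the slash is not the only difference: the space is handled differently as well.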
I searched online and did not find anything in Java resembling the safe parameter, so I ended up writing a class of my own:
package com.ppc.spider.fc.util;

import java.io.CharArrayWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.BitSet;

/**
 * Adapted from java.net.URLEncoder, with one extra "safe" character that is
 * left unencoded, in the spirit of the safe argument of urllib.quote.
 */
public class UrlSafeEncoder {

    /** Characters that are never percent-encoded (same set as URLEncoder). */
    static final BitSet dontNeedEncoding = new BitSet(256);
    static final int caseDiff = ('a' - 'A');

    static {
        /*
         * RFC 2396 defines unreserved = alphanum | mark, where
         * mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")".
         * Like URLEncoder (and Netscape / Internet Explorer), only
         * "-", "_", "." and "*" are left unescaped here; the other
         * marks are escaped in case some context treats them as unsafe.
         */
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(' '); // a space is turned into '+' in encode()
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');
    }

    /** You can't call the constructor. */
    private UrlSafeEncoder() { }

    /**
     * Translates a string into application/x-www-form-urlencoded format,
     * using the supplied encoding to obtain the bytes of unsafe characters.
     * The W3C recommendation is to use UTF-8.
     *
     * @param s    String to be translated.
     * @param enc  The name of a supported character encoding.
     * @param safe An additional character to leave unencoded, e.g. '/'.
     * @return the translated String.
     * @exception UnsupportedEncodingException
     *            If the named encoding is not supported
     */
    public static String encode(String s, String enc, char safe)
            throws UnsupportedEncodingException {
        // The safe character is checked per call in isSafe(). The first
        // version called dontNeedEncoding.set(safe) here, which changed the
        // shared static BitSet and leaked the safe character into later calls.
        boolean needToChange = false;
        StringBuilder out = new StringBuilder(s.length());
        Charset charset;
        CharArrayWriter charArrayWriter = new CharArrayWriter();

        if (enc == null)
            throw new NullPointerException("charsetName");

        try {
            charset = Charset.forName(enc);
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(enc);
        } catch (UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(enc);
        }

        for (int i = 0; i < s.length();) {
            int c = (int) s.charAt(i);
            if (isSafe(c, safe)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                out.append((char) c);
                i++;
            } else {
                // Collect the whole run of unsafe characters, then convert
                // it to the external encoding before hex conversion.
                do {
                    charArrayWriter.write(c);
                    // If this is the start of a surrogate pair, pass both
                    // characters. A lone surrogate is treated like any
                    // other character.
                    if (c >= 0xD800 && c <= 0xDBFF) {
                        if ((i + 1) < s.length()) {
                            int d = (int) s.charAt(i + 1);
                            if (d >= 0xDC00 && d <= 0xDFFF) {
                                charArrayWriter.write(d);
                                i++;
                            }
                        }
                    }
                    i++;
                } while (i < s.length() && !isSafe(c = (int) s.charAt(i), safe));

                charArrayWriter.flush();
                String str = new String(charArrayWriter.toCharArray());
                byte[] ba = str.getBytes(charset);
                for (int j = 0; j < ba.length; j++) {
                    out.append('%');
                    char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                    // Use uppercase letters for the hex digits.
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                    ch = Character.forDigit(ba[j] & 0xF, 16);
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                }
                charArrayWriter.reset();
                needToChange = true;
            }
        }

        return (needToChange ? out.toString() : s);
    }

    /** True if c is the caller's safe character or in the built-in set. */
    private static boolean isSafe(int c, char safe) {
        return c == safe || dontNeedEncoding.get(c);
    }
}

A quick test shows it basically works.
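One caveat: because the class above follows URLEncoder, it still turns a space into '+' and leaves '*' untouched, whereas Python 2's urllib.quote emits %20 for a space and %2A for '*'. If an exact match matters, a smaller byte-wise version may be enough. The following is only a sketch under some assumptions: the class name QuoteSketch and its method are hypothetical, the input is always encoded as UTF-8, and the always-safe set is the Python 2 one (letters, digits, '_', '.', '-'); Python 3.7+ additionally leaves '~' unescaped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original post: a byte-wise approximation
// of Python 2's urllib.quote(s, safe) for UTF-8 input.
final class QuoteSketch {

    // Characters Python 2's urllib.quote never escapes.
    private static final String ALWAYS_SAFE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private QuoteSketch() { }

    static String quote(String s, String safe) {
        StringBuilder out = new StringBuilder(s.length());
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;          // unsigned value of the byte
            char c = (char) v;
            // Only ASCII bytes can be safe; every byte of a multi-byte
            // UTF-8 sequence is >= 0x80 and is always percent-encoded.
            if (v < 0x80 && (ALWAYS_SAFE.indexOf(c) >= 0 || safe.indexOf(c) >= 0)) {
                out.append(c);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0xF]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("a/b c*", "/")); // prints a/b%20c%2A
    }
}
```

Taking safe as a String rather than a single char mirrors the Python signature, and nothing is stored in shared state between calls.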
Reposted from: https://www.cnblogs.com/davidwang456/p/9476384.html