
Problem: using a keyword that mixes Chinese and English characters as the combine key in Hadoop

Published: 2025/6/15 | Programming Q&A | 19 | 豆豆

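This article, collected and organized by 生活随笔, covers the problem of using a keyword that mixes Chinese and English characters as the combine key in Hadoop. It is shared here for reference.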

Recently, I needed to use a string that mixes Chinese characters with other characters as the output key of the combine step.

The idea was to let Hadoop's merge (the shuffle and sort) group the records by key, so that in the reduce phase each call receives one complete group.
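A rough way to picture the grouping the author relies on: the plain-Java sketch below (no Hadoop dependency; the class and method names are hypothetical) imitates the shuffle for string keys. It assumes, as Hadoop's Text comparator does to my understanding, that keys are ordered by their UTF-8 bytes compared as unsigned values, and that each run of equal keys becomes one reduce call. It is an illustration under those assumptions, not Hadoop's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShuffleGroupingSketch {

    // Lexicographic comparison with each byte treated as unsigned (0..255)
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Sort keys by their UTF-8 bytes and return one key per simulated reduce call:
    // a new call starts whenever the bytes differ from the previous key's bytes
    static List<String> reduceCallKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareUnsigned(
                x.getBytes(StandardCharsets.UTF_8), y.getBytes(StandardCharsets.UTF_8)));
        List<String> calls = new ArrayList<>();
        byte[] previous = null;
        for (String k : sorted) {
            byte[] current = k.getBytes(StandardCharsets.UTF_8);
            if (previous == null || compareUnsigned(previous, current) != 0) {
                calls.add(k);
            }
            previous = current;
        }
        return calls;
    }

    public static void main(String[] args) {
        // Java unicode escapes: these literals contain Chinese characters
        List<String> keys = Arrays.asList(
                "hadoop\u7b80\u4ecb", "abc", "hadoop\u7b80\u4ecb", "abc\u4e2d\u6587");
        // Three distinct keys, so three reduce calls; the duplicate key is grouped
        System.out.println(reduceCallKeys(keys).size()); // 3
    }
}
```

When every occurrence of a key has the same bytes, the duplicate key collapses into a single group, which is the behavior the author expected from the reduce phase.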

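However, I found that with these mixed Chinese strings as keys, the keys received in the reduce phase were not unique: the same key could arrive in more than one group. So I decided to solve it by transcoding the keys.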
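The post does not explain why the keys stopped being unique. One plausible cause, offered here only as an assumption, is that the same visible key reached the sort as different byte sequences, for example because some code read the key bytes back with a non-UTF-8 charset. The sketch below uses ISO-8859-1 as a stand-in for such a wrong charset (the class and method names are hypothetical): a key containing Chinese characters changes after the faulty round trip, while a pure-ASCII key does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchSketch {

    // Hypothetical faulty round trip: key written as UTF-8, read back with another charset
    static String misdecode(String key, Charset wrongCharset) {
        byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrongCharset);
    }

    // True if the key is unchanged after that faulty round trip
    static boolean survives(String key) {
        return key.equals(misdecode(key, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // Raw key: the two Chinese characters turn into six Latin-1 characters
        System.out.println(survives("hadoop\u7b80\u4ecb"));   // false
        // Escaped key (literal backslashes): pure ASCII, same bytes in both charsets
        System.out.println(survives("hadoop\\u7b80\\u4ecb")); // true
    }
}
```

If that assumption holds, it would explain why escaping keys into ASCII makes the grouping stable: ASCII bytes read the same under UTF-8, GBK, or ISO-8859-1.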

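In the end, the problem was solved. Below is the test code for the transcoding: gbEncoding escapes each character of a string as \uXXXX, and decodeUnicode turns those escapes back into characters.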

package test.com.gjob.services;

public class Test {

    public static void main(String[] args) {
        String s = "簡單介紹";
        // Escape the string; gbEncoding also prints the escaped form
        String tt = gbEncoding(s);
        // Decoding the escaped form should give back the original string
        System.out.println(decodeUnicode(tt).equals(s));
        // Decode a hand-written escape sequence (the two characters 简介)
        System.out.println(decodeUnicode("\\u7b80\\u4ecb"));
    }

    // Turn every char of the input into a backslash-u escape with 4 hex digits
    public static String gbEncoding(final String gbString) {
        StringBuilder unicodeBytes = new StringBuilder();
        for (char c : gbString.toCharArray()) {
            String hexB = Integer.toHexString(c);
            // Left-pad with zeros to exactly 4 hex digits
            while (hexB.length() < 4) {
                hexB = "0" + hexB;
            }
            unicodeBytes.append("\\u").append(hexB);
        }
        System.out.println("unicodeBytes is: " + unicodeBytes);
        return unicodeBytes.toString();
    }

    // Reverse of gbEncoding; expects non-empty input that starts with a backslash-u escape
    public static String decodeUnicode(final String dataStr) {
        int start = 0;
        int end;
        final StringBuilder buffer = new StringBuilder();
        while (start > -1) {
            end = dataStr.indexOf("\\u", start + 2);
            String charStr;
            if (end == -1) {
                charStr = dataStr.substring(start + 2);
            } else {
                charStr = dataStr.substring(start + 2, end);
            }
            char letter = (char) Integer.parseInt(charStr, 16); // parse the hex digits as an integer
            buffer.append(letter);
            start = end;
        }
        return buffer.toString();
    }
}
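For comparison, here is a more compact, hypothetical KeyCodec built on the same idea as gbEncoding and decodeUnicode: every UTF-16 char becomes a backslash, 'u', and four lowercase hex digits, and decoding reads fixed 6-character chunks. This is a sketch, not the author's code; unlike decodeUnicode, it also accepts an empty string.

```java
class KeyCodec {

    // Escape every UTF-16 char as a backslash, 'u' and 4 lowercase hex digits
    static String encode(String key) {
        StringBuilder sb = new StringBuilder(key.length() * 6);
        for (int i = 0; i < key.length(); i++) {
            sb.append(String.format("\\u%04x", (int) key.charAt(i)));
        }
        return sb.toString();
    }

    // Reverse of encode: read fixed 6-char chunks and parse their hex digits
    static String decode(String escaped) {
        if (escaped.length() % 6 != 0) {
            throw new IllegalArgumentException("not produced by encode: " + escaped);
        }
        StringBuilder sb = new StringBuilder(escaped.length() / 6);
        for (int i = 0; i < escaped.length(); i += 6) {
            sb.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Java unicode escapes: this literal ends with two Chinese characters
        String key = "hadoop\u7b80\u4ecb";
        String escaped = encode(key);
        System.out.println(escaped);                      // pure ASCII
        System.out.println(decode(escaped).equals(key));  // true
    }
}
```

Escaping every char, not just the Chinese ones, keeps the mapping one-to-one, so two different original keys can never collapse into the same escaped key. Because each escape has a fixed width and uses lowercase hex, the escaped keys also sort in the same order as String.compareTo on the original keys.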

