Filtering Sensitive Words

This article, compiled by 生活随笔, explains how to filter sensitive words with a prefix tree (trie).
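1. What is a prefix tree
- The root node holds no data. Every other node holds exactly one character.
- Each path from the root down to a node marked as a word end spells one sensitive word.
- The children of any one node all hold different characters.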
[Figure: example prefix tree]
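Here is a minimal standalone sketch that makes the three properties above concrete. The class `TrieDemo` and its methods `add`, `contains` and `count` are hypothetical names used only for illustration. They are not part of the article's code. The sketch stores "abc", "abd" and "bf". The root holds no character, the shared prefix "ab" is stored only once, and a string counts as a word only if its last node is flagged as a word end.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieDemo {
    // One trie node: children keyed by character, plus a word-end flag
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean wordEnd;
    }

    // The root holds no character of its own
    final Node root = new Node();

    void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            // Reuse the child if this prefix already exists, otherwise create it
            cur = cur.children.computeIfAbsent(c, k -> new Node());
        }
        cur.wordEnd = true;
    }

    // True only if the whole string is a stored word (not merely a prefix)
    boolean contains(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) {
                return false;
            }
        }
        return cur.wordEnd;
    }

    // Number of nodes below n (each one holds a single character)
    static int count(Node n) {
        int total = 0;
        for (Node child : n.children.values()) {
            total += 1 + count(child);
        }
        return total;
    }

    public static void main(String[] args) {
        TrieDemo trie = new TrieDemo();
        trie.add("abc");
        trie.add("abd");
        trie.add("bf");
        System.out.println(count(trie.root));            // 6: the prefix "ab" is shared
        System.out.println(trie.root.children.size());   // 2: 'a' and 'b' hang off the root
        System.out.println(trie.contains("abd"));        // true
        System.out.println(trie.contains("ab"));         // false: only a prefix
    }
}
```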
2. How the prefix tree filters sensitive words
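The filter uses three pointers. Pointer 1 starts at the root of the prefix tree. Pointers 2 and 3 both start at the first character of the sentence to be filtered. In the code in section 3, these pointers are tempNode, begin and position.

Step 1: Check whether the character under pointers 2 and 3 is a child of the root node. If it is, move pointer 1 down to that child and go to step 3. If it is not, go to step 5.
Step 2: Move pointer 3 forward one character and check whether that character is a child of pointer 1's node. If it is, move pointer 1 down to it and go to step 3. If it is not, or if pointer 3 has run past the end of the sentence, go to step 4.
Step 3: Check whether pointer 1's node is the last character of a sensitive word. If it is, replace the characters from pointer 2 through pointer 3, then move pointer 3 forward one character. Next, move pointer 2 to the same character as pointer 3, reset pointer 1 to the root, and go to step 1. If it is not, go to step 2.
Step 4: Copy the character under pointer 2 to the output unchanged and move pointer 2 forward one character. Then move pointer 3 to the same character as pointer 2, reset pointer 1 to the root, and go to step 1.
Step 5: Copy the character to the output unchanged, move pointers 2 and 3 forward together, and go to step 1.

The process ends when pointer 2 reaches the end of the sentence.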
[Figure: example of the filtering process]
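The five steps can be traced with a small standalone sketch. `PointerTrace`, `build` and `filter` are hypothetical names. The variables `node`, `begin` and `position` play the roles of pointers 1, 2 and 3. Symbol skipping is left out to keep the focus on the pointers, and the replacement text is assumed to be `***`. The loop runs until pointer 2 reaches the end. As a result, a match that is cut off by the end of the text is treated as a mismatch and retried from the next character.

```java
import java.util.HashMap;
import java.util.Map;

public class PointerTrace {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean wordEnd;
    }

    // Build a trie from the given sensitive words
    static Node build(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node cur = root;
            for (char c : w.toCharArray()) {
                cur = cur.children.computeIfAbsent(c, k -> new Node());
            }
            cur.wordEnd = true;
        }
        return root;
    }

    // node = pointer 1, begin = pointer 2, position = pointer 3
    static String filter(Node root, String text, boolean trace) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        int begin = 0;
        int position = 0;
        while (begin < text.length()) {
            // Child of pointer 1 for the character under pointer 3 (null past the end)
            Node next = position < text.length() ? node.children.get(text.charAt(position)) : null;
            if (next == null) {
                // Steps 4/5: no word starts at begin, so keep that character,
                // advance pointer 2, bring pointer 3 back to it, reset pointer 1
                out.append(text.charAt(begin));
                position = ++begin;
                node = root;
            } else if (next.wordEnd) {
                // Step 3: text[begin..position] is a complete sensitive word
                out.append("***");
                begin = ++position;
                node = root;
            } else {
                // Step 2: still a prefix of some word, so extend the match
                node = next;
                position++;
            }
            if (trace) {
                System.out.printf("begin=%d position=%d out=%s%n", begin, position, out);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = build("abc", "bf", "be");
        // The last line printed is the result: xa******x
        System.out.println(filter(root, "xabfbex", true));
    }
}
```

With the words "abc", "bf" and "be", the input "xabfbex" becomes "xa******x". The partial match "ab" fails at "f", so pointer 2 moves on by one character. "bf" and then "be" are matched from the following positions.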
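3. Code implementation

Define the prefix tree. We create a utility class that implements sensitive-word filtering and define the prefix tree as an inner class of it.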
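```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.stereotype.Component;

@Component
public class SensitiveFilter {

    // Prefix tree (trie) node
    private class TrieNode {

        // Whether this node is the last character of a sensitive word
        private boolean isKeywordEnd = false;

        // Child nodes (key: the child's character, value: the child node)
        private Map<Character, TrieNode> subNodes = new HashMap<>();

        public boolean isKeywordEnd() {
            return isKeywordEnd;
        }

        public void setKeywordEnd(boolean keywordEnd) {
            isKeywordEnd = keywordEnd;
        }

        // Add a child node
        public void addSubNode(Character c, TrieNode node) {
            subNodes.put(c, node);
        }

        // Get the child node for a character (null if there is none)
        public TrieNode getSubNode(Character c) {
            return subNodes.get(c);
        }
    }
}
```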
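Initialize the prefix tree from the sensitive words. They are read from sensitive-words.txt on the classpath, one word per line.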
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import javax.annotation.PostConstruct; // on Spring Boot 3+ use jakarta.annotation.PostConstruct

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class SensitiveFilter {

    private static final Logger logger = LoggerFactory.getLogger(SensitiveFilter.class);

    // Text that replaces a matched sensitive word
    private static final String REPLACEMENT = "***";

    // Root node of the trie
    private TrieNode rootNode = new TrieNode();

    // Runs once after Spring creates the bean: load every word into the trie
    @PostConstruct
    public void init() {
        try (
                InputStream is = this.getClass().getClassLoader().getResourceAsStream("sensitive-words.txt");
                // Decode as UTF-8 explicitly instead of the platform default charset
                BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))
        ) {
            String keyword;
            while ((keyword = reader.readLine()) != null) {
                this.addKeyword(keyword);
            }
        } catch (IOException e) {
            logger.error("Failed to load sensitive-word file: " + e.getMessage());
        }
    }

    // Add one sensitive word to the trie
    private void addKeyword(String keyword) {
        TrieNode tempNode = rootNode;
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            TrieNode subNode = tempNode.getSubNode(c);
            if (subNode == null) {
                // No child for this character yet: create one
                subNode = new TrieNode();
                tempNode.addSubNode(c, subNode);
            }
            // Move down to the child for the next character
            tempNode = subNode;
            // Mark the node holding the last character as a word end
            if (i == keyword.length() - 1) {
                tempNode.setKeywordEnd(true);
            }
        }
    }

    // TrieNode: the same inner class as defined above (omitted here)
}
```
Write the method that filters sensitive words.
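```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import javax.annotation.PostConstruct; // on Spring Boot 3+ use jakarta.annotation.PostConstruct

import org.apache.commons.lang3.CharUtils;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class SensitiveFilter {

    private static final Logger logger = LoggerFactory.getLogger(SensitiveFilter.class);

    // Text that replaces a matched sensitive word
    private static final String REPLACEMENT = "***";

    // Root node of the trie
    private TrieNode rootNode = new TrieNode();

    @PostConstruct
    public void init() {
        try (
                InputStream is = this.getClass().getClassLoader().getResourceAsStream("sensitive-words.txt");
                BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))
        ) {
            String keyword;
            while ((keyword = reader.readLine()) != null) {
                this.addKeyword(keyword);
            }
        } catch (IOException e) {
            logger.error("Failed to load sensitive-word file: " + e.getMessage());
        }
    }

    private void addKeyword(String keyword) {
        TrieNode tempNode = rootNode;
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            TrieNode subNode = tempNode.getSubNode(c);
            if (subNode == null) {
                subNode = new TrieNode();
                tempNode.addSubNode(c, subNode);
            }
            tempNode = subNode;
            if (i == keyword.length() - 1) {
                tempNode.setKeywordEnd(true);
            }
        }
    }

    /**
     * Replace every sensitive word in text with REPLACEMENT.
     *
     * @param text the text to filter
     * @return the filtered text, or null if text is blank
     */
    public String filter(String text) {
        if (StringUtils.isBlank(text)) {
            return null;
        }
        TrieNode tempNode = rootNode; // pointer 1: current trie node
        int begin = 0;                // pointer 2: start of the candidate word
        int position = 0;             // pointer 3: end of the candidate word
        StringBuilder sb = new StringBuilder();

        // Loop until pointer 2 reaches the end, so that a partial match cut off
        // by the end of the text is retried from the next character
        while (begin < text.length()) {
            if (position < text.length()) {
                char c = text.charAt(position);
                // Skip symbols inserted between the characters of a word
                if (isSymbol(c)) {
                    // Not inside a match: keep the symbol and move pointer 2 along
                    if (tempNode == rootNode) {
                        sb.append(c);
                        begin++;
                    }
                    position++;
                    continue;
                }
                tempNode = tempNode.getSubNode(c);
                if (tempNode == null) {
                    // No sensitive word starts at begin: keep that character
                    sb.append(text.charAt(begin));
                    position = ++begin;
                    tempNode = rootNode;
                } else if (tempNode.isKeywordEnd()) {
                    // Found a sensitive word: replace text[begin..position]
                    sb.append(REPLACEMENT);
                    begin = ++position;
                    tempNode = rootNode;
                } else {
                    // Prefix matched so far: check the next character
                    position++;
                }
            } else {
                // Pointer 3 ran off the end mid-match: no word starts at begin
                sb.append(text.charAt(begin));
                position = ++begin;
                tempNode = rootNode;
            }
        }
        return sb.toString();
    }

    // A symbol is neither an ASCII letter/digit nor in the East Asian range 0x2E80-0x9FFF
    private boolean isSymbol(char c) {
        return !CharUtils.isAsciiAlphanumeric(c) && (c < 0x2E80 || c > 0x9FFF);
    }

    // Prefix tree (trie) node
    private class TrieNode {

        // Whether this node is the last character of a sensitive word
        private boolean isKeywordEnd = false;

        // Child nodes (key: the child's character, value: the child node)
        private Map<Character, TrieNode> subNodes = new HashMap<>();

        public boolean isKeywordEnd() {
            return isKeywordEnd;
        }

        public void setKeywordEnd(boolean keywordEnd) {
            isKeywordEnd = keywordEnd;
        }

        public void addSubNode(Character c, TrieNode node) {
            subNodes.put(c, node);
        }

        public TrieNode getSubNode(Character c) {
            return subNodes.get(c);
        }
    }
}
```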