Plain-Language Elasticsearch 30: Hot-Updating the IK Dictionary for Chinese Word Segmentation
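Contents
- Overview
- Hot-update approaches
- Downloading the source code from the IK GitHub repository
- Importing the Maven project
- Modifying the source code
- Starting the scanning thread in Dictionary#initial
- HotDictReloadThread
- Configuration file jdbc-reload.properties
- Dictionary#loadMainDict: custom loading of the main dictionary from MySQL
- Dictionary#loadStopWordDict: custom loading of the stopword dictionary from MySQL
- Compiling
- Unzipping the zip into the ES IK plugin directory
- Adding the MySQL dependency jar
- MySQL table creation statements
- Restarting ES
- Verifying hot loading
- Hot-loading the main dictionary
- Hot-loading the stopword dictionary
- Problems encountered and solutions
- Problem: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "setContextClassLoader")
- Solution
- Compiled resources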
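Overview
Continuing to learn ES from instructor Zhonghua Shishan (中華石杉); this is part 30.
Course URL: https://www.roncoo.com/view/55
Plain-Language Elasticsearch 28: Installing and Using the IK Chinese Analyzer
Plain-Language Elasticsearch 29: IK Analyzer Configuration Files and Custom Dictionaries
The two posts above covered how to install IK and its basic usage. With a custom dictionary file, however, every change requires a restart, and the file has to be edited on each node in turn. That is rather inconvenient.
Main drawbacks:
- After every addition, ES must be restarted before the new words take effect, which is very tedious
- ES is distributed; if there are hundreds of nodes…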
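Hot-update approaches
Two approaches are commonly used:
- Modify the IK analyzer source code so that it automatically loads new words from MySQL at a fixed interval
- Use the hot-update mechanism that IK supports natively: deploy a web server that exposes an HTTP endpoint, and signal dictionary changes through the two HTTP response headers Last-Modified and ETag
The first approach, modifying the IK source code, is recommended. The second is said to be discouraged even by the IK community on GitHub because it is not very stable.
Since we are going to modify the source code, let's get on with it and download the source from IK's GitHub repository.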
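To make the second approach a little more concrete: the check comes down to comparing the Last-Modified and ETag values of the latest response with the values seen on the previous poll, and reloading when either one has changed. Below is a minimal sketch of just that comparison. The class and method names are hypothetical and this is not IK's actual code.

```java
import java.util.Objects;

/** Hypothetical helper (not part of IK): remembers the header values from the last poll. */
class RemoteDictChangeDetector {
    private String lastModified; // Last-Modified value seen on the previous poll
    private String eTag;         // ETag value seen on the previous poll

    /** Returns true when either header differs from the previously seen value. */
    synchronized boolean changed(String newLastModified, String newETag) {
        boolean differs = !Objects.equals(lastModified, newLastModified)
                || !Objects.equals(eTag, newETag);
        // Remember the current values for the next comparison
        lastModified = newLastModified;
        eTag = newETag;
        return differs;
    }

    public static void main(String[] args) {
        RemoteDictChangeDetector detector = new RemoteDictChangeDetector();
        // First poll: nothing has been seen yet, so the dictionary counts as changed
        System.out.println(detector.changed("Tue, 20 Aug 2019 14:00:00 GMT", "\"v1\""));
        // Same headers again: no reload needed
        System.out.println(detector.changed("Tue, 20 Aug 2019 14:00:00 GMT", "\"v1\""));
        // ETag changed: reload
        System.out.println(detector.changed("Tue, 20 Aug 2019 14:00:00 GMT", "\"v2\""));
    }
}
```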
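Downloading the source code from the IK GitHub repository
https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.4.1
Find the IK release that matches your ES version and download its source code. I am using ES 6.4.1 here.
Importing the Maven project
Import it as a Maven project. I won't go into detail because it is straightforward. Once the import finishes, you have a standard Maven project in front of you.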
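Modifying the source code
The overall idea in brief: start a background thread that scans the tables defined in MySQL and loads the data.
Starting the scanning thread in Dictionary#initial
    // Step 1: start a new thread that keeps reloading the dictionaries
    new Thread(new HotDictReloadThread()).start();
HotDictReloadThread
An infinite loop that calls Dictionary.getSingleton().reLoadMainDict() to reload the dictionaries.
    package org.wltea.analyzer.dic;

    import org.apache.logging.log4j.Logger;
    import org.elasticsearch.common.logging.ESLoggerFactory;

    public class HotDictReloadThread implements Runnable {

        private static final Logger logger = ESLoggerFactory.getLogger(HotDictReloadThread.class.getName());

        @Override
        public void run() {
            // Loop forever; the pause between iterations comes from the
            // Thread.sleep calls inside the MySQL loader methods
            while (true) {
                logger.info("[==========]reload hot dict from mysql......");
                Dictionary.getSingleton().reLoadMainDict();
            }
        }
    }
Now let's see what reLoadMainDict does.
It does two things: it loads the main dictionary and the stopword dictionary. So we simply put our custom MySQL logic into those two methods.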
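The reload loop above does not sleep by itself. In this article the pause comes from a Thread.sleep call inside each of the two MySQL loader methods, so one full cycle waits for the configured interval twice. One possible alternative is to let a scheduler own the interval. The sketch below uses only java.util.concurrent; PollingReloader is a hypothetical name, and the Runnable passed in merely stands in for Dictionary.getSingleton().reLoadMainDict().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: run a reload task repeatedly with a fixed delay between runs. */
class PollingReloader {

    static ScheduledExecutorService start(Runnable reloadTask, long intervalMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "hot-dict-reload");
            t.setDaemon(true); // do not keep the JVM alive just for reloading
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                reloadTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue
                System.err.println("reload failed: " + e);
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        // The counter stands in for the real dictionary reload
        ScheduledExecutorService scheduler = start(runs::incrementAndGet, 100);
        Thread.sleep(350);
        scheduler.shutdownNow();
        System.out.println("reload ran " + runs.get() + " times");
    }
}
```

With this shape the loader methods would no longer need their own sleep calls, and the interval would apply once per cycle rather than once per loader.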
Configuration file jdbc-reload.properties
    jdbc.url=jdbc:mysql://localhost:3306/ik?serverTimezone=GMT
    jdbc.user=root
    jdbc.password=root
    jdbc.reload.sql=select word from hot_words
    jdbc.reload.stopword.sql=select stopword as word from hot_stopwords
    jdbc.reload.interval=1000
jdbc.reload.interval is the reload interval in milliseconds; 1000 means polling once per second.
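A .properties file needs one key=value pair per line. Because jdbc.reload.interval is read as a string and then parsed, a missing or malformed value would throw inside the loader. The sketch below shows one way to read the interval defensively. ReloadConfig and the 1000 ms fallback are assumptions made for illustration and are not part of the modified plugin.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for reading jdbc-reload.properties style settings. */
class ReloadConfig {
    static final long DEFAULT_INTERVAL_MILLIS = 1000L; // assumed fallback value

    /** Parses properties text; in the plugin the text would come from the file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the reload interval in milliseconds, falling back on bad input. */
    static long intervalMillis(Properties props) {
        String raw = props.getProperty("jdbc.reload.interval");
        if (raw == null) {
            return DEFAULT_INTERVAL_MILLIS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_INTERVAL_MILLIS;
        } catch (NumberFormatException e) {
            return DEFAULT_INTERVAL_MILLIS;
        }
    }

    public static void main(String[] args) {
        Properties props = parse(
                "jdbc.reload.sql=select word from hot_words\n"
              + "jdbc.reload.interval=1000\n");
        System.out.println(props.getProperty("jdbc.reload.sql"));
        System.out.println(intervalMillis(props));
    }
}
```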
Dictionary#loadMainDict: custom loading of the main dictionary from MySQL
    // Step 2: load the dictionary from MySQL
    this.loadMySQLExtDict();
Load the custom DB configuration file and query MySQL through JDBC. It is that simple.
    // Requires these java.sql imports in Dictionary.java:
    // Connection, DriverManager, ResultSet, SQLException, Statement
    private static Properties prop = new Properties();

    static {
        try {
            // Class.forName("com.mysql.jdbc.Driver"); // legacy driver class name
            Class.forName("com.mysql.cj.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            logger.error("error", e);
        }
    }

    /**
     * Load the hot-update dictionary from MySQL.
     */
    private void loadMySQLExtDict() {
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            // Read the JDBC settings from jdbc-reload.properties under the dictionary root
            Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
            try (FileInputStream in = new FileInputStream(file.toFile())) {
                prop.load(in); // the stream is closed as soon as loading finishes
            }

            logger.info("[==========]jdbc-reload.properties");
            // Note: this loop also writes jdbc.password to the log
            for (Object key : prop.keySet()) {
                logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
            }

            logger.info("[==========]query hot dict from mysql, " + prop.getProperty("jdbc.reload.sql") + "......");
            conn = DriverManager.getConnection(
                    prop.getProperty("jdbc.url"),
                    prop.getProperty("jdbc.user"),
                    prop.getProperty("jdbc.password"));
            stmt = conn.createStatement();
            rs = stmt.executeQuery(prop.getProperty("jdbc.reload.sql"));

            // Add every word returned by the query to the main dictionary
            while (rs.next()) {
                String theWord = rs.getString("word");
                logger.info("[==========]hot word from mysql: " + theWord);
                _MainDict.fillSegment(theWord.trim().toCharArray());
            }

            // Pause before returning; this is what paces the reload loop
            Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
        } catch (Exception e) {
            logger.error("error", e);
        } finally {
            if (rs != null) {
                try {
                    rs.close();
                } catch (SQLException e) {
                    logger.error("error", e);
                }
            }
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    logger.error("error", e);
                }
            }
            if (conn != null) {
                try {
                    conn.close();
                } catch (SQLException e) {
                    logger.error("error", e);
                }
            }
        }
    }
Dictionary#loadStopWordDict: custom loading of the stopword dictionary from MySQL
    // Step 3: load the stopwords from MySQL
    this.loadMySQLStopwordDict();

    /**
     * Load the stopwords from MySQL.
     */
    private void loadMySQLStopwordDict() {
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            // Read the JDBC settings from jdbc-reload.properties under the dictionary root
            Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
            try (FileInputStream in = new FileInputStream(file.toFile())) {
                prop.load(in); // the stream is closed as soon as loading finishes
            }

            logger.info("[==========]jdbc-reload.properties");
            // Note: this loop also writes jdbc.password to the log
            for (Object key : prop.keySet()) {
                logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
            }

            logger.info("[==========]query hot stopword dict from mysql, " + prop.getProperty("jdbc.reload.stopword.sql") + "......");
            conn = DriverManager.getConnection(
                    prop.getProperty("jdbc.url"),
                    prop.getProperty("jdbc.user"),
                    prop.getProperty("jdbc.password"));
            stmt = conn.createStatement();
            rs = stmt.executeQuery(prop.getProperty("jdbc.reload.stopword.sql"));

            // Add every word returned by the query to the stopword dictionary
            while (rs.next()) {
                String theWord = rs.getString("word");
                logger.info("[==========]hot stopword from mysql: " + theWord);
                _StopWords.fillSegment(theWord.trim().toCharArray());
            }

            // Pause before returning; this is what paces the reload loop
            Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
        } catch (Exception e) {
            logger.error("error", e);
        } finally {
            if (rs != null) {
                try {
                    rs.close();
                } catch (SQLException e) {
                    logger.error("error", e);
                }
            }
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    logger.error("error", e);
                }
            }
            if (conn != null) {
                try {
                    conn.close();
                } catch (SQLException e) {
                    logger.error("error", e);
                }
            }
        }
    }
Compiling
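Right-click the project, then Run As --> Maven Build --> clean package
After the build succeeds, pick up the generated zip file.
Unzipping the zip into the ES IK plugin directory
Adding the MySQL dependency jar
My local MySQL is version 8.0.11.
Put the jar into the ik directory.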
MySQL table creation statements
    /*
    Navicat MySQL Data Transfer

    Source Server         : localhost_root
    Source Server Version : 80011
    Source Host           : localhost:3306
    Source Database       : ik

    Target Server Type    : MYSQL
    Target Server Version : 80011
    File Encoding         : 65001

    Date: 2019-08-20 23:35:18
    */

    SET FOREIGN_KEY_CHECKS=0;

    -- ----------------------------
    -- Table structure for `hot_stopwords`
    -- ----------------------------
    DROP TABLE IF EXISTS `hot_stopwords`;
    CREATE TABLE `hot_stopwords` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `stopword` longtext COLLATE utf8mb4_general_ci,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

    -- ----------------------------
    -- Records of hot_stopwords
    -- ----------------------------

    -- ----------------------------
    -- Table structure for `hot_words`
    -- ----------------------------
    DROP TABLE IF EXISTS `hot_words`;
    CREATE TABLE `hot_words` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `word` longtext COLLATE utf8mb4_general_ci,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
Restarting ES
Startup log
Success.
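Verifying hot loading
Hot-loading the main dictionary
First, look at IK's default configuration file, which we have not modified.
Use ik_max_word to see how IK segments "盤他".
Insert a row:
    INSERT INTO `hot_words` VALUES ('1', '盤他');
Check the ES log, elasticsearch.log.
The log shows that the word was loaded successfully, so let's check the segmentation again.
"盤他" is no longer split apart by IK. Success.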
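To see why adding a word to the dictionary changes the result, here is a toy forward-maximum-matching segmenter. It is a deliberately simplified stand-in and not IK's actual algorithm; ToySegmenter and its parameters are hypothetical. It only illustrates that a dictionary-based segmenter keeps a string whole once that string is in its dictionary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy forward maximum matching; a simplified illustration, not IK's algorithm. */
class ToySegmenter {

    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the candidate until it is a dictionary word or a single character
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>();
        // Without the word in the dictionary, the text falls apart into single characters
        System.out.println(segment("盤他", dict, 4));
        // After the "hot" word is added, it is kept as one token
        dict.add("盤他");
        System.out.println(segment("盤他", dict, 4));
    }
}
```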
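Hot-loading the stopword dictionary
We treat "啥" as a stopword and add it to the stopword table in MySQL.
    INSERT INTO `hot_stopwords` VALUES ('1', '啥');
Check the ES log, elasticsearch.log.
Run the segmentation test again.
"啥" is no longer emitted by IK as a token. Success.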
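Conceptually, a stopword dictionary works as a filter: any token that matches an entry is dropped from the output. The toy sketch below shows only that idea. ToyStopwordFilter is a hypothetical name, the sample tokens are made up, and IK's real implementation is different.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy stopword filter; illustrates the concept only, not IK's implementation. */
class ToyStopwordFilter {

    /** Returns the tokens that are not in the stopword set, preserving order. */
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        return tokens.stream()
                .filter(token -> !stopwords.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("你", "說", "啥"); // made-up sample tokens
        Set<String> stopwords = new HashSet<>();
        System.out.println(removeStopwords(tokens, stopwords));
        // After "啥" is hot-loaded as a stopword, it disappears from the output
        stopwords.add("啥");
        System.out.println(removeStopwords(tokens, stopwords));
    }
}
```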
Problems encountered and solutions
Problem: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "setContextClassLoader")
    [2019-08-20T22:32:43,444][INFO ][o.e.n.Node ] [aQ19O09] starting ...
    [2019-08-20T22:32:46,133][INFO ][o.e.t.TransportService ] [aQ19O09] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
    [2019-08-20T22:32:49,435][INFO ][o.e.c.s.MasterService ] [aQ19O09] zen-disco-elected-as-master ([0] nodes joined)[, ], reason: new_master {aQ19O09}{aQ19O095TZmH9VHKNHC1qw}{PjHRPar4TV2JQ-iy-bWIoA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=10614976512, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
    [2019-08-20T22:32:49,442][INFO ][o.e.c.s.ClusterApplierService] [aQ19O09] new_master {aQ19O09}{aQ19O095TZmH9VHKNHC1qw}{PjHRPar4TV2JQ-iy-bWIoA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=10614976512, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {aQ19O09}{aQ19O095TZmH9VHKNHC1qw}{PjHRPar4TV2JQ-iy-bWIoA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=10614976512, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)[, ]]])
    [2019-08-20T22:32:49,685][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] fatal error in thread [elasticsearch[aQ19O09][generic][T#4]], exiting
    java.lang.ExceptionInInitializerError: null
        at java.lang.Class.forName0(Native Method) ~[?:1.8.0_161]
        at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_161]
        at com.mysql.cj.jdbc.NonRegisteringDriver.<clinit>(NonRegisteringDriver.java:106) ~[?:?]
        at java.lang.Class.forName0(Native Method) ~[?:1.8.0_161]
        at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_161]
        at org.wltea.analyzer.dic.Dictionary.<clinit>(Dictionary.java:117) ~[?:?]
        at org.wltea.analyzer.cfg.Configuration.<init>(Configuration.java:40) ~[?:?]
        at org.elasticsearch.index.analysis.IkTokenizerFactory.<init>(IkTokenizerFactory.java:15) ~[?:?]
        at org.elasticsearch.index.analysis.IkTokenizerFactory.getIkSmartTokenizerFactory(IkTokenizerFactory.java:23) ~[?:?]
        at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:377) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenizerFactories(AnalysisRegistry.java:191) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:158) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.index.IndexService.<init>(IndexService.java:162) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:383) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:475) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:547) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.gateway.Gateway.performStateRecovery(Gateway.java:127) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.gateway.GatewayService$1.doRun(GatewayService.java:223) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.4.1.jar:6.4.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
    Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "setContextClassLoader")
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_161]
        at java.security.AccessController.checkPermission(AccessController.java:884) ~[?:1.8.0_161]
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[?:1.8.0_161]
        at java.lang.Thread.setContextClassLoader(Thread.java:1474) ~[?:1.8.0_161]
        at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread$1.newThread(AbandonedConnectionCleanupThread.java:56) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619) ~[?:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) ~[?:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_161]
        at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_161]
        at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.<clinit>(AbandonedConnectionCleanupThread.java:60) ~[?:?]
        ... 23 more
Solution
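The exception is caused by Java security permissions.
Find the JDK that ES is using; here I am using 1.8.0_161.
    java version "1.8.0_161"
    Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Find the installation directory and go into the jre\lib\security directory.
On my machine, for example, that is E:\Program Files\Java\jdk1.8.0_161\jre\lib\security. Open java.policy and add permission java.security.AllPermission; as the last line inside the grant block, then restart ES. That resolves the problem.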
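Granting AllPermission in java.policy applies to all code running on that JDK. A narrower route that may be worth trying is to grant only the denied permission to the plugin and to load the driver inside a privileged block. The sketch below shows the privileged-block half using the standard java.security API. It is untested against ES 6.4.1. It assumes that the plugin's own policy file would additionally contain permission java.lang.RuntimePermission "setContextClassLoader"; and PrivilegedDriverLoader is a hypothetical name.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

/** Hypothetical sketch: load a JDBC driver class inside a privileged block. */
class PrivilegedDriverLoader {

    /** Runs the action with the permissions granted to this class's own code. */
    static <T> T runPrivileged(PrivilegedAction<T> action) {
        return AccessController.doPrivileged(action);
    }

    /** Returns true if the driver class could be loaded, false if it is not on the classpath. */
    static boolean loadDriver(String driverClassName) {
        return runPrivileged(() -> {
            try {
                Class.forName(driverClassName);
                return true;
            } catch (ClassNotFoundException e) {
                return false;
            }
        });
    }

    public static void main(String[] args) {
        // In the plugin this call would replace the plain Class.forName in the static block
        System.out.println(loadDriver("com.mysql.cj.jdbc.Driver"));
    }
}
```

Whether this is sufficient depends on the ES version and on which permissions the driver needs during initialization, so treat it as a starting point rather than a verified fix.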
Compiled resources
If you find all of this too much trouble, you can use the zip package I have already built.