Solr: importing data from a database, with full and incremental (delta) indexing

Published: 2023/12/14 · Database · 25 · 豆豆

The data in this walkthrough is imported from a MySQL database, which is used here for testing.

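1. Create a table in MySQL: solr_test

2. Insert a few rows of test data:

3. Open solrconfig.xml in a text editor. It is in the solrhome folder: E:\solrhome\mycore\conf\solrconfig.xml

(For an explanation of the solrhome folder, see: http://www.cnblogs.com/HD/p/3977799.html)

Add this node: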

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

4. Create a new file named data-config.xml in the same directory as solrconfig.xml, with the following content:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/test"
              user="root"
              password="root" />
  <document>
    <entity name="solr_test" transformer="DateFormatTransformer"
            query="SELECT id, subject, content, last_update_time FROM solr_test WHERE id >= ${dataimporter.request.id}">
      <field column='last_update_time' dateTimeFormat='yyyy-MM-dd HH:mm:ss' />
    </entity>
  </document>
</dataConfig>

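Note: ${dataimporter.request.id} is a request parameter. Its value is supplied when the import is triggered (see step 7), and the query reads only rows whose id is greater than or equal to it.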

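5. Copy the JARs solr-dataimporthandler-4.10.0.jar and solr-dataimporthandler-extras-4.10.0.jar from the unpacked Solr distribution into the WEB-INF\lib directory of the Solr webapp in Tomcat.

The MySQL JDBC driver JAR is needed as well: mysql-connector-java-5.1.7-bin.jar

(Alternatively, add lib nodes to solrconfig.xml and place the JARs under solrhome. The JARs then do not have to be added to WEB-INF\lib.)

6. Open schema.xml in a text editor. It is in the same solrhome folder as in step 3. Its content is: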

<?xml version="1.0" ?>
<schema name="my core" version="1.1">
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="text_cn" class="solr.TextField">
    <analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer" />
    <analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer" />
  </fieldType>

  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="subject" type="text_cn" indexed="true" stored="true" />
  <field name="content" type="text_cn" indexed="true" stored="true" />
  <field name="last_update_time" type="date" indexed="true" stored="true" />
  <field name="_version_" type="long" indexed="true" stored="true"/>

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>subject</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

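7. Open the Solr web UI:

Notes:

In Custom Parameters enter id=1. This is the parameter defined in step 4.

The Clean option controls whether the existing index is cleared before the import. The net effect is that documents which exist in the Solr index but are not returned by the database SELECT are removed.

The import can also be triggered directly with this URL:

http://localhost:8899/solr/mycore/dataimport?command=full-import&clean=true&commit=true&wt=json&indent=true&entity=solr_test&verbose=false&optimize=false&debug=false&id=1

It returns the result:

Once this is configured, calling this URL is all it takes to keep importing data into the index.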
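The request above can also be assembled in code. The following is a minimal sketch that uses only the JDK to build the same kind of full-import URL from a parameter map. The class name DataImportUrl and its methods are hypothetical helpers, not part of Solr; host, port, and core name are taken from the example URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Solr): builds a DataImportHandler request URL.
final class DataImportUrl {

    // Appends URL-encoded key=value pairs to the handler URL.
    static String build(String handlerUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", handlerUrl + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    // Parameters for a full import of the solr_test entity, starting at the given id.
    static Map<String, String> fullImportParams(long startId) {
        Map<String, String> p = new LinkedHashMap<>(); // keeps insertion order
        p.put("command", "full-import");
        p.put("clean", "true");
        p.put("commit", "true");
        p.put("entity", "solr_test");
        p.put("id", Long.toString(startId)); // read in data-config.xml as ${dataimporter.request.id}
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8899/solr/mycore/dataimport", fullImportParams(1)));
    }
}
```

The resulting string can be passed to any HTTP client. Parameters that are left out (wt, indent, verbose, and so on) fall back to their defaults.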

8. Test a query:

The dataimport handler also accepts a command that makes it reload data-config.xml:

http://localhost:8899/solr/mycore/dataimport?command=reload-config

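The rest of this article covers the configuration differences between full indexing and incremental (delta) indexing. Note that it uses a different project from the one above. A full import re-indexes every row in the database, while a delta import only indexes rows that have changed since the last import. To use delta imports, the table needs a marker column that records changes. A timestamp works well: it is updated whenever the row is updated, and Solr compares it with the time of the last import to decide which rows to re-index.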
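The timestamp comparison can be illustrated with a short, self-contained sketch. It is not Solr code: the class DeltaSelection, the in-memory map standing in for the table, and the sample timestamps are all assumptions made for the illustration.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: picks the rows a delta import would have to re-index.
final class DeltaSelection {

    // Returns the ids whose update timestamp is strictly later than the last index time.
    static List<String> changedSince(Map<String, LocalDateTime> updateTimeById,
                                     LocalDateTime lastIndexTime) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, LocalDateTime> row : updateTimeById.entrySet()) {
            if (row.getValue().isAfter(lastIndexTime)) {
                changed.add(row.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Hypothetical rows: id -> time of last update
        Map<String, LocalDateTime> rows = new LinkedHashMap<>();
        rows.put("1", LocalDateTime.of(2013, 12, 9, 13, 0, 0));
        rows.put("2", LocalDateTime.of(2013, 12, 9, 15, 30, 0));
        LocalDateTime lastIndexTime = LocalDateTime.of(2013, 12, 9, 14, 6, 0);
        System.out.println(changedSince(rows, lastIndexTime)); // only row 2 changed after the last import
    }
}
```

In the real setup this comparison is done by the database, in the WHERE clause of the deltaQuery.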

1. Edit multicore/new_core/conf/solrconfig.xml (mentioned in the previous post) and add:

XML code:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>
<requestHandler name="/deltaimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">delta-data-config.xml</str>
    </lst>
</requestHandler>

The first handler is dedicated to full imports and the second to delta imports. Both are implemented by the DataImportHandler class.

2. Add the file multicore/new_core/conf/data-config.xml

XML code:

<dataConfig>
    <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://192.168.0.81:3306/new_mall?zeroDateTimeBehavior=convertToNull&amp;characterEncoding=utf8&amp;useUnicode=true"
        user="root" password="HyS_Db@2014"/>
    <document name="mall_goods">
        <entity name="MallGoods" pk="id"
                query="select * from mall_goods limit ${dataimporter.request.length} offset ${dataimporter.request.offset}"
                transformer="RegexTransformer">
            <field column="goods_id" name="id" />
            <field column="title" name="title" />
            <field column="subtitle" name="subtitle" />
            <field column="cover_img_path" name="coverImgPath" />
            <field column="description" name="description" />
            <field column="update_date" name="updateDate" />
        </entity>
    </document>
</dataConfig>

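dataSource: the data source (JDBC connection) configuration.

entity: the entity configuration inside the document. Note that pk="id" cannot be changed freely. It has to match <uniqueKey>id</uniqueKey> in schema.xml, otherwise the import fails with "org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id".

query: the SQL query. It can be paged, as done here with limit and offset.

transformer: names the transformer classes applied to each row before it is indexed. RegexTransformer enables regex-based attributes on field elements, such as regex, replaceWith, and splitBy. No field here uses them, so it has no visible effect in this configuration.

field: maps a database column to an index field name.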
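Because the query is paged through the length and offset request parameters, a large table can be imported in several requests. The sketch below generates one request URL per batch. It is a hedged illustration: the class PagedFullImport, the handler URL, and the row counts are assumptions, and the sketch only builds the URLs, it does not send them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one full-import URL per page of rows.
final class PagedFullImport {

    static List<String> batchUrls(String handlerUrl, long totalRows, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> urls = new ArrayList<>();
        for (long offset = 0; offset < totalRows; offset += batchSize) {
            // clean=false on every batch: clean=true would wipe the documents
            // indexed by the earlier batches.
            urls.add(handlerUrl + "?command=full-import&commit=true&clean=false"
                    + "&offset=" + offset     // ${dataimporter.request.offset}
                    + "&length=" + batchSize); // ${dataimporter.request.length}
        }
        return urls;
    }

    public static void main(String[] args) {
        // Assumed handler URL and row count, for illustration only.
        for (String url : batchUrls("http://localhost:8082/new_core/dataimport", 250000, 100000)) {
            System.out.println(url);
        }
    }
}
```

The last batch may ask for more rows than remain; the LIMIT clause simply returns fewer. Each request should be sent only after the previous import has finished, which can be checked with command=status on the same handler.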

3. Add the file multicore/new_core/conf/delta-data-config.xml

XML code:

<dataConfig>
    <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://192.168.0.81:3306/new_mall?zeroDateTimeBehavior=convertToNull&amp;characterEncoding=utf8&amp;useUnicode=true"
        user="root" password="HyS_Db@2014"/>
    <document name="mall_goods">
        <entity name="MallGoods" pk="id"
                query="select * from mall_goods"
                deltaImportQuery="select * from mall_goods where goods_id='${dih.delta.id}'"
                deltaQuery="select goods_id as id from mall_goods where update_date > '${dih.last_index_time}'"
                transformer="RegexTransformer">
            <field column="goods_id" name="id" />
            <field column="title" name="title" />
            <field column="subtitle" name="subtitle" />
            <field column="cover_img_path" name="coverImgPath" />
            <field column="description" name="description" />
            <field column="update_date" name="updateDate" />
        </entity>
    </document>
</dataConfig>

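deltaQuery: returns the ids of the rows that have changed since the last import.

deltaImportQuery: fetches the full row for each id returned by deltaQuery.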
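To make the two-step flow concrete, the sketch below fills the ${...} placeholders of both queries the way they would look at run time. This is a simplified illustration, not DataImportHandler's actual resolver: the class DeltaQueryTemplates is hypothetical, the sample values are made up, and unlike the real handler it rejects unknown variables instead of substituting an empty value.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: substitutes ${name} placeholders in a query template.
final class DeltaQueryTemplates {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String template, Map<String, String> variables) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unresolved variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Step 1: deltaQuery finds the changed ids since the last import.
        String deltaQuery = "select goods_id as id from mall_goods where update_date > '${dih.last_index_time}'";
        System.out.println(resolve(deltaQuery,
                Collections.singletonMap("dih.last_index_time", "2013-12-09 14:06:00")));

        // Step 2: deltaImportQuery runs once per id returned by step 1 (42 is a made-up id).
        String deltaImportQuery = "select * from mall_goods where goods_id='${dih.delta.id}'";
        System.out.println(resolve(deltaImportQuery,
                Collections.singletonMap("dih.delta.id", "42")));
    }
}
```

The name after dih.delta. is the column name returned by deltaQuery, which is why the query aliases goods_id as id.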

4. Edit multicore/new_core/conf/schema.xml to define how each field is indexed

XML code:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="title" type="text_ansj" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="subtitle" type="text_ansj" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="coverImgPath" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="description" type="text_ansj" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="updateDate" type="text_ansj" indexed="true" stored="true" required="false" multiValued="false"/>

Note that the text fields above use the text_ansj field type, which has to be defined in schema.xml as well.

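5. The Solr WAR may still be missing some JARs. Copy the MySQL JDBC driver JAR and the JARs from the dist directory of the Solr distribution into the Solr web application.

6. Run the imports

Full: http://solr.xxxx.com:8082/new_core/dataimport?command=full-import&commit=true&clean=false&offset=0&length=100000 (indexes rows 0 to 100000)

Delta: http://solr.ehaoyao.com:8082/new_core/deltaimport?command=delta-import&entity=MallGoods

entity: the name of an entity element under document in data-config.xml. Use this parameter to run one or more entities selectively; passing several entity parameters runs several entities in the same request. If it is omitted, all entities are run.

clean: whether to delete the existing index before indexing starts. Defaults to true.

commit: whether to commit after the import finishes. Defaults to true.

optimize: whether to optimize the index after the import finishes. Defaults to true up to Solr 3.6 and to false in later versions.

debug: whether to run in debug mode, which is intended for interactive development.

Note that in debug mode the result is not committed automatically; add commit=true if a commit is wanted.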

Note: when running a delta import, it is easy to run into this exception:

deltaQuery has no column to resolve to declared primary key pk='id'

The cause, as one user explains: the column named by pk must appear under exactly that name in the deltaQuery SELECT statement, as in "select ID from ...". If the key column has a different name in the database, alias it with the as keyword. In that user's case the primary key of the student table was studentID, so the query became "select studentID as ID from ...".

The same applies to deletedPkQuery.

With this change the delta import worked for that user, and updates in the database were reflected in Solr.

So pay attention to the pk value in delta-data-config.xml.

------------------------------------------------------------------------------------------------------------------------------

A final addition:

Sometimes the indexed data needs to be deleted. This can be done as follows:

http://xxxx/new_core/update/?stream.body=<delete><query>*:*</query></delete>&stream.contentType=text/xml;charset=utf-8&commit=true

new_core is the core whose index is to be deleted.
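Browsers usually tolerate the raw angle brackets in that URL, but many HTTP clients require the query string to be percent-encoded. A minimal sketch of building the encoded form with the JDK follows; the class DeleteAllUrl is a hypothetical helper, and the host part is kept as the placeholder used above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a percent-encoded "delete everything" update URL.
final class DeleteAllUrl {

    static String build(String coreUrl) {
        String body = "<delete><query>*:*</query></delete>"; // matches every document
        String contentType = "text/xml;charset=utf-8";
        return coreUrl + "/update?stream.body=" + URLEncoder.encode(body, StandardCharsets.UTF_8)
                + "&stream.contentType=" + URLEncoder.encode(contentType, StandardCharsets.UTF_8)
                + "&commit=true";
    }

    public static void main(String[] args) {
        // "http://xxxx/new_core" is the placeholder from the article, not a real host.
        System.out.println(build("http://xxxx/new_core"));
    }
}
```

Replacing *:* with a narrower query, for example a condition on a single field, would delete only the matching documents.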

Triggering delta and full imports from Java code

import org.apache.log4j.Logger;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

import base.util.ConfigUtil; // project-specific configuration helper

public class SolrService {

    private static final Logger log = Logger.getLogger(SolrService.class);
    private static final HttpSolrServer solrServer;

    static {
        // Base URL of the core, read from the "solr.url" configuration key
        solrServer = new HttpSolrServer(ConfigUtil.getValue("solr.url"));
        solrServer.setConnectionTimeout(5000); // milliseconds
    }

    /**
     * Builds the index incrementally or from scratch.
     *
     * @param delta true for a delta import; false to rebuild the whole index
     */
    public static void buildIndex(boolean delta) {
        SolrQuery query = new SolrQuery();
        // Select the request handler; the default would be /select
        query.setRequestHandler("/dataimport");

        String command = delta ? "delta-import" : "full-import";
        String clean = delta ? "false" : "true";
        String optimize = delta ? "false" : "true";

        query.setParam("command", command)
             .setParam("clean", clean)
             .setParam("commit", "true")
             .setParam("entity", "article")
             .setParam("optimize", optimize);

        try {
            solrServer.query(query);
        } catch (SolrServerException e) {
            log.error("Error while building the index, delta: " + delta, e);
        }
    }
}

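Related notes:

How it works: every import writes a dataimport.properties file into the conf directory under solr.home. The file holds information about the most recent import. In my case the file is located in

/root/solr-4.5.1/example/solr/collection1/conf

and its content is

#Mon Dec 09 14:06:03 CST 2013
last_index_time=2013-12-09 14\:06\:00
article.last_index_time=2013-12-09 14\:06\:00

last_index_time is the time of the most recent delta or full import. Comparing it with the update_time column of the database table shows which rows were modified or added after that import.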
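The file uses the standard Java properties format, so the backslashes before the colons are escapes and not part of the value. A small sketch of reading last_index_time back with the JDK follows; the class LastIndexTime is a hypothetical helper and the file content is passed as a string to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

// Hypothetical helper: reads a timestamp from dataimport.properties content.
final class LastIndexTime {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static LocalDateTime read(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText)); // decodes the "\:" escapes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing key: " + key);
        }
        return LocalDateTime.parse(value, FORMAT);
    }

    public static void main(String[] args) {
        // Same content as the file shown above
        String text = "#Mon Dec 09 14:06:03 CST 2013\n"
                + "last_index_time=2013-12-09 14\\:06\\:00\n"
                + "article.last_index_time=2013-12-09 14\\:06\\:00\n";
        System.out.println(read(text, "last_index_time"));
        System.out.println(read(text, "article.last_index_time"));
    }
}
```

The key prefixed with the entity name (article here) holds the last import time of that single entity.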

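Notes on data-config.xml:

query: the SQL that fetches all rows (used by a full import).

deltaImportQuery: the SQL that fetches the row data during a delta import.

deltaQuery: the SQL that fetches the primary keys of the changed rows.

Parameters:

clean: whether to delete the existing index before indexing.

commit: whether to commit automatically after indexing.

entity: the entity name configured in mysql-data-config.xml. If several entities are configured and none is specified here, all of them are run.

optimize: whether to optimize the index automatically after indexing.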
