HBase Secondary Indexes with CDH and Lily

Published: 2024/8/23

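1. Alter the table schema to enable replication

For an existing table:

    disable 'tableName'
    alter 'tableName', {NAME => 'fn', REPLICATION_SCOPE => 1}
    enable 'tableName'

For a table that does not exist yet:

    create 'table', {NAME => 'cf', REPLICATION_SCOPE => 1}

A REPLICATION_SCOPE of 1 enables replication for the column family and 0 disables it. The default is 0.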

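2. Create the corresponding SolrCloud collection
Run the following on a machine where Solr is installed.
The path and the name used here are up to you.

Generate the instance configuration files (index1 is the user-chosen name):

    solrctl instancedir --generate /opt/hbase-indexer/index1

The /opt/hbase-indexer/index1 directory now contains a conf folder. Edit the schema.xml file inside it
and add a new field at the bottom: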

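    <field name="HBase_Indexer_Test_cf1_name" type="string" indexed="true" stored="true"/>

Attributes:
name: a name of your choosing. It is used again later and must match the outputField property in the Morphline configuration file.
type: the field type.
indexed: whether the field is indexed.
stored: whether the field value is stored.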

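Note: the name attribute corresponds to the outputField property that is set later in the Morphline configuration file, so it can be thought of as the HBase value to be indexed. For that reason we suggest building it from the table name and the column family, in the following format: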

    HBase_Indexer_ZDTable_fn_name
    HBase_Indexer_<table name>_<column family>_<column name>
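The naming convention above can be applied mechanically. The following is a minimal sketch, not part of the original setup: the helper names `solr_field_name` and `schema_field_xml` are hypothetical, and replacing characters other than letters, digits and underscores is an assumption made here to keep field names safe (for example when the table name contains a namespace such as `ns:table`).

```python
import re


def solr_field_name(table: str, family: str, qualifier: str,
                    prefix: str = "HBase_Indexer") -> str:
    """Build a field name as <prefix>_<table>_<family>_<qualifier>."""
    parts = [prefix, table, family, qualifier]
    # Assumption: replace anything that is not a letter, digit or underscore,
    # such as the ':' in a namespaced table name.
    cleaned = [re.sub(r"[^A-Za-z0-9_]", "_", part) for part in parts]
    return "_".join(cleaned)


def schema_field_xml(name: str, field_type: str = "string") -> str:
    """Render the <field> line to add at the bottom of schema.xml."""
    return (f'<field name="{name}" type="{field_type}" '
            f'indexed="true" stored="true"/>')


if __name__ == "__main__":
    name = solr_field_name("ZDTable", "fn", "name")
    print(name)
    print(schema_field_xml(name))
```

Whatever name is generated this way has to appear both in schema.xml and as the outputField in the Morphline configuration.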

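Next, edit solrconfig.xml. Find the autoCommit setting below and change openSearcher from false to true. This setting belongs to the hard commit, so it will affect performance.

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
      <openSearcher>true</openSearcher>
    </autoCommit>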

3. Create the collection instance directory and upload the configuration files to ZooKeeper:

    solrctl instancedir --create index1 /opt/hbase-indexer/index1

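4. Once the configuration is in ZooKeeper, the other nodes can download it from there. Next, create the collection:

    solrctl collection --create index1

To spread the data across several nodes for storage and search, create the collection with multiple shards instead:

    solrctl collection --create bqjr -s 7 -r 3 -m 21

Here -s sets the number of shards to 7, -r sets the number of replicas to 3, and -m sets the maximum number of shards per node (7*3 = 21).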
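The value passed to -m depends on how many Solr nodes are available. As a rough sketch, assuming the usual SolrCloud constraint that shards times replicas must not exceed the per-node maximum times the number of live nodes, the smallest workable -m can be computed as below. The function name `min_max_shards_per_node` is hypothetical.

```python
import math


def min_max_shards_per_node(num_shards: int, replication_factor: int,
                            live_nodes: int) -> int:
    """Smallest -m value that lets every shard replica be placed."""
    if live_nodes < 1:
        raise ValueError("at least one live Solr node is required")
    total_replicas = num_shards * replication_factor
    # Each node may host at most -m replicas, so round up the even split.
    return math.ceil(total_replicas / live_nodes)


if __name__ == "__main__":
    # 7 shards x 3 replicas: -m 21 is only needed when one node hosts everything.
    for nodes in (1, 3, 7):
        print(nodes, min_max_shards_per_node(7, 3, nodes))
```

With -s 7 and -r 3, the -m 21 used above is the upper bound that allows all replicas on a single node; a cluster with more nodes can use a smaller value.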

5. Create the Lily HBase Indexer configuration

    [root@test119 index1]# cat morphline-hbase-mapper.xml
    <?xml version="1.0"?>
    <indexer table="zh_ams_ns:zhongda_custom_task_cp"
             mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper"
             read-row="never">
      <param name="morphlineFile" value="morphlines.conf"/>
      <param name="morphlineId" value="ZDTableMap"/>
    </indexer>

6. Configure Morphline
In the CM (Cloudera Manager) UI, open the configuration page of the Key-Value Store Indexer service. It contains a Morphlines file; edit it as follows:

    SOLR_LOCATOR : {
      # Name of solr collection
      collection : hbaseindexer
      # ZooKeeper ensemble
      zkHost : "$ZK_HOST"
    }

    morphlines : [
      {
        # Must match the morphlineId value in morphline-hbase-mapper.xml
        id : ZDTableMap
        importCommands : ["org.kitesdk.**", "com.ngdata.**"]

        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "fn:name"
                  outputField : "HBase_Indexer_ZDTable_fn_name"
                  type : string
                  source : value
                }
              ]
            }
          }

          { logDebug { format : "output record: {}", args : ["@{}"] } }
        ]
      }
    ]

After saving the change, restart the Key-Value Store Indexer service.

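Notes:

id: the name of this morphline. It must match the morphlineId value from the previous step (ZDTableMap in this example).
importCommands: the packages from which commands are imported.
extractHBaseCells: reads HBase column data and writes it into a SolrInputDocument object. The command must contain zero or more mappings objects.
mappings: specifies the field mappings for HBase column qualifiers.
inputColumn: the HBase column to write to Solr. The value consists of the column family and the column qualifier, separated by ':'. The qualifier may use the wildcard *: for example, c1:* reads every column in column family c1, and c1:na* reads the columns in c1 whose qualifier starts with na.
outputField: the name of the field that the morphline writes the record to. It must match the name of a field node defined in Solr's schema.xml, otherwise the data is not written correctly.
type: the data type of the value read from HBase. HBase stores everything as byte[], while Solr indexes all content as text, so something has to convert the byte[] into the actual data type. That is what the type parameter is for. The supported types are byte, int, long, string, boolean, float, double, short and bigdecimal. You can also supply a custom type by implementing the com.ngdata.hbaseindexer.parse.ByteArrayValueMapper interface.
source: which part of the HBase KeyValue is used as the index input, either 'value' or 'qualifier'. With value, the HBase cell value is indexed; with qualifier, the HBase column qualifier is indexed.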
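To make the inputColumn wildcard and the type parameter more concrete, here is a small sketch that mimics the behaviour described in the notes above. It is only an illustration, not the hbase-indexer implementation: the function names are hypothetical, only a trailing * is handled, and the decoders assume the cells were written in the big-endian layout produced by HBase's `Bytes.toBytes`.

```python
import struct


def matches_input_column(input_column: str, family: str, qualifier: str) -> bool:
    """Check a family/qualifier pair against a pattern such as 'c1:*' or 'c1:na*'."""
    pattern_family, _, pattern_qualifier = input_column.partition(":")
    if family != pattern_family:
        return False
    if pattern_qualifier.endswith("*"):
        # Trailing wildcard: compare only the prefix before the '*'.
        return qualifier.startswith(pattern_qualifier[:-1])
    return qualifier == pattern_qualifier


# Assumption: values were serialized big-endian, as HBase's Bytes.toBytes does.
_DECODERS = {
    "string": lambda raw: raw.decode("utf-8"),
    "int": lambda raw: struct.unpack(">i", raw)[0],
    "long": lambda raw: struct.unpack(">q", raw)[0],
    "double": lambda raw: struct.unpack(">d", raw)[0],
}


def decode_cell(raw: bytes, type_name: str):
    """Turn the raw byte[] of a cell into a value, according to the mapping type."""
    return _DECODERS[type_name](raw)


if __name__ == "__main__":
    print(matches_input_column("c1:na*", "c1", "name"))
    print(decode_cell(b"\x00\x00\x00\x2a", "int"))
    print(decode_cell(b"alice", "string"))
```

The same four bytes decode to a number with type int but to unreadable text with type string, which is why the mapping type has to agree with how the application wrote the cell.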

7. Register the Lily HBase Indexer configuration with the Lily HBase Indexer Service

    hbase-indexer add-indexer \
      --name ZDindexer \
      --indexer-conf /opt/hbase-indexer/index1/morphline-hbase-mapper.xml \
      --connection-param solr.zk=test110:2181,test115:2181,test119:2181/solr \
      --connection-param solr.collection=index1 \
      --zookeeper test110:2181,test115:2181,test119:2181

Run hbase-indexer list-indexers to confirm that the indexer was added.
From this point on, newly written data can be queried through Solr.
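One way to check that new rows reach Solr is to query the collection's standard /select handler. The sketch below makes several assumptions that are not stated in the article: Solr is reachable over plain HTTP on the default port 8983 of host test110, no authentication is required, and the queried field exists in schema.xml. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(host: str, collection: str, field: str, value: str,
                    port: int = 8983) -> str:
    """Build a /select URL that searches one field for one value."""
    params = urlencode({"q": f"{field}:{value}", "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


def count_matches(url: str, timeout: float = 10.0) -> int:
    """Run the query and return numFound from Solr's JSON response."""
    with urlopen(url, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]["numFound"]


if __name__ == "__main__":
    # Host, port and the sample value "alice" are placeholders.
    url = solr_select_url("test110", "index1",
                          "HBase_Indexer_ZDTable_fn_name", "alice")
    print(url)
    # Needs a running cluster, so it is left commented out:
    # print(count_matches(url))
```

Keep in mind that a freshly written row only becomes searchable after the next commit, which is governed by the autoCommit settings from step 2.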
8. Batch-sync the index
The directory you run the command from must contain a morphlines.conf file. To locate one, run:

    find / | grep morphlines.conf$

Normally we pick the one under the most recent process directory.
cd into that directory and run:

    hadoop --config /etc/hadoop/conf \
      jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.14.0-job.jar \
      --conf /etc/hbase/conf/hbase-site.xml \
      --hbase-indexer-file /opt/hbase-indexer/index1/morphline-hbase-mapper.xml \
      --zk-host test110:2181,test115:2181,test119:2181/solr \
      --collection index1 \
      --reducers 0 \
      --go-live
