Elasticsearch 2.0以上版本根据条件批量删除Java如何实现
本文轉(zhuǎn)載自:http://blog.csdn.net/u014039577/article/details/51802078,僅為了個人收藏,請支持原創(chuàng)作者。
Elasticsearch在2.0以前版本,刪除操作有兩種方式,一種是通過id來進行刪除,但是這種方式一般不常用,因為id不容易得到;另一種方式是通過先查詢操作,然后刪除,也就是通過client.prepareDeleteByQuery這種方式來根據(jù)條件批量刪除數(shù)據(jù):
[java] view plain copy
DeleteByQueryResponse response = client.prepareDeleteByQuery(“l(fā)ibrary”)
.setQuery(QueryBuilders.termQuery(“title”, “ElasticSearch”))
.execute().actionGet();
但是Delete by Query在2.0版本及其以上的版本已經(jīng)被移除了,因為這種方式會自動強制刷新,所以在大量索引并發(fā)的情況下,會很快造成內(nèi)存溢出。
詳情可查看:https://www.elastic.co/guide/en/elasticsearch/client/java-api/1.7/delete-by-query.html
那么在2.0以后的版本,我們?nèi)绾蝸磉M行批量的刪除呢?
我們可以先通過Search API查詢,然后得到需要刪除的批量數(shù)據(jù)的id,然后再通過id來刪除,但是這種方式在大批量數(shù)據(jù)的刪除的時候,依然是行不通的。
具體實現(xiàn)代碼:
[java] view plain copy
public void deleteByTerm(Client client){
BulkRequestBuilder bulkRequest = client.prepareBulk();
SearchResponse response = client.prepareSearch(“megacorp”).setTypes(“employee”)
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termQuery(“first_name”, “xiaoming”))
.setFrom(0).setSize(20).setExplain(true).execute().actionGet();
for(SearchHit hit : response.getHits()){
String id = hit.getId();
bulkRequest.add(client.prepareDelete(“megacorp”, “employee”, id).request());
}
BulkResponse bulkResponse = bulkRequest.get();
if (bulkResponse.hasFailures()) {
for(BulkItemResponse item : bulkResponse.getItems()){
System.out.println(item.getFailureMessage());
}
}else {
System.out.println(“delete ok”);
}
}
同樣通過delete-by-query插件,我們還可以根據(jù)type來批量刪除數(shù)據(jù),這種方式能夠刪除大批量的數(shù)據(jù),他是現(xiàn)將要刪除的數(shù)據(jù)一個一個做標(biāo)記,然后再刪除,于是效率會比較低。下面是官網(wǎng)的說明:https://www.elastic.co/guide/en/elasticsearch/plugins/2.3/plugins-delete-by-query.html
Queries which match large numbers of documents may run for a long time, as every document has to be deleted individually. Don’t use delete-by-query to clean out all or most documents in an index. Rather create a new index and perhaps reindex the documents you want to keep.
可見這種刪除方式并不適合大批量數(shù)據(jù)的刪除,因為效率真的是很低,我是親身體驗過了。
這種方式需要先引入delete-by-query插件包,然后使用插件的api來刪除:
[java] view plain copy
org.elasticsearch.plugin
delete-by-query
2.3.2
具體實現(xiàn)代碼:
[java] view plain copy
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ResourceBundle;
import java.util.Stack;
import org.elasticsearch.action.deletebyquery.DeleteByQueryAction;
import org.elasticsearch.action.deletebyquery.DeleteByQueryRequestBuilder;
import org.elasticsearch.action.deletebyquery.DeleteByQueryResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.plugin.deletebyquery.DeleteByQueryPlugin;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.xgd.log.common.ExceptionUtil;
public class EsDeleteByType {
private static final Logger logger = LoggerFactory.getLogger(EsDeleteByType.class); private Client client; private static ResourceBundle getEsConfig(){ return ResourceBundle.getBundle("elasticsearch"); } private void getClient(){ String clusterName = getEsConfig().getString("clusterName"); String hosts = getEsConfig().getString("hosts"); if (hosts == null || clusterName == null) { throw new IllegalArgumentException("hosts or clusterName was null."); } Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build(); client = TransportClient.builder() .addPlugin(DeleteByQueryPlugin.class) .settings(settings).build(); String[] hostsArray = hosts.split(","); for(String hostAndPort : hostsArray){ String[] tmpArray = hostAndPort.split(":"); try { client = ((TransportClient)client).addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(tmpArray[0]), Integer.valueOf(tmpArray[1]))); } catch (NumberFormatException e) { logger.error(ExceptionUtil.getTrace(e)); } catch (UnknownHostException e) { logger.error(ExceptionUtil.getTrace(e)); } } } /** * 判斷一個index中的type是否有數(shù)據(jù) * @param index * @param type * @return * @throws Exception */ public Boolean existDocOfType(String index, String type) throws Exception { SearchRequestBuilder builder = client.prepareSearch(index).setTypes(type) .setSearchType(SearchType.QUERY_THEN_FETCH) .setSize(1); SearchResponse response = builder.execute().actionGet(); long docNum = response.getHits().getTotalHits(); if (docNum == 0) { return false; } return true; } /** * 根據(jù)type來刪除數(shù)據(jù) * @param index * @param types * @return */ public long deleteDocByType(String index, String[] types) { getClient(); long oldTime = System.currentTimeMillis(); StringBuilder b = new StringBuilder(); b.append("{\"query\":{\"match_all\":{}}}"); DeleteByQueryResponse response = new DeleteByQueryRequestBuilder(client, DeleteByQueryAction.INSTANCE) .setIndices(index).setTypes(types) .setSource(b.toString()) .execute().actionGet(); Stack<String> allTypes = new Stack<String>(); for(String type : types){ allTypes.add(type); } while(!allTypes.isEmpty()){ String type = allTypes.pop(); while(true){ try { if (existDocOfType(index, type) == false) { break; } } catch (Exception e) { logger.error("queryError: " + e.getMessage()); } } } System.out.println(System.currentTimeMillis() - oldTime); return response.getTotalDeleted(); }}
那么當(dāng)我們在開發(fā)中,使用到elasticsearch的時候,總會涉及到大批量數(shù)據(jù)的刪除,我們要怎么辦呢?
經(jīng)過很長時間的糾結(jié),我發(fā)現(xiàn)使用elasticsearch存儲數(shù)據(jù)的時候,千萬不要把所有數(shù)據(jù)都存儲于一個index,這樣一個是不利于查詢的效率,一個是不利于后面的刪除,既然我們不能index中去刪除部分的大批量數(shù)據(jù),那么我們?yōu)樯恫桓淖円环N思路呢,就是分索引,然后通過索引來刪除數(shù)據(jù),例如:我在生產(chǎn)上面,每天有5億的數(shù)據(jù),那么我每天在集群中生成一個index用于存儲這5億的數(shù)據(jù),如果我們的elasticsearch集群對數(shù)據(jù)只要求保存7天的數(shù)據(jù),超過7天的數(shù)據(jù)就可以刪除了,這樣我們可以通過index直接刪除7天以前的數(shù)據(jù),這種方式,我們在查詢的時候不會在所有數(shù)據(jù)中查詢,只需要在所要查詢的時間段內(nèi)查詢,便提高了查詢的效率,同時刪除效率的問題也解決了,能夠很快刪除不需要的數(shù)據(jù),釋放掉磁盤空間。
針對于elasticsearch大批量數(shù)據(jù)刪除效率的問題,目前官網(wǎng)上面也沒有一個特別好的解決辦法,這種方式算是目前還算能行得通的方式了。
總結(jié)
以上是生活随笔為你收集整理的Elasticsearch 2.0以上版本根据条件批量删除Java如何实现的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: ajax请求成功后location.hr
- 下一篇: Spring Boot 集成 Activ