Spider 10 — Building an ELK Platform and Developing a Service Provider
Step 1: Crawl the target data, deduplicate it, and store it in MySQL.
Spring Boot + SSM + scheduled (timer-based) crawling + Redis deduplication + MyBatis persistence.
For details, see Spider 09 — crawl the target data, deduplicate it, and store it in MySQL:
https://blog.csdn.net/qq_41946557/article/details/102573282
Step 2: Set up the ELK platform and import the MySQL data into ES.
Step 3: Develop a service provider (port 8001) that reads the data from ES and offers keyword search.
Step 2 first: set up the ELK platform and import the MySQL data into ES.
Step 1 already crawled the data into MySQL.
Now we integrate Elasticsearch, i.e. import the MySQL data into ES.
0. Download Logstash:
https://artifacts.elastic.co/downloads/logstash/logstash-7.3.2.zip
```conf
input {
  # To sync several tables, just add one jdbc block per table
  jdbc {
    # MySQL connection string; "spider" is the database name
    jdbc_connection_string => "jdbc:mysql://localhost:3306/spider?useUnicode=true&characterEncoding=utf8&serverTimezone=UTC"
    # Credentials
    jdbc_user => "root"
    jdbc_password => "123456"

    # JDBC driver jar
    jdbc_driver_library => "D:/es/logstash-7.3.2/mysql/mysql-connector-java-5.1.6-bin.jar"

    # Driver class name
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_validate_connection => "true"

    # Whether to page the results
    jdbc_paging_enabled => "true"
    jdbc_page_size => "1000"
    # Time zone
    jdbc_default_timezone => "Asia/Shanghai"

    # SQL statement to execute directly
    statement => "select * from news where id >= :sql_last_value order by id asc"
    # Or run the SQL from a file (path + file name)
    # statement_filepath => "/hw/elasticsearch/logstash-6.2.4/bin/test.sql"

    # Polling schedule; fields (left to right): minute, hour, day of month, month, day of week.
    # All asterisks means run every minute.
    schedule => "* * * * *"
    # Every 10 minutes:
    # schedule => "*/10 * * * *"

    # Record the last run: when true, the last value seen in tracking_column
    # is saved to last_run_metadata_path
    record_last_run => true
    # Where the latest sync offset is stored
    last_run_metadata_path => "D:/es/logstash-7.3.2/logs/last_id.txt"

    use_column_value => true
    # Type of the incrementing column: "numeric" or "timestamp"
    tracking_column_type => "numeric"
    tracking_column => "id"

    clean_run => false

    # Index type
    # type => "jdbc"
  }
}

output {
  elasticsearch {
    # ES host and port
    hosts => ["http://localhost:9200"]
    # ES index name (your choice)
    index => "spider"
    # Document type
    document_type => "_doc"
    # Use the database id column as the document id
    document_id => "%{id}"
  }
  stdout {
    codec => json_lines
  }
}
```
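Conceptually, the `tracking_column`/`record_last_run` pair above lets each scheduled run pick up where the previous one stopped. Here is a minimal sketch of that bookkeeping — the `IncrementalSyncSketch` class and the table contents are made up for illustration; real Logstash persists the offset in `last_run_metadata_path` rather than in a variable:

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalSyncSketch {
    // Mirrors: select * from news where id >= :sql_last_value order by id asc
    static List<Integer> fetchSince(List<Integer> table, int sqlLastValue) {
        List<Integer> batch = new ArrayList<>();
        for (int id : table) {
            if (id >= sqlLastValue) {
                batch.add(id);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the news table: the ids currently in MySQL
        List<Integer> news = new ArrayList<>(List.of(1, 2, 3));
        int lastId = 0; // contents of last_run_metadata_path before the first run

        // First scheduled run: everything is new
        List<Integer> firstBatch = fetchSince(news, lastId);
        lastId = firstBatch.get(firstBatch.size() - 1); // record_last_run => true
        System.out.println("first run synced " + firstBatch.size() + " rows, last id " + lastId);

        // New rows arrive between runs
        news.add(4);
        news.add(5);

        // The next run only re-reads from the recorded id onward
        List<Integer> secondBatch = fetchSince(news, lastId);
        System.out.println("second run synced " + secondBatch.size() + " rows");
    }
}
```

Because the statement uses `>=`, the last-synced row is read again on the next run; `document_id => "%{id}"` in the output block makes that re-read an idempotent overwrite in ES rather than a duplicate.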
6. Start Logstash: from the bin directory, run `logstash -f logstash.conf`
[Note] Elasticsearch must be running before the import starts.
Start Kibana and verify the import:
Step 3: Develop a service provider (port 8001) that reads the data from ES and offers keyword search.
1. Update pom.xml
```xml
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.3.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>7.3.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.3.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
```
2. Update application.yml

```yaml
# Elasticsearch configuration
elasticSearch:
  hostlist: 127.0.0.1:9200
  client:
    connectNum: 10
    connectPerRoute: 50
```

3. Import the ES access helper classes
Here is the code, walked through top to bottom:

EsEntity
```java
package com.henu.es.bean;

public final class EsEntity<T> {
    // Document id
    private String id;
    // One document
    private T data;

    public EsEntity() {
    }

    public EsEntity(String id, T data) {
        this.data = data;
        this.id = id;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public T getData() {
        return data;
    }

    public void setData(T data) {
        this.data = data;
    }
}
```

EsPage
```java
package com.henu.es.bean;

import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.ToString;

import java.util.List;
import java.util.Map;

@Getter
@Setter
@NoArgsConstructor
@ToString
public class EsPage {
    /** Current page */
    private int currentPage;
    /** Records per page */
    private int pageSize;
    /** Total number of records */
    private int recordCount;
    /** Records on this page */
    private List<Map<String, Object>> recordList;
    /** Total number of pages */
    private int pageCount;
    /** First page number shown in the pager (inclusive) */
    private int beginPageIndex;
    /** Last page number shown in the pager (inclusive) */
    private int endPageIndex;

    /**
     * Takes only the four required properties; the other three are derived automatically.
     */
    public EsPage(int currentPage, int pageSize, int recordCount, List<Map<String, Object>> recordList) {
        this.currentPage = currentPage;
        this.pageSize = pageSize;
        this.recordCount = recordCount;
        this.recordList = recordList;
        // Total number of pages
        pageCount = (recordCount + pageSize - 1) / pageSize;
        // Compute beginPageIndex and endPageIndex:
        // with 10 pages or fewer, show all of them
        if (pageCount <= 10) {
            beginPageIndex = 1;
            endPageIndex = pageCount;
        }
        // With more than 10 pages, show 10 page numbers around the current page
        else {
            // 10 page numbers around the current page (4 before + current + 5 after)
            beginPageIndex = currentPage - 4;
            endPageIndex = currentPage + 5;
            // Fewer than 4 pages before the current one: show the first 10
            if (beginPageIndex < 1) {
                beginPageIndex = 1;
                endPageIndex = 10;
            }
            // Fewer than 5 pages after the current one: show the last 10
            if (endPageIndex > pageCount) {
                endPageIndex = pageCount;
                beginPageIndex = pageCount - 10 + 1;
            }
        }
    }
}
```

User
```java
package com.henu.es.bean;

import lombok.Data;

/**
 * Bean operated on by userRepository
 */
@Data
public class User {
    private Integer id;
    private String name;
    private String address;
    private Integer sex;
}
```

ElasticsearchRestClient
```java
package com.henu.es.client;

import com.henu.es.factory.ESClientSpringFactory;
import lombok.Getter;
import lombok.Setter;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Scope;

@Configuration
@Getter
@Setter
@ComponentScan(basePackageClasses = ESClientSpringFactory.class)
public class ElasticsearchRestClient {
    private final Logger LOGGER = LoggerFactory.getLogger(ElasticsearchRestClient.class);

    @Value("${elasticSearch.client.connectNum}")
    private Integer connectNum;

    @Value("${elasticSearch.client.connectPerRoute}")
    private Integer connectPerRoute;

    @Value("${elasticSearch.hostlist}")
    private String hostlist;

    @Bean
    public HttpHost[] httpHost() {
        // Parse the hostlist setting
        String[] split = hostlist.split(",");
        // Build an HttpHost array holding each ES host/port pair
        HttpHost[] httpHostArray = new HttpHost[split.length];
        for (int i = 0; i < split.length; i++) {
            String item = split[i];
            httpHostArray[i] = new HttpHost(item.split(":")[0], Integer.parseInt(item.split(":")[1]), "http");
        }
        LOGGER.info("init HttpHost");
        return httpHostArray;
    }

    @Bean(initMethod = "init", destroyMethod = "close")
    public ESClientSpringFactory getFactory() {
        LOGGER.info("initializing ESClientSpringFactory");
        return ESClientSpringFactory.build(httpHost(), connectNum, connectPerRoute);
    }

    @Bean
    @Scope("singleton")
    public RestClient getRestClient() {
        LOGGER.info("initializing RestClient");
        return getFactory().getClient();
    }

    @Bean(name = "restHighLevelClient")
    @Scope("singleton")
    public RestHighLevelClient getRHLClient() {
        LOGGER.info("initializing RestHighLevelClient");
        return getFactory().getRhlClient();
    }
}
```

SpiderController
```java
package com.henu.es.controller;

import com.henu.es.bean.EsPage;
import com.henu.es.util.ElasticsearchUtil;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

/**
 * @author George
 */
@Controller
public class SpiderController {

    @RequestMapping("/search")
    @ResponseBody
    public String search(@RequestParam(value = "keyword") String keyword,
                         @RequestParam(value = "currentPage", defaultValue = "1") int currentPage,
                         @RequestParam(value = "pageSize", defaultValue = "10") int pageSize) {
        System.out.println("Study hard, make progress every day: " + keyword);
        QueryBuilder queryBuilder = QueryBuilders.matchQuery("intro", keyword);
        EsPage esPage = ElasticsearchUtil.searchDataPage("spider", currentPage, pageSize, queryBuilder,
                "id,appid,title,intro,url,source,updatetime", "id", "intro");
        return esPage.toString();
    }
}
```

ESClientSpringFactory
```java
package com.henu.es.factory;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.Arrays;

public class ESClientSpringFactory {
    private final Logger LOGGER = LoggerFactory.getLogger(ESClientSpringFactory.class);

    public static int CONNECT_TIMEOUT_MILLIS = 1000;
    public static int SOCKET_TIMEOUT_MILLIS = 30000;
    public static int CONNECTION_REQUEST_TIMEOUT_MILLIS = 500;
    public static int MAX_CONN_PER_ROUTE = 10;
    public static int MAX_CONN_TOTAL = 30;

    private static HttpHost[] HTTP_HOST;
    private RestClientBuilder builder;
    private RestClient restClient;
    private RestHighLevelClient restHighLevelClient;

    private static ESClientSpringFactory esClientSpringFactory = new ESClientSpringFactory();

    private ESClientSpringFactory() {
    }

    public static ESClientSpringFactory build(HttpHost[] httpHostArray,
                                              Integer maxConnectNum, Integer maxConnectPerRoute) {
        HTTP_HOST = httpHostArray;
        MAX_CONN_TOTAL = maxConnectNum;
        MAX_CONN_PER_ROUTE = maxConnectPerRoute;
        return esClientSpringFactory;
    }

    public static ESClientSpringFactory build(HttpHost[] httpHostArray,
                                              Integer connectTimeOut, Integer socketTimeOut,
                                              Integer connectionRequestTime,
                                              Integer maxConnectNum, Integer maxConnectPerRoute) {
        HTTP_HOST = httpHostArray;
        CONNECT_TIMEOUT_MILLIS = connectTimeOut;
        SOCKET_TIMEOUT_MILLIS = socketTimeOut;
        CONNECTION_REQUEST_TIMEOUT_MILLIS = connectionRequestTime;
        MAX_CONN_TOTAL = maxConnectNum;
        MAX_CONN_PER_ROUTE = maxConnectPerRoute;
        return esClientSpringFactory;
    }

    public void init() {
        builder = RestClient.builder(HTTP_HOST);
        setConnectTimeOutConfig();
        setMutiConnectConfig();
        restClient = builder.build();
        restHighLevelClient = new RestHighLevelClient(builder);
        LOGGER.info("init factory " + Arrays.toString(HTTP_HOST));
    }

    /**
     * Configure the connection timeouts.
     */
    public void setConnectTimeOutConfig() {
        builder.setRequestConfigCallback(requestConfigBuilder -> {
            requestConfigBuilder.setConnectTimeout(CONNECT_TIMEOUT_MILLIS);
            requestConfigBuilder.setSocketTimeout(SOCKET_TIMEOUT_MILLIS);
            requestConfigBuilder.setConnectionRequestTimeout(CONNECTION_REQUEST_TIMEOUT_MILLIS);
            return requestConfigBuilder;
        });
    }

    /**
     * Configure the concurrency limits of the async HttpClient.
     */
    public void setMutiConnectConfig() {
        builder.setHttpClientConfigCallback(httpClientBuilder -> {
            httpClientBuilder.setMaxConnTotal(MAX_CONN_TOTAL);
            httpClientBuilder.setMaxConnPerRoute(MAX_CONN_PER_ROUTE);
            return httpClientBuilder;
        });
    }

    public RestClient getClient() {
        return restClient;
    }

    public RestHighLevelClient getRhlClient() {
        return restHighLevelClient;
    }

    public void close() {
        if (restClient != null) {
            try {
                restClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        LOGGER.info("close client");
    }
}
```

ElasticsearchUtil
```java
package com.henu.es.util;

import com.alibaba.fastjson.JSON;
import com.henu.es.bean.EsEntity;
import com.henu.es.bean.EsPage;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.DeleteByQueryRequest;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils;

import javax.annotation.PostConstruct;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/**
 * @author Administrator
 * @date 2019/10/13
 */
@Component
public class ElasticsearchUtil<T> {
    private static final Logger LOGGER = LoggerFactory.getLogger(ElasticsearchUtil.class);

    @Autowired
    private RestHighLevelClient rhlClient;

    private static RestHighLevelClient client;

    /**
     * Runs when the Spring container initializes.
     */
    @PostConstruct
    public void init() {
        client = this.rhlClient;
    }

    /**
     * Check whether an index exists.
     *
     * @param index the index (analogous to a database)
     * @auther: LHL
     */
    public static boolean isIndexExist(String index) {
        boolean exists = false;
        try {
            exists = client.indices().exists(new GetIndexRequest(index), RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (exists) {
            LOGGER.info("Index [" + index + "] exists!");
        } else {
            LOGGER.info("Index [" + index + "] does not exist!");
        }
        return exists;
    }

    /**
     * Create an index plus its mapping, assigning the IK analyzer to selected fields;
     * later queries against this index will then use IK word segmentation.
     *
     * @param: indexName the index (analogous to a database)
     * @auther: LHL
     */
    public static boolean createIndex(String indexName) {
        if (!isIndexExist(indexName)) {
            LOGGER.info("Index does not exist!");
        }
        CreateIndexResponse createIndexResponse = null;
        try {
            // Build the mapping
            XContentBuilder mapping = null;
            try {
                mapping = XContentFactory.jsonBuilder()
                        .startObject()
                            .startObject("properties")
                                // field name / type / analyzer; content in these fields is segmented
                                // with ik_smart at query time (alternatives: ik_max_word, standard)
                                //.startObject("m_id").field("type", "keyword").endObject()
                                .startObject("id").field("type", "text").endObject()
                                .startObject("title").field("type", "text").field("analyzer", "ik_smart").endObject()
                                .startObject("content").field("type", "text").field("analyzer", "ik_smart").endObject()
                                .startObject("state").field("type", "text").endObject()
                            .endObject()
                            .startObject("settings")
                                // Number of shards
                                .field("number_of_shards", 3)
                                // Number of replicas
                                .field("number_of_replicas", 1)
                            .endObject()
                        .endObject();
            } catch (IOException e) {
                e.printStackTrace();
            }
            CreateIndexRequest request = new CreateIndexRequest(indexName).source(mapping);
            // Two-minute timeout for index creation
            request.setTimeout(TimeValue.timeValueMinutes(2));
            createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return createIndexResponse.isAcknowledged();
    }

    /**
     * Add a single document.
     *
     * @param content   the data to add
     * @param indexName the index (analogous to a database)
     * @param id        document id
     * @auther: LHL
     */
    public static String addData(XContentBuilder content, String indexName, String id) {
        IndexResponse response = null;
        try {
            IndexRequest request = new IndexRequest(indexName).id(id).source(content);
            response = client.index(request, RequestOptions.DEFAULT);
            LOGGER.info("addData response status:{}, id:{}", response.status().getStatus(), response.getId());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return response.getId();
    }

    /**
     * Bulk-insert documents.
     *
     * @param list  the data to insert
     * @param index the index (analogous to a database)
     * @auther: LHL
     */
    public void insertBatch(String index, List<EsEntity> list) {
        BulkRequest request = new BulkRequest();
        list.forEach(item -> request.add(new IndexRequest(index).id(item.getId())
                .source(JSON.toJSONString(item.getData()), XContentType.JSON)));
        try {
            client.bulk(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Delete by query, e.g. new TermQueryBuilder("userId", userId).
     *
     * @param builder   the delete condition
     * @param indexName the index (analogous to a database)
     * @auther: LHL
     */
    public void deleteByQuery(String indexName, QueryBuilder builder) {
        DeleteByQueryRequest request = new DeleteByQueryRequest(indexName);
        request.setQuery(builder);
        // Batch size per operation; 10000 is the maximum
        request.setBatchSize(10000);
        request.setConflicts("proceed");
        try {
            client.deleteByQuery(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Bulk-delete by id.
     *
     * @param idList ids of the documents to delete
     * @param index  the index (analogous to a database)
     * @auther: LHL
     */
    public static <T> void deleteBatch(String index, Collection<T> idList) {
        BulkRequest request = new BulkRequest();
        idList.forEach(item -> request.add(new DeleteRequest(index, item.toString())));
        try {
            client.bulk(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Paged full-text search with highlighting and sorting.
     *
     * @param index          index name
     * @param startPage      current page
     * @param pageSize       records per page
     * @param query          query conditions
     * @param fields         fields to return, comma separated (empty means all),
     *                       e.g. "id,appid,title,intro,source,updatetime"
     * @param sortField      sort field
     * @param highlightField highlight field
     */
    public static EsPage searchDataPage(String index, int startPage, int pageSize, QueryBuilder query,
                                        String fields, String sortField, String highlightField) {
        SearchRequest searchRequest = new SearchRequest(index);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Optional timeout capping how long the search may run
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Fields to return, comma separated (empty means all)
        if (!StringUtils.isEmpty(fields)) {
            System.out.println("Returned fields: " + fields);
            searchSourceBuilder.fetchSource(fields.split(","), null);
        }
        // Sort field
        if (!StringUtils.isEmpty(sortField)) {
            searchSourceBuilder.sort(new FieldSortBuilder(sortField).order(SortOrder.ASC));
        }
        // Highlighting
        if (!StringUtils.isEmpty(highlightField)) {
            HighlightBuilder highlightBuilder = new HighlightBuilder();
            // Prefix tag
            highlightBuilder.preTags("<span style='color:red' >");
            // Suffix tag
            highlightBuilder.postTags("</span>");
            HighlightBuilder.Field highlightTitle = new HighlightBuilder.Field(highlightField);
            // Highlighter type
            highlightTitle.highlighterType("unified");
            // Field to highlight
            highlightBuilder.field(highlightTitle);
            searchSourceBuilder.highlighter(highlightBuilder);
        }
        // Whether to sort by query relevance
        searchSourceBuilder.explain(true);
        if (startPage <= 0) {
            startPage = 0;
        }
        // With pageSize 10 the cutoff is startPage > 9990 (10000 - pageSize);
        // with 20 it is > 9980, with 50 it is > 9950.
        // Deep paging TODO
        if (startPage > (10000 - pageSize)) {
            searchSourceBuilder.query(query);
            searchSourceBuilder
                    // .setScroll(TimeValue.timeValueMinutes(1))
                    .size(10000);
            // The logged query can be replayed in Elasticsearch Head or Kibana
            LOGGER.info("\n{}", searchSourceBuilder);
            // Run the search and collect the response
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = null;
            try {
                searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            long totalHits = searchResponse.getHits().getTotalHits().value;
            if (searchResponse.status().getStatus() == 200) {
                // Iterate with the scroll id
                List<Map<String, Object>> result = disposeScrollResult(searchResponse, highlightField);
                List<Map<String, Object>> sourceList = result.stream().parallel()
                        .skip((startPage - 1 - (10000 / pageSize)) * pageSize)
                        .limit(pageSize)
                        .collect(Collectors.toList());
                return new EsPage(startPage, pageSize, (int) totalHits, sourceList);
            }
        } else {
            // Shallow paging
            searchSourceBuilder.query(QueryBuilders.matchAllQuery());
            searchSourceBuilder.query(query);
            /*
            MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("username", "pretty");
            matchQueryBuilder.fuzziness(Fuzziness.AUTO); // enable fuzzy matching on the match query
            matchQueryBuilder.prefixLength(3);           // prefix-length option on the match query
            matchQueryBuilder.maxExpansions(10);         // max expansions, controlling the fuzzy process
            searchSourceBuilder.query(matchQueryBuilder);
            */
            // Apply paging
            searchSourceBuilder
                    // from: index of the first result to return, default 0
                    // .from(startPage)
                    .from((startPage - 1) * pageSize)
                    // size: number of hits to return, default 10
                    .size(pageSize);
            // The logged query can be replayed in Elasticsearch Head or Kibana
            LOGGER.info("\n{}", searchSourceBuilder);
            // Run the search and collect the response
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = null;
            try {
                searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            long totalHits = searchResponse.getHits().getTotalHits().value;
            long length = searchResponse.getHits().getHits().length;
            LOGGER.debug("Found [{}] records in total, processing [{}]", totalHits, length);
            if (searchResponse.status().getStatus() == 200) {
                // Parse the hits
                List<Map<String, Object>> sourceList = setSearchResponse(searchResponse, highlightField);
                return new EsPage(startPage, pageSize, (int) totalHits, sourceList);
            }
        }
        return null;
    }

    /**
     * Post-process a result set that uses highlighting.
     *
     * @param searchResponse the search result set
     * @param highlightField the highlight field
     */
    private static List<Map<String, Object>> setSearchResponse(SearchResponse searchResponse, String highlightField) {
        List<Map<String, Object>> sourceList = new ArrayList<>();
        for (SearchHit searchHit : searchResponse.getHits().getHits()) {
            Map<String, Object> resultMap = getResultMap(searchHit, highlightField);
            sourceList.add(resultMap);
        }
        return sourceList;
    }

    /**
     * Extract one hit, substituting the highlighted fragment for the plain value.
     *
     * @param: [hit, highlightField]
     * @auther: LHL
     */
    private static Map<String, Object> getResultMap(SearchHit hit, String highlightField) {
        hit.getSourceAsMap().put("id", hit.getId());
        if (!StringUtils.isEmpty(highlightField)) {
            Text[] text = hit.getHighlightFields().get(highlightField).getFragments();
            String hightStr = null;
            if (text != null) {
                for (Text str : text) {
                    hightStr = str.string();
                }
                // Overwrite the plain value with the highlighted one
                hit.getSourceAsMap().put(highlightField, hightStr);
            }
        }
        return hit.getSourceAsMap();
    }

    public static <T> List<T> search(String index, SearchSourceBuilder builder, Class<T> c) {
        SearchRequest request = new SearchRequest(index);
        request.source(builder);
        try {
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            SearchHit[] hits = response.getHits().getHits();
            List<T> res = new ArrayList<>(hits.length);
            for (SearchHit hit : hits) {
                res.add(JSON.parseObject(hit.getSourceAsString(), c));
            }
            return res;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Drain a scroll result set.
     *
     * @param: [response, highlightField]
     * @auther: LHL
     */
    private static List<Map<String, Object>> disposeScrollResult(SearchResponse response, String highlightField) {
        List<Map<String, Object>> sourceList = new ArrayList<>();
        // Iterate with the scroll id
        while (response.getHits().getHits().length > 0) {
            String scrollId = response.getScrollId();
            try {
                response = client.scroll(new SearchScrollRequest(scrollId), RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            SearchHits hits = response.getHits();
            for (SearchHit hit : hits.getHits()) {
                Map<String, Object> resultMap = getResultMap(hit, highlightField);
                sourceList.add(resultMap);
            }
        }
        ClearScrollRequest request = new ClearScrollRequest();
        request.addScrollId(response.getScrollId());
        try {
            client.clearScroll(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return sourceList;
    }
}
```

SpringbootElasticsearchApplication
```java
package com.henu.es;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

/**
 * Two ways to work with Elasticsearch:
 * (1) Jest: off by default; pull in io.searchbox.jest,
 *     configure application.properties, then test adding and querying documents.
 * (2) spring-data-es: pull in the spring-data-elasticsearch package,
 *     configure application.properties (cluster-name, cluster-nodes).
 *     If startup fails, the versions probably do not match.
 *     Usage styles:
 *     (1) write an interface extending ElasticsearchRepository
 *     (2) ElasticsearchTemplate
 *     (3) spring-data-es CRUD + paging + highlighting exercises
 */
@SpringBootApplication
public class SpringbootElasticsearchApplication {
    public static void main(String[] args) {
        // Avoid a Netty conflict
        System.setProperty("es.set.netty.runtime.available.processors", "false");
        SpringApplication.run(SpringbootElasticsearchApplication.class, args);
    }
}
```

Search results:
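For reference, the two pieces of paging arithmetic used above — the shallow-paging offset `from = (startPage - 1) * pageSize` in `searchDataPage`, and the pager window derived in the `EsPage` constructor — can be checked in isolation. This is a standalone sketch, not part of the project:

```java
import java.util.Arrays;

public class EsPageMathSketch {
    // Mirrors the shallow-paging branch of searchDataPage
    static int from(int startPage, int pageSize) {
        return (startPage - 1) * pageSize;
    }

    // Re-derives EsPage's computed fields: {pageCount, beginPageIndex, endPageIndex}
    static int[] window(int currentPage, int pageSize, int recordCount) {
        int pageCount = (recordCount + pageSize - 1) / pageSize;
        int begin, end;
        if (pageCount <= 10) {
            // 10 pages or fewer: show them all
            begin = 1;
            end = pageCount;
        } else {
            // 10 page numbers around the current page (4 before + current + 5 after)
            begin = currentPage - 4;
            end = currentPage + 5;
            if (begin < 1) {          // too close to the start: show the first 10
                begin = 1;
                end = 10;
            }
            if (end > pageCount) {    // too close to the end: show the last 10
                end = pageCount;
                begin = pageCount - 10 + 1;
            }
        }
        return new int[]{pageCount, begin, end};
    }

    public static void main(String[] args) {
        // Page 3 at 10 per page starts at offset 20
        System.out.println(from(3, 10));                             // 20
        // 95 records, 10 per page -> 10 pages, all shown
        System.out.println(Arrays.toString(window(3, 10, 95)));      // [10, 1, 10]
        // 500 records, 10 per page, on page 25 -> pages 21..30 shown
        System.out.println(Arrays.toString(window(25, 10, 500)));    // [50, 21, 30]
        // Near the end: page 48 of 50 -> last 10 pages shown
        System.out.println(Arrays.toString(window(48, 10, 500)));    // [50, 41, 50]
    }
}
```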
Summary

The above is the full content of Spider 10 — building an ELK platform and developing a service provider. Hopefully this article helps you solve the problems you ran into.