Importing TMDB Data into Elasticsearch 7
Contents
1. Download the TMDB data
2. Download the two files (tmdb_5000_credits.csv and tmdb_5000_movies.csv)
3. Project POM file
4. Code for importing into ES
5. Create the ES index
1. Download the TMDB data
 Download from: TMDB 5000 Movie Dataset | Kaggle
 
Note: you need a VPN to get past the firewall in order to register an account.
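2. Download the two files (tmdb_5000_credits.csv and tmdb_5000_movies.csv)
    Merge the files: combine them in Excel, consolidating the required fields into move.csv. The import code in section 4 reads the cast from column index 20, which suggests the cast column of tmdb_5000_credits.csv is appended after the 20 columns of tmdb_5000_movies.csv. That code opens ./tmdb_5000_movies.csv, so adjust the path if the merged file is saved under a different name.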
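The merge described above can also be scripted instead of done by hand in Excel. The following is a minimal sketch, not the author's method. It assumes both files have already been parsed into rows (for example with opencsv's `CSVReader.readAll()`), that the credits file has `movie_id` and `cast` columns, and that the movies file has an `id` column. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CsvMergeSketch {

    /** Returns the position of a column name in the header row, or throws if it is missing. */
    static int indexOf(String[] header, String column) {
        int i = Arrays.asList(header).indexOf(column);
        if (i < 0) {
            throw new IllegalArgumentException("missing column: " + column);
        }
        return i;
    }

    /**
     * Appends the "cast" column of the credits rows to the matching movie rows.
     * Both lists are expected to carry their header row at index 0.
     * Movies without a credits entry get "[]" (assumed placeholder).
     */
    static List<String[]> appendCast(List<String[]> movies, List<String[]> credits) {
        String[] creditsHeader = credits.get(0);
        int movieIdCol = indexOf(creditsHeader, "movie_id");
        int castCol = indexOf(creditsHeader, "cast");

        // Look-up table: movie id -> cast JSON string
        Map<String, String> castById = new HashMap<>();
        for (String[] row : credits.subList(1, credits.size())) {
            castById.put(row[movieIdCol], row[castCol]);
        }

        int idCol = indexOf(movies.get(0), "id");
        List<String[]> merged = new ArrayList<>();
        for (int r = 0; r < movies.size(); r++) {
            String[] row = movies.get(r);
            String[] out = Arrays.copyOf(row, row.length + 1);
            // Header row gets the new column name, data rows get the cast value
            out[row.length] = (r == 0) ? "cast" : castById.getOrDefault(row[idCol], "[]");
            merged.add(out);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> movies = Arrays.asList(
                new String[]{"id", "title"},
                new String[]{"19995", "Avatar"});
        List<String[]> credits = Arrays.asList(
                new String[]{"movie_id", "title", "cast", "crew"},
                new String[]{"19995", "Avatar", "[{\"name\": \"Sam Worthington\"}]", "[]"});
        for (String[] row : appendCast(movies, credits)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because the cast is appended as the last column, running this on the full 20-column movies file would place it at index 20, the position the import code reads.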
3. Project POM file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.6.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.imooc</groupId>
    <artifactId>dianping</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>dianping</name>
    <description>dianping spring boot java project</description>
    <properties>
        <java.version>1.8</java.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-aop</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>7.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.plugin</groupId>
            <artifactId>transport-netty4-client</artifactId>
            <version>7.3.0</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.58</version>
        </dependency>
        <dependency>
            <groupId>com.opencsv</groupId>
            <artifactId>opencsv</artifactId>
            <version>4.2</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
4. Code for importing into ES
package com.imooc.dianping.service.impl;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.imooc.dianping.service.SellerService;
import com.opencsv.CSVReader;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

@Service
public class SellerServiceImpl implements SellerService {

    @Autowired
    TransportClient transportClient;

    @Override
    public void importData() {
        BulkRequest bulkRequest = new BulkRequest();
        int lineIndex = 0;
        // try-with-resources closes the reader even if parsing fails
        try (CSVReader csvReader = new CSVReader(
                new InputStreamReader(new FileInputStream("./tmdb_5000_movies.csv"), StandardCharsets.UTF_8), ',')) {
            // Read the whole CSV file into memory
            List<String[]> allReader = csvReader.readAll();
            for (String[] records : allReader) {
                lineIndex++;
                // Skip the header row
                if (lineIndex == 1) {
                    continue;
                }
                System.out.println("Line " + lineIndex);
                // Column 20 holds the cast JSON array; skip rows where it is missing or empty
                if (records.length <= 20 || StringUtils.isEmpty(records[20])) {
                    continue;
                }
                if (records[20].contains("[]")) {
                    continue;
                }
                JSONArray jsonArray = JSONArray.parseArray(records[20]);
                // Take the character and actor name of the first cast entry
                String character = jsonArray.getJSONObject(0).getString("character");
                String name = jsonArray.getJSONObject(0).getString("name");
                JSONObject cast = new JSONObject();
                cast.put("character", character);
                cast.put("name", name);
                String date = records[11];
                if (StringUtils.isEmpty(date)) {
                    // Default date; hyphenated so that it matches the release_date format of the index mapping
                    date = "1970-01-01";
                }
                bulkRequest.add(new IndexRequest("movie", "_doc", String.valueOf(lineIndex))
                        .source(XContentType.JSON,
                                "title", records[17],
                                "tagline", records[16],
                                "release_date", date,
                                "popularity", records[8],
                                "cast", cast,
                                "overview", records[7]));
            }
            // Send the collected documents to ES in one asynchronous bulk request
            transportClient.bulk(bulkRequest, new ActionListener<BulkResponse>() {
                @Override
                public void onResponse(BulkResponse bulkItemResponses) {
                    System.out.println(bulkItemResponses);
                }

                @Override
                public void onFailure(Exception e) {
                    // Report the failure instead of silently dropping it
                    e.printStackTrace();
                }
            });
        } catch (IOException e) {
            // Also covers FileNotFoundException
            System.out.println("Line ---->" + lineIndex);
            e.printStackTrace();
        }
    }
}
5. Create the ES index
PUT /movie
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "tagline": {
        "type": "text",
        "analyzer": "english"
      },
      "release_date": {
        "type": "date",
        "format": "8yyyy-MM-dd||yyyy-M-dd||yyyy-MM-d||yyyy-M-d"
      },
      "popularity": {
        "type": "double"
      },
      "cast": {
        "type": "object",
        "properties": {
          "character": {
            "type": "text",
            "analyzer": "standard"
          },
          "name": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      },
      "overview": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}
To make the import easier, the source code and the data file have been bundled together. The code is available here:
abel/importData
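The `release_date` mapping above accepts only hyphen-separated dates (`yyyy-MM-dd` and its single-digit variants). A CSV that has been re-saved from Excel may contain slash-separated dates instead, so it can help to normalize the value before indexing. The following is a minimal sketch using only `java.time`. The slash-separated input form, the `1970-01-01` fallback for empty values, and the class and method names are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class ReleaseDateSketch {

    // Input patterns tried in order; "M" and "d" accept one or two digits when parsing.
    private static final DateTimeFormatter[] INPUTS = {
            DateTimeFormatter.ofPattern("uuuu-M-d"),
            DateTimeFormatter.ofPattern("uuuu/M/d")
    };

    /** Returns the date as yyyy-MM-dd, or the assumed fallback for empty input. */
    static String normalize(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return "1970-01-01";
        }
        String value = raw.trim();
        for (DateTimeFormatter format : INPUTS) {
            try {
                // LocalDate.toString() yields ISO-8601 yyyy-MM-dd for four-digit years
                return LocalDate.parse(value, format).toString();
            } catch (DateTimeParseException ignored) {
                // Try the next pattern
            }
        }
        throw new IllegalArgumentException("unrecognised date: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(normalize("2009/12/10"));
        System.out.println(normalize("2009-1-5"));
        System.out.println(normalize(""));
    }
}
```

In the import code, the value of `records[11]` could be passed through such a helper before it is added to the bulk request, so that every document carries a date the mapping can parse.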
