In Practice: Building a Full-Text Search System with Function Compute
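Introduction
As cloud storage becomes widely used and document counts keep growing, more and more users are asking the same questions. How can I quickly locate the document I want among so many? How can I quickly set up a full-text search system on top of a storage service? How can I make the search service reflect document additions, deletions, and modifications promptly?
Function Compute can help with all of this.
Using OSS as the example cloud storage service and OpenSearch as the example search service, this article uses Alibaba Cloud Function Compute to build a simple, efficient full-text search system for text documents.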
Technical Approach
Implementation
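1. Activate Alibaba Cloud Object Storage Service (OSS)
Alibaba Cloud Object Storage Service (OSS) provides network-based data storage and retrieval. Users can store and retrieve unstructured data files of any kind over the network at any time, including text, images, audio, and video. For activation steps, see the Alibaba Cloud OSS quick start.
In this example, after activating OSS, we create a bucket named “fc-search-demo” in the “China North 2” region with the Standard storage class, as shown in the figure below. For more configuration options, see Creating a Bucket and choose according to your needs.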
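2. Activate Alibaba Cloud OpenSearch
Alibaba Cloud OpenSearch is a hosted search service for structured data that offers a simple, efficient, stable, low-cost, and scalable search solution. For activation steps, see the OpenSearch quick start.
In this example, after activating OpenSearch, we create an application named “oss_fc_search” in the “China North 2” region, using the Advanced edition, as shown in the figure below. For more configuration options, see Application Types and choose according to your needs.
Once the application is created, edit its schema to match your business scenario, including the data tables, fields, and tokenizer types. For detailed configuration, see Field Types and Tokenizer Types.
This example indexes text documents, so we create a single data table named main with conventional fields such as title, author, and content, and use the basic Chinese tokenizer, as shown in the figure below.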
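In code, a record destined for this table is a flat map from field name to value. The sketch below is a minimal, JDK-only illustration of that shape. The field names title, content, and subject and the URL pattern follow the handler code later in this article; the class name DocSketch, the method buildDoc, the endpoint string, and the object key are hypothetical and only serve the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DocSketch {
    // Same pattern as DOC_URL_FORMAT in the handler: http://<bucket>.<endpoint>/<key>
    static final String DOC_URL_FORMAT = "http://%s.%s/%s";

    /** Builds the field map for one text object (hypothetical helper, not part of any SDK). */
    static Map<String, Object> buildDoc(String bucket, String endpoint, String key, String content) {
        Map<String, Object> doc = new LinkedHashMap<>(); // keeps fields in insertion order
        doc.put("title", key);
        doc.put("content", content);
        doc.put("subject", String.format(DOC_URL_FORMAT, bucket, endpoint, key));
        return doc;
    }

    public static void main(String[] args) {
        // The endpoint and key below are assumed example values.
        Map<String, Object> doc = buildDoc(
                "fc-search-demo", "oss-cn-beijing.aliyuncs.com", "notes/sample.txt", "hello");
        System.out.println(doc.get("subject"));
    }
}
```

Every field put into the map needs a matching field in the application schema, so a map like this one would also require a subject field on the main table.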
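3. Activate Function Compute
Function Compute is an event-driven, fully managed compute service. Users write code and upload it to Function Compute, then trigger function execution through the SDK or the RESTful API, or through events from other cloud products. For activation steps, see the Function Compute quick start.
In this example, after activating Function Compute, we create a service named “oss-fc-search” in the “China North 2” region, as shown in the figure below:
Once the service is created, create the function: build the Java code and pom file provided in this article into a JAR and upload it.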
package SearchDemo;

import com.aliyun.fc.runtime.*;
import com.aliyun.opensearch.DocumentClient;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.dependencies.com.google.common.collect.Maps;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;
import com.aliyun.oss.OSSClient;
import com.aliyun.oss.model.OSSObject;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;

import java.io.*;
import java.util.*;

public class EventHandler implements StreamRequestHandler {

    private static final String OSS_ENDPOINT = "YourOSSEndpoint";
    private static final String OPENSEARCH_APP_NAME = "YourOpenSearchAppName";
    private static final String OPENSEARCH_HOST = "YourOpenSearchHost";
    private static final String OPENSEARCH_TABLE_NAME = "YourOpenSearchTableName";
    private static final String ACCESS_KEY_ID = "YourAccessKeyId";
    private static final String ACCESS_KEY_SECRET = "YourAccessSecretId";
    private static final String DOC_URL_FORMAT = "http://%s.%s/%s";

    private static final List<String> addEventList = Arrays.asList("ObjectCreated:PutObject", "ObjectCreated:PostObject");
    private static final List<String> updateEventList = Arrays.asList("ObjectCreated:AppendObject");
    private static final List<String> deleteEventList = Arrays.asList("ObjectRemoved:DeleteObject", "ObjectRemoved:DeleteObjects");

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        /*
         * Preparation
         * Init logger, oss client, open search document client.
         */
        FunctionComputeLogger fcLogger = context.getLogger();
        OSSClient ossClient = getOSSClient(context);
        DocumentClient documentClient = getDocumentClient();

        /*
         * Step 1
         * Read oss event from input stream.
         */
        JSONObject ossEvent;
        StringBuilder inputBuilder = new StringBuilder();
        BufferedReader streamReader = null;
        try {
            streamReader = new BufferedReader(new InputStreamReader(inputStream));
            String line;
            while ((line = streamReader.readLine()) != null) {
                inputBuilder.append(line);
            }
            fcLogger.info("Read object event success.");
        } catch (Exception ex) {
            fcLogger.error(ex.getMessage());
            return;
        } finally {
            closeQuietly(streamReader, fcLogger);
        }
        ossEvent = JSONObject.fromObject(inputBuilder.toString());
        fcLogger.info("Getting event: " + ossEvent.toString());

        /*
         * Step 2
         * Loop every events in oss event, and generate structured docs in json format.
         */
        JSONArray events = ossEvent.getJSONArray("events");
        for (int i = 0; i < events.size(); i++) {
            // Get event name, source, oss object.
            JSONObject event = events.getJSONObject(i);
            String eventName = event.getString("eventName");
            JSONObject oss = event.getJSONObject("oss");

            // Get bucket name and file name for file identifier.
            JSONObject bucket = oss.getJSONObject("bucket");
            String bucketName = bucket.getString("name");
            JSONObject object = oss.getJSONObject("object");
            String fileName = object.getString("key");

            // Prepare fields for commit to open search
            Map<String, Object> structuredDoc = Maps.newLinkedHashMap();
            BufferedReader objectReader = null;
            UUID uuid = new UUID(bucketName.hashCode(), fileName.hashCode());
            structuredDoc.put("identifier", uuid);
            try {
                // For delete event, delete by identifier
                if (deleteEventList.contains(eventName)) {
                    documentClient.remove(structuredDoc);
                } else {
                    OSSObject ossObject = ossClient.getObject(bucketName, fileName);

                    // Non delete event, read file content and more field you need
                    StringBuilder fileContentBuilder = new StringBuilder();
                    objectReader = new BufferedReader(new InputStreamReader(ossObject.getObjectContent()));
                    String contentLine;
                    while ((contentLine = objectReader.readLine()) != null) {
                        fileContentBuilder.append('\n' + contentLine);
                    }
                    fcLogger.info("Read object content success.");

                    // You can put more fields according to your scenario
                    structuredDoc.put("title", fileName);
                    structuredDoc.put("content", fileContentBuilder.toString());
                    structuredDoc.put("subject", String.format(DOC_URL_FORMAT, bucketName, OSS_ENDPOINT, fileName));

                    if (addEventList.contains(eventName)) {
                        documentClient.add(structuredDoc);
                    } else if (updateEventList.contains(eventName)) {
                        documentClient.update(structuredDoc);
                    }
                }
            } catch (Exception ex) {
                fcLogger.error(ex.getMessage());
                return;
            } finally {
                closeQuietly(objectReader, fcLogger);
            }
        }

        /*
         * Step 3
         * Commit json docs string to open search
         */
        try {
            OpenSearchResult osr = documentClient.commit(OPENSEARCH_APP_NAME, OPENSEARCH_TABLE_NAME);
            if (osr.getResult().equalsIgnoreCase("true")) {
                fcLogger.info("OSS Object commit to OpenSearch success.");
            } else {
                fcLogger.info("Fail to commit to OpenSearch.");
            }
        } catch (OpenSearchException ex) {
            fcLogger.error(ex.getMessage());
            return;
        } catch (OpenSearchClientException ex) {
            fcLogger.error(ex.getMessage());
            return;
        }
    }

    protected OSSClient getOSSClient(Context context) {
        Credentials creds = context.getExecutionCredentials();
        return new OSSClient(OSS_ENDPOINT, creds.getAccessKeyId(), creds.getAccessKeySecret(), creds.getSecurityToken());
    }

    protected DocumentClient getDocumentClient() {
        OpenSearch openSearch = new OpenSearch(ACCESS_KEY_ID, ACCESS_KEY_SECRET, OPENSEARCH_HOST);
        OpenSearchClient serviceClient = new OpenSearchClient(openSearch);
        return new DocumentClient(serviceClient);
    }

    protected void closeQuietly(BufferedReader reader, FunctionComputeLogger fcLogger) {
        try {
            if (reader != null) {
                reader.close();
            }
        } catch (Exception ex) {
            fcLogger.error(ex.getMessage());
        }
    }
}
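One detail of the handler worth isolating is the document identifier. A delete event carries no file content, so the handler has to recompute the same identifier from the bucket name and object key alone. The JDK-only sketch below reproduces that derivation to show it is deterministic. DocIdSketch and docId are hypothetical names, and the bucket and key values are made up for the illustration.

```java
import java.util.UUID;

class DocIdSketch {
    /** Mirrors the handler: the two String hash codes become the two 64-bit halves of a UUID. */
    static UUID docId(String bucketName, String objectKey) {
        return new UUID(bucketName.hashCode(), objectKey.hashCode());
    }

    public static void main(String[] args) {
        UUID onUpload = docId("fc-search-demo", "notes/sample.txt");
        UUID onDelete = docId("fc-search-demo", "notes/sample.txt");
        // Same bucket and key give the same id, so a delete can target the uploaded document.
        System.out.println(onUpload.equals(onDelete)); // prints "true"

        // "Aa" and "BB" share the same String.hashCode (2112), so their ids collide too.
        System.out.println(docId("b", "Aa").equals(docId("b", "BB"))); // prints "true"
    }
}
```

The second check shows the limit of this scheme: each half of the identifier carries only 32 bits of hash, so two different object keys can map to the same document. For a small demo bucket this is unlikely to matter. For larger buckets, one option might be to derive the identifier from a digest of the full bucket and key string instead.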
The pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>YourGroupId</groupId>
    <artifactId>YourArtifactid</artifactId>
    <version>1.0-SNAPSHOT</version>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>com.aliyun.fc.runtime</groupId>
            <artifactId>fc-java-core</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.aliyun.fc.runtime</groupId>
            <artifactId>fc-java-event</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.aliyun.oss</groupId>
            <artifactId>aliyun-sdk-oss</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>com.aliyun.opensearch</groupId>
            <artifactId>aliyun-sdk-opensearch</artifactId>
            <version>3.1.3</version>
        </dependency>
    </dependencies>
</project>
4. Create a trigger and grant it permissions; see Creating a Trigger and Granting Permissions.
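The handler dispatches on the OSS event name, so the event types the trigger subscribes to decide which branch of the handler runs. The JDK-only sketch below restates that routing as a small function. The three event-name lists are copied from the handler; EventRouteSketch, Action, and route are hypothetical names introduced for the illustration.

```java
import java.util.Arrays;
import java.util.List;

class EventRouteSketch {
    enum Action { ADD, UPDATE, DELETE, IGNORE }

    // Event names copied from the handler's addEventList, updateEventList and deleteEventList.
    static final List<String> ADD_EVENTS =
            Arrays.asList("ObjectCreated:PutObject", "ObjectCreated:PostObject");
    static final List<String> UPDATE_EVENTS =
            Arrays.asList("ObjectCreated:AppendObject");
    static final List<String> DELETE_EVENTS =
            Arrays.asList("ObjectRemoved:DeleteObject", "ObjectRemoved:DeleteObjects");

    /** Maps an OSS event name to the index operation the handler would perform for it. */
    static Action route(String eventName) {
        if (DELETE_EVENTS.contains(eventName)) return Action.DELETE;
        if (ADD_EVENTS.contains(eventName)) return Action.ADD;
        if (UPDATE_EVENTS.contains(eventName)) return Action.UPDATE;
        return Action.IGNORE; // any other event leaves the index untouched
    }

    public static void main(String[] args) {
        System.out.println(route("ObjectCreated:PutObject"));
        System.out.println(route("ObjectRemoved:DeleteObject"));
        System.out.println(route("ObjectCreated:CopyObject"));
    }
}
```

Under this routing, an event name outside the three lists, such as ObjectCreated:CopyObject, produces no index change. If objects can reach the bucket that way, both the trigger's event list and the handler's lists would presumably need extending.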
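Results
1. With all the services and triggers in place, let's look at the result. First, prepare two text documents (contents shown below) and upload them to OSS:
2. Open the OpenSearch console and run some test searches:
Searching for “杭州” (Hangzhou) returns both 西湖.txt and 阿里巴巴.txt, because the content of both documents contains the keyword “杭州”.
Searching for “电子商务” (e-commerce) returns a single result, because only 阿里巴巴.txt contains “电子商务”.
3. After the documents are deleted from OSS, the corresponding data in OpenSearch is deleted as well.
Searching for the keyword “杭州” now returns no documents.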
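The add, search, and delete cycle above can be mimicked with a toy in-memory index. The sketch below is not OpenSearch: it uses a naive substring scan where a real engine tokenizes text and consults an inverted index. It only illustrates why the result sets change as documents come and go. ToyIndexSketch is a hypothetical class, and the file names and English texts are made-up stand-ins for the two sample documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToyIndexSketch {
    // Object key -> document text, in upload order.
    private final Map<String, String> docs = new LinkedHashMap<>();

    void add(String key, String content) { docs.put(key, content); }

    void remove(String key) { docs.remove(key); }

    /** Returns the keys of all documents whose text contains the keyword (naive scan). */
    List<String> search(String keyword) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : docs.entrySet()) {
            if (entry.getValue().contains(keyword)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndexSketch index = new ToyIndexSketch();
        // Hypothetical stand-in texts, not the article's actual sample documents.
        index.add("west-lake.txt", "West Lake is a freshwater lake in Hangzhou.");
        index.add("alibaba.txt", "Alibaba is an e-commerce company founded in Hangzhou.");

        System.out.println(index.search("Hangzhou"));   // both documents match
        System.out.println(index.search("e-commerce")); // only alibaba.txt matches

        index.remove("west-lake.txt");
        index.remove("alibaba.txt");
        System.out.println(index.search("Hangzhou"));   // nothing left to match
    }
}
```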