Kernel Memory Getting Started Series: Document Management

Published: 2023/12/29

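In the Quick Start we saw how to upload a document quickly and directly. In practice, however, more questions come up, such as how to update a document or how to limit the scope of a query. This article walks through document management in Kernel Memory in detail.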

Managing a group of files with Document

When you need to upload a group of files in one batch, you can use a Document to manage them.

var document = new Document();
document.AddFile("./sample-SK-Readme.pdf");
document.AddFile("./sample-KM-Readme.md");
await memory.ImportDocumentAsync(document);

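A Document is an object that groups multiple files together. You can assign its DocumentId yourself; if you do not, a random DocumentId is generated. The DocumentId can later be used to query the document's processing status, or to update or delete the document.

From here on, the Document is the basic unit for using and managing content.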
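As a minimal sketch of assigning your own DocumentId and then checking the processing status: the service address and the ID "doc001" below are assumptions made for illustration, and the snippet assumes the Microsoft.KernelMemory client package with a Kernel Memory service already running.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.KernelMemory;

// Assumption: a Kernel Memory service is reachable at this hypothetical address.
var memory = new MemoryWebClient("http://127.0.0.1:9001/");

// "doc001" is a hypothetical DocumentId chosen for this example.
var document = new Document("doc001");
document.AddFile("./sample-SK-Readme.pdf");
document.AddFile("./sample-KM-Readme.md");

// ImportDocumentAsync returns the DocumentId of the imported document.
string documentId = await memory.ImportDocumentAsync(document);

// With a remote service the import is processed asynchronously,
// so poll until the document is ready before querying it.
while (!await memory.IsDocumentReadyAsync(documentId))
{
    await Task.Delay(TimeSpan.FromSeconds(1));
}

Console.WriteLine($"Document {documentId} is ready.");
```

Choosing a stable, meaningful ID up front makes the later update and delete calls straightforward, because the same ID can be reused without storing a generated value.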

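Tagging documents with Tags

When you need to scope the uploaded documents, you can mark them with Tags. A Tag can be thought of as a document attribute and is fully customizable: for example the document's type, its source, the uploading user, the project it belongs to, or its domain.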

var document = new Document();
document.AddFile("./sample-SK-Readme.pdf");
document.AddTag("type", "pdf");
document.AddTag("domain", "llm");
document.AddTag("user", "xbotter");
await memory.ImportDocumentAsync(document);

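If you are importing a single file or a file stream, Tags can be added another way.

var tags = new TagCollection();
tags.Add("type", "pdf");
tags.Add("domain", "llm");
tags.Add("user", "xbotter");
// The file-path overload of ImportDocumentAsync accepts the tags directly.
await memory.ImportDocumentAsync("./sample-SK-Readme.pdf", tags: tags);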

The same applies to importing text and web pages:

var tags = new TagCollection();
await memory.ImportTextAsync("This is a piece of text", tags: tags);

await memory.ImportWebPageAsync("https://www.github.com", tags: tags);

Filtering at retrieval time

The main use of Tags is to narrow the scope at retrieval time. For example, you can restrict the search to documents tagged as pdf:

await memory.AskAsync("What's the SK?", filter: MemoryFilters.ByTag("type", "pdf"));

You can also filter by a specific document:

await memory.AskAsync("What's the SK?", filter: MemoryFilters.ByDocument("documentId"));

Complex filter conditions

When you need more complex conditions, MemoryFilters can express And and Or combinations.
Chaining several ByTag conditions on a single filter means And.

await memory.AskAsync("What's the SK?", filter: MemoryFilters.ByTag("type", "pdf")
                                                             .ByTag("domain", "llm"));

Passing a list of several filters means Or.

await memory.AskAsync("What's the SK?", filters: new List<MemoryFilter>() {
                                                MemoryFilters.ByTag("type", "pdf"),
                                                MemoryFilters.ByTag("domain", "llm")
                                            });
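The two rules can be combined. The sketch below asks within "(type is pdf And domain is llm) Or (user is xbotter)"; the service address is an assumption made for illustration, and the tag values are the ones used earlier in this article.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.KernelMemory;

// Assumption: a Kernel Memory service is reachable at this hypothetical address.
var memory = new MemoryWebClient("http://127.0.0.1:9001/");

var filters = new List<MemoryFilter>
{
    // Conditions chained on one filter are combined with And.
    MemoryFilters.ByTag("type", "pdf").ByTag("domain", "llm"),
    // Separate filters in the list are combined with Or.
    MemoryFilters.ByTag("user", "xbotter")
};

var answer = await memory.AskAsync("What's the SK?", filters: filters);
Console.WriteLine(answer.Result);
```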

Updating a document

As mentioned when introducing the Document concept, the DocumentId identifies a document. To update a document, specify its DocumentId and upload the new content.

var document = new Document(docId);
document.AddFile("./sample-SK-Readme.pdf");
await memory.ImportDocumentAsync(document);

Kernel Memory then automatically replaces the existing document, which completes the update.

Deleting a document

To delete a document, call DeleteDocumentAsync with its DocumentId.

await memory.DeleteDocumentAsync(docId);

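Isolating with Index

Another parameter you can specify when uploading and searching is index. In the vector store, an index can be thought of as a namespace that isolates documents from one another; retrieval cannot span indexes.

When no index is specified for upload or retrieval, the default index is used.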
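A minimal sketch of passing an index explicitly follows. The index name "project-a" and the service address are hypothetical values chosen for illustration; the same index has to be supplied on every call that should touch those documents.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.KernelMemory;

// Assumption: a Kernel Memory service is reachable at this hypothetical address.
var memory = new MemoryWebClient("http://127.0.0.1:9001/");

// "project-a" is a hypothetical index name used only in this example.
const string Index = "project-a";

var document = new Document();
document.AddFile("./sample-KM-Readme.md");
string documentId = await memory.ImportDocumentAsync(document, index: Index);

// Wait until processing has finished inside that index.
while (!await memory.IsDocumentReadyAsync(documentId, index: Index))
{
    await Task.Delay(TimeSpan.FromSeconds(1));
}

// Only documents stored in "project-a" are considered for this question.
var answer = await memory.AskAsync("What's the SK?", index: Index);
Console.WriteLine(answer.Result);

// Deleting also needs the index the document was stored in.
await memory.DeleteDocumentAsync(documentId, index: Index);
```

Tags and indexes address different needs: Tags filter within one index, while an index is a hard boundary, so data that must never be mixed at query time is a better fit for separate indexes.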

References

SECURITY_FILTERS
