Topics Covered:
1. What is NLP?
- A changing field
- Resources
- Tools
- Python libraries
- Example applications
- Ethics issues
2. Topic Modeling with NMF and SVD
Part 1, Part 2 (click to follow the links to the articles)
- Stop words, stemming, & lemmatization
- Term-document matrix
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Singular Value Decomposition (SVD)
- Non-negative Matrix Factorization (NMF)
- Truncated SVD, Randomized SVD
3. Sentiment classification with Naive Bayes, Logistic regression, and ngrams
Part 1 (click to follow the link to the article)
- Sparse matrix storage
- Counters
- the fastai library
- Naive Bayes
- Logistic regression
- Ngrams
- Logistic regression with Naive Bayes features, with trigrams
4. Regex (and re-visiting tokenization)
5. Language modeling & sentiment classification with deep learning
- Language model
- Transfer learning
- Sentiment classification
6. Translation with RNNs
- Review of embeddings
- BLEU metric
- Teacher forcing
- Bidirectional
- Attention
7. Translation with the Transformer architecture
- Transformer model
- Multi-head attention
- Masking
- Label smoothing
8. Bias & ethics in NLP
- bias in word embeddings
- types of bias
- attention economy
- drowning in fraudulent/fake info
Topic Modeling with NMF and SVD: Part 2

Please find Part 1 here: Topic Modeling with NMF and SVD
Let’s wrap up some loose ends from last time.
The two cultures
This “debate” captures the tension between two approaches:
- modeling the underlying mechanism of a phenomenon
- using machine learning to predict outputs (without necessarily understanding the mechanisms that create them)
There was a research project (in 2007) that involved manually coding each of the above reactions. The scientists were determining whether the final system could generate the same outputs (in this case, blood levels of various substrates) as were observed in clinical studies.
The equation for each reaction could be quite complex:
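The original displays an image of one such equation here. As a representative illustration only (my example, not the article's actual equation), reaction rates of this kind are often modeled with Michaelis-Menten-style rate laws:

$$ v = \frac{V_{\max}\,[S]}{K_m + [S]} $$

where $[S]$ is the substrate concentration and $V_{\max}$ and $K_m$ are fitted constants.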
This is an example of modeling the underlying mechanism, and is very different from a machine learning approach. Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2391141/
The most popular word in each state
A time to remove stop words
Factorization is analogous to matrix decomposition
With Integers
Multiplication:
Factorization is the "opposite" of multiplication. Here, the factors have the nice property of being prime. Prime factorization is much harder than multiplication (which is good, because it's at the heart of encryption).
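A tiny Python sketch of this asymmetry (my addition, using sympy): multiplying is easy, while recovering the prime factors is the hard direction.

from sympy import factorint

product = 2 * 3**2 * 7     # multiplication is easy: 126
print(factorint(product))  # {2: 1, 3: 2, 7: 1} -- factoring is the hard part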
With Matrices
Matrix decompositions are a way of taking matrices apart (the "opposite" of matrix multiplication). Similarly, we use matrix decompositions to come up with matrices with nice properties. Taking matrices apart is harder than putting them together.
One application:
What are the nice properties that matrices in an SVD decomposition have?
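For reference (my summary of the answer): in A ≈ U S V, the columns of U and the rows of V are orthonormal, and S is diagonal with non-negative singular values sorted in decreasing order. A quick numerical check:

import numpy as np

A = np.random.rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(3)))    # True: columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True: rows of Vt are orthonormal
print((s >= 0).all(), (np.diff(s) <= 0).all())  # True True: non-negative, decreasing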
Some Linear Algebra Review
Matrix-vector multiplication
Ax takes a linear combination of the columns of A, using the entries of x as coefficients (see http://matrixmultiplication.xyz/).
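A quick numpy illustration (my addition):

import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
x = np.array([10., 100.])

print(A @ x)                            # [210. 430. 650.]
print(x[0] * A[:, 0] + x[1] * A[:, 1])  # same: columns of A weighted by x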
Matrix-matrix multiplication
For C = AB, each column of C is a linear combination of the columns of A, where the coefficients come from the corresponding column of B.
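Continuing the numpy sketch above (my addition):

B = np.array([[1., 0.],
              [2., 1.]])
C = A @ B
# column 0 of C = columns of A combined with coefficients from column 0 of B
print(np.allclose(C[:, 0], B[0, 0] * A[:, 0] + B[1, 0] * A[:, 1]))  # True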
(source: NMF Tutorial)

Matrices as Transformations
The 3Blue1Brown Essence of Linear Algebra videos are fantastic. They give a much more visual & geometric perspective on linear algebra than how it is typically taught. These videos are a great resource if you are a linear algebra beginner, or feel uncomfortable or rusty with the material.

Even if you are a linear algebra pro, I still recommend these videos for a new perspective, and they are very well made.
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo("kYB8IZa5AuE")

British Literature SVD & NMF in Excel
Data was downloaded from here
The code below was used to create the matrices which are displayed in the "SVD and NMF of British Literature" Excel workbook. The data is intended to be viewed in Excel; I've just included the code here for thoroughness.
Initializing and creating the document-term matrix
In [2]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn import decomposition
from glob import glob
import os

In [3]:
np.set_printoptions(suppress=True)

In [46]:
filenames = []
for folder in ["british-fiction-corpus"]:  #, "french-plays", "hugo-les-misérables"]:
    filenames.extend(glob("data/literature/" + folder + "/*.txt"))
In [47]:
len(filenames)

Out[47]:
27

In [134]:
vectorizer = TfidfVectorizer(input='filename', stop_words='english')
dtm = vectorizer.fit_transform(filenames).toarray()
vocab = np.array(vectorizer.get_feature_names())
dtm.shape, len(vocab)
Out[134]:
((27, 55035), 55035)

In [135]:
[f.split("/")[3] for f in filenames]

Out[135]:
['Sterne_Tristram.txt',
'Austen_Pride.txt',
'Thackeray_Pendennis.txt',
'ABronte_Agnes.txt',
'Austen_Sense.txt',
'Thackeray_Vanity.txt',
'Trollope_Barchester.txt',
'Fielding_Tom.txt',
'Dickens_Bleak.txt',
'Eliot_Mill.txt',
'EBronte_Wuthering.txt',
'Eliot_Middlemarch.txt',
'Fielding_Joseph.txt',
'ABronte_Tenant.txt',
'Austen_Emma.txt',
'Trollope_Prime.txt',
'CBronte_Villette.txt',
'CBronte_Jane.txt',
'Richardson_Clarissa.txt',
'CBronte_Professor.txt',
'Dickens_Hard.txt',
'Eliot_Adam.txt',
'Dickens_David.txt',
'Trollope_Phineas.txt',
'Richardson_Pamela.txt',
'Sterne_Sentimental.txt',
'Thackeray_Barry.txt']
NMF
In [136]:
clf = decomposition.NMF(n_components=10, random_state=1)
W1 = clf.fit_transform(dtm)
H1 = clf.components_
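As a quick sanity check (my addition, not a cell from the original notebook), W1 and H1 factor the document-term matrix, so dtm ≈ W1 @ H1:

print(W1.shape, H1.shape)   # (27, 10) and (10, 55035)
print(np.linalg.norm(dtm - W1 @ H1) / np.linalg.norm(dtm))  # relative reconstruction error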
In [137]:
num_top_words = 8

def show_topics(a):
    top_words = lambda t: [vocab[i] for i in np.argsort(t)[:-num_top_words-1:-1]]
    topic_words = [top_words(t) for t in a]
    return [' '.join(t) for t in topic_words]
In [138]:
def get_all_topic_words(H):
    top_indices = lambda t: {i for i in np.argsort(t)[:-num_top_words-1:-1]}
    topic_indices = [top_indices(t) for t in H]
    return sorted(set.union(*topic_indices))
In [139]:
ind = get_all_topic_words(H1)

In [140]:
vocab[ind]

Out[140]:
array(['adams', 'allworthy', 'bounderby', 'brandon', 'catherine', 'cathy',
'corporal', 'crawley', 'darcy', 'dashwood', 'did', 'earnshaw',
'edgar', 'elinor', 'emma', 'father', 'ferrars', 'finn', 'glegg',
'good', 'gradgrind', 'hareton', 'heathcliff', 'jennings', 'jones',
'joseph', 'know', 'lady', 'laura', 'like', 'linton', 'little', 'll',
'lopez', 'louisa', 'lyndon', 'maggie', 'man', 'marianne', 'miss',
'mr', 'mrs', 'old', 'osborne', 'pendennis', 'philip', 'phineas',
'quoth', 'said', 'sissy', 'sophia', 'sparsit', 'stephen', 'thought',
'time', 'tis', 'toby', 'tom', 'trim', 'tulliver', 'uncle', 'wakem',
'wharton', 'willoughby'],
dtype='<U31')
In [141]:
show_topics(H1)

Out[141]:
['mr said mrs miss emma darcy little know',
'said little like did time know thought good',
'adams jones said lady allworthy sophia joseph mr',
'elinor marianne dashwood jennings willoughby mrs brandon ferrars',
'maggie tulliver said tom glegg philip mr wakem',
'heathcliff linton hareton catherine earnshaw cathy edgar ll',
'toby said uncle father corporal quoth tis trim',
'phineas said mr lopez finn man wharton laura',
'said crawley lyndon pendennis old little osborne lady',
'bounderby gradgrind sparsit said mr sissy louisa stephen']
In [142]:
W1.shape, H1[:, ind].shape

Out[142]:
((27, 10), (10, 64))

Export to CSVs
In [72]:
from IPython.display import FileLink, FileLinks

In [119]:
np.savetxt("britlit_W.csv", W1, delimiter=",", fmt='%.14f')
FileLink('britlit_W.csv')
Out[119]:
britlit_W.csv
In [120]:
np.savetxt("britlit_H.csv", H1[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_H.csv')
Out[120]:
britlit_H.csv
In [131]:
np.savetxt("britlit_raw.csv", dtm[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_raw.csv')
Out[131]:
britlit_raw.csv
In [121]:
[str(word) for word in vocab[ind]]

Out[121]:
['adams',
'allworthy',
'bounderby',
'brandon',
'catherine',
'cathy',
'corporal',
'crawley',
'darcy',
'dashwood',
'did',
'earnshaw',
'edgar',
'elinor',
'emma',
'father',
'ferrars',
'finn',
'glegg',
'good',
'gradgrind',
'hareton',
'heathcliff',
'jennings',
'jones',
'joseph',
'know',
'lady',
'laura',
'like',
'linton',
'little',
'll',
'lopez',
'louisa',
'lyndon',
'maggie',
'man',
'marianne',
'miss',
'mr',
'mrs',
'old',
'osborne',
'pendennis',
'philip',
'phineas',
'quoth',
'said',
'sissy',
'sophia',
'sparsit',
'stephen',
'thought',
'time',
'tis',
'toby',
'tom',
'trim',
'tulliver',
'uncle',
'wakem',
'wharton',
'willoughby']
SVD
In [143]:
U, s, V = decomposition.randomized_svd(dtm, 10)

In [144]:
ind = get_all_topic_words(V)

In [145]:
len(ind)

Out[145]:
52

In [146]:
vocab[ind]

Out[146]:
array(['adams', 'allworthy', 'bounderby', 'bretton', 'catherine',
'crimsworth', 'darcy', 'dashwood', 'did', 'elinor', 'elton', 'emma',
'finn', 'fleur', 'glegg', 'good', 'gradgrind', 'hareton', 'hath',
'heathcliff', 'hunsden', 'jennings', 'jones', 'joseph', 'knightley',
'know', 'lady', 'linton', 'little', 'lopez', 'louisa', 'lydgate',
'madame', 'maggie', 'man', 'marianne', 'miss', 'monsieur', 'mr',
'mrs', 'pelet', 'philip', 'phineas', 'said', 'sissy', 'sophia',
'sparsit', 'toby', 'tom', 'tulliver', 'uncle', 'weston'],
dtype='<U31')
In [147]:
show_topics(H1)

Out[147]:
['mr said mrs miss emma darcy little know',
'said little like did time know thought good',
'adams jones said lady allworthy sophia joseph mr',
'elinor marianne dashwood jennings willoughby mrs brandon ferrars',
'maggie tulliver said tom glegg philip mr wakem',
'heathcliff linton hareton catherine earnshaw cathy edgar ll',
'toby said uncle father corporal quoth tis trim',
'phineas said mr lopez finn man wharton laura',
'said crawley lyndon pendennis old little osborne lady',
'bounderby gradgrind sparsit said mr sissy louisa stephen']
In [148]:
np.savetxt("britlit_U.csv", U, delimiter=",", fmt='%.14f')
FileLink('britlit_U.csv')
Out[148]:
britlit_U.csv
In [149]:
np.savetxt("britlit_V.csv", V[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_V.csv')
Out[149]:
britlit_V.csv
In [150]:
np.savetxt("britlit_raw_svd.csv", dtm[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_raw_svd.csv')
Out[150]:
britlit_raw_svd.csv
In [151]:
np.savetxt("britlit_S.csv", np.diag(s), delimiter=",", fmt='%.14f')
FileLink('britlit_S.csv')
Out[151]:
britlit_S.csv
In [152]:
[str(word) for word in vocab[ind]]

Out[152]:
['adams',
'allworthy',
'bounderby',
'bretton',
'catherine',
'crimsworth',
'darcy',
'dashwood',
'did',
'elinor',
'elton',
'emma',
'finn',
'fleur',
'glegg',
'good',
'gradgrind',
'hareton',
'hath',
'heathcliff',
'hunsden',
'jennings',
'jones',
'joseph',
'knightley',
'know',
'lady',
'linton',
'little',
'lopez',
'louisa',
'lydgate',
'madame',
'maggie',
'man',
'marianne',
'miss',
'monsieur',
'mr',
'mrs',
'pelet',
'philip',
'phineas',
'said',
'sissy',
'sophia',
'sparsit',
'toby',
'tom',
'tulliver',
'uncle',
'weston']
Randomized SVD offers a speed up
Computing the full SVD of a large matrix is slow. One way to address this is to use randomized SVD. In the chart below, the error is the difference A − U S V, that is, what you've failed to capture in your decomposition:
For more on randomized SVD, check out my PyBay 2017 talk. For significantly more on randomized SVD, check out the Computational Linear Algebra course.
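A rough timing sketch (my addition; the exact numbers depend on your machine and matrix size):

import time
import numpy as np
from sklearn.utils.extmath import randomized_svd

M = np.random.rand(2000, 1000)

t0 = time.time()
np.linalg.svd(M, full_matrices=False)   # exact SVD of the whole matrix
print("full SVD:       %.2fs" % (time.time() - t0))

t0 = time.time()
randomized_svd(M, n_components=10)      # approximate, only the top 10 components
print("randomized SVD: %.2fs" % (time.time() - t0))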
Full vs Reduced SVD
Remember how we were calling np.linalg.svd(vectors, full_matrices=False)? We set full_matrices=False to calculate the reduced SVD. For the full SVD, both U and V are square matrices, where the extra columns in U form an orthonormal basis (but zero out when multiplied by extra rows of zeros in S).
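To make the shapes concrete (my sketch):

import numpy as np

A = np.random.rand(7, 3)  # more rows than columns

U, s, Vt = np.linalg.svd(A)                       # full SVD
print(U.shape, s.shape, Vt.shape)                 # (7, 7) (3,) (3, 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # reduced SVD
print(U.shape, s.shape, Vt.shape)                 # (7, 3) (3,) (3, 3)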
Diagrams from Trefethen:
End
Credits:
https://www.fast.ai/
Translated from: https://medium.com/ai-in-plain-english/topics-covered-7feba459180f