Topics Covered:
1. What is NLP?
- A changing field
- Resources
- Tools
- Python libraries
- Example applications
- Ethics issues
2. Topic Modeling with NMF and SVD
Part 1, Part 2 (click to follow the links to the articles)
- Stop words, stemming, & lemmatization
- Term-document matrix
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Singular Value Decomposition (SVD)
- Non-negative Matrix Factorization (NMF)
- Truncated SVD, Randomized SVD
3. Sentiment classification with Naive Bayes, Logistic regression, and ngrams
Part 1 (click to follow the link to the article)
- Sparse matrix storage
- Counters
- the fastai library
- Naive Bayes
- Logistic regression
- Ngrams
- Logistic regression with Naive Bayes features, with trigrams
4. Regex (and re-visiting tokenization)
5. Language modeling & sentiment classification with deep learning
- Language model
- Transfer learning
- Sentiment classification
6. Translation with RNNs
- Review of embeddings
- BLEU metric
- Teacher forcing
- Bidirectional
- Attention
7. Translation with the Transformer architecture
- Transformer model
- Multi-head attention
- Masking
- Label smoothing
8. Bias & ethics in NLP
- bias in word embeddings
- types of bias
- attention economy
- drowning in fraudulent/fake info
Topic Modeling with NMF and SVD: Part 2

Please find Part 1 here: Topic Modeling with NMF and SVD
Let’s wrap up some loose ends from last time.
The two cultures
This “debate” captures the tension between two approaches:
- modeling the underlying mechanism of a phenomenon
- using machine learning to predict outputs (without necessarily understanding the mechanisms that create them)
There was a research project (in 2007) that involved manually coding each of the above reactions. The scientists were determining whether the final system could generate the same outputs (in this case, blood levels of various substrates) as were observed in clinical studies.
The equation for each reaction could be quite complex:
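The original displays an image of one such equation here. As a representative illustration only (my example, not the article's actual equation), reaction rates of this kind are often modeled with Michaelis-Menten-style rate laws:

$$ v = \frac{V_{\max}\,[S]}{K_m + [S]} $$

where $[S]$ is the substrate concentration and $V_{\max}$ and $K_m$ are fitted constants.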
This is an example of modeling the underlying mechanism, and is very different from a machine learning approach. Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2391141/
The most popular word in each state
A time to remove stop words
Factorization is analogous to matrix decomposition
With Integers
Multiplication:
Factorization is the "opposite" of multiplication. Here, the factors have the nice property of being prime. Prime factorization is much harder than multiplication (which is good, because it's at the heart of encryption).
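A tiny Python sketch of this asymmetry (my addition, using sympy): multiplying is easy, while recovering the prime factors is the hard direction.

from sympy import factorint

product = 2 * 3**2 * 7     # multiplication is easy: 126
print(factorint(product))  # {2: 1, 3: 2, 7: 1} -- factoring is the hard part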
With Matrices
Matrix decompositions are a way of taking matrices apart (the "opposite" of matrix multiplication). Similarly, we use matrix decompositions to come up with matrices with nice properties. Taking matrices apart is harder than putting them together.
One application:
What are the nice properties that matrices in an SVD decomposition have?
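For reference (my summary of the answer): in A ≈ U S V, the columns of U and the rows of V are orthonormal, and S is diagonal with non-negative singular values sorted in decreasing order. A quick numerical check:

import numpy as np

A = np.random.rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(3)))    # True: columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True: rows of Vt are orthonormal
print((s >= 0).all(), (np.diff(s) <= 0).all())  # True True: non-negative, decreasing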
Some Linear Algebra Review
Matrix-vector multiplication
Ax takes a linear combination of the columns of A, using the entries of x as coefficients (see http://matrixmultiplication.xyz/).
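A quick numpy illustration (my addition):

import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
x = np.array([10., 100.])

print(A @ x)                            # [210. 430. 650.]
print(x[0] * A[:, 0] + x[1] * A[:, 1])  # same: columns of A weighted by x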
Matrix-matrix multiplication
For C = AB, each column of C is a linear combination of the columns of A, where the coefficients come from the corresponding column of B.
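Continuing the numpy sketch above (my addition):

B = np.array([[1., 0.],
              [2., 1.]])
C = A @ B
# column 0 of C = columns of A combined with coefficients from column 0 of B
print(np.allclose(C[:, 0], B[0, 0] * A[:, 0] + B[1, 0] * A[:, 1]))  # True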
(source: NMF Tutorial)

Matrices as Transformations
The 3Blue1Brown Essence of Linear Algebra videos are fantastic. They give a much more visual & geometric perspective on linear algebra than how it is typically taught. These videos are a great resource if you are a linear algebra beginner, or feel uncomfortable or rusty with the material.

Even if you are a linear algebra pro, I still recommend these videos for a new perspective, and they are very well made.
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo("kYB8IZa5AuE")

British Literature SVD & NMF in Excel
Data was downloaded from here
The code below was used to create the matrices which are displayed in the "SVD and NMF of British Literature" Excel workbook. The data is intended to be viewed in Excel; I've just included the code here for thoroughness.
Initializing and creating the document-term matrix
In [2]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn import decomposition
from glob import glob
import os

In [3]:
np.set_printoptions(suppress=True)

In [46]:
filenames = []
for folder in ["british-fiction-corpus"]:  #, "french-plays", "hugo-les-misérables"]:
    filenames.extend(glob("data/literature/" + folder + "/*.txt"))
In [47]:
len(filenames)

Out[47]:
27

In [134]:
vectorizer = TfidfVectorizer(input='filename', stop_words='english')
dtm = vectorizer.fit_transform(filenames).toarray()
vocab = np.array(vectorizer.get_feature_names())
dtm.shape, len(vocab)
Out[134]:
((27, 55035), 55035)

In [135]:
[f.split("/")[3] for f in filenames]

Out[135]:
['Sterne_Tristram.txt',
'Austen_Pride.txt',
'Thackeray_Pendennis.txt',
'ABronte_Agnes.txt',
'Austen_Sense.txt',
'Thackeray_Vanity.txt',
'Trollope_Barchester.txt',
'Fielding_Tom.txt',
'Dickens_Bleak.txt',
'Eliot_Mill.txt',
'EBronte_Wuthering.txt',
'Eliot_Middlemarch.txt',
'Fielding_Joseph.txt',
'ABronte_Tenant.txt',
'Austen_Emma.txt',
'Trollope_Prime.txt',
'CBronte_Villette.txt',
'CBronte_Jane.txt',
'Richardson_Clarissa.txt',
'CBronte_Professor.txt',
'Dickens_Hard.txt',
'Eliot_Adam.txt',
'Dickens_David.txt',
'Trollope_Phineas.txt',
'Richardson_Pamela.txt',
'Sterne_Sentimental.txt',
'Thackeray_Barry.txt']
NMF
In [136]:
clf = decomposition.NMF(n_components=10, random_state=1)
W1 = clf.fit_transform(dtm)
H1 = clf.components_
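As a quick sanity check (my addition, not a cell from the original notebook), W1 and H1 factor the document-term matrix, so dtm ≈ W1 @ H1:

print(W1.shape, H1.shape)   # (27, 10) and (10, 55035)
print(np.linalg.norm(dtm - W1 @ H1) / np.linalg.norm(dtm))  # relative reconstruction error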
In [137]:
num_top_words = 8

def show_topics(a):
    top_words = lambda t: [vocab[i] for i in np.argsort(t)[:-num_top_words-1:-1]]
    topic_words = [top_words(t) for t in a]
    return [' '.join(t) for t in topic_words]
In [138]:
def get_all_topic_words(H):
    top_indices = lambda t: {i for i in np.argsort(t)[:-num_top_words-1:-1]}
    topic_indices = [top_indices(t) for t in H]
    return sorted(set.union(*topic_indices))
In [139]:
ind = get_all_topic_words(H1)

In [140]:
vocab[ind]

Out[140]:
array(['adams', 'allworthy', 'bounderby', 'brandon', 'catherine', 'cathy',
'corporal', 'crawley', 'darcy', 'dashwood', 'did', 'earnshaw',
'edgar', 'elinor', 'emma', 'father', 'ferrars', 'finn', 'glegg',
'good', 'gradgrind', 'hareton', 'heathcliff', 'jennings', 'jones',
'joseph', 'know', 'lady', 'laura', 'like', 'linton', 'little', 'll',
'lopez', 'louisa', 'lyndon', 'maggie', 'man', 'marianne', 'miss',
'mr', 'mrs', 'old', 'osborne', 'pendennis', 'philip', 'phineas',
'quoth', 'said', 'sissy', 'sophia', 'sparsit', 'stephen', 'thought',
'time', 'tis', 'toby', 'tom', 'trim', 'tulliver', 'uncle', 'wakem',
'wharton', 'willoughby'],
dtype='<U31')
In [141]:
show_topics(H1)

Out[141]:
['mr said mrs miss emma darcy little know',
'said little like did time know thought good',
'adams jones said lady allworthy sophia joseph mr',
'elinor marianne dashwood jennings willoughby mrs brandon ferrars',
'maggie tulliver said tom glegg philip mr wakem',
'heathcliff linton hareton catherine earnshaw cathy edgar ll',
'toby said uncle father corporal quoth tis trim',
'phineas said mr lopez finn man wharton laura',
'said crawley lyndon pendennis old little osborne lady',
'bounderby gradgrind sparsit said mr sissy louisa stephen']
In [142]:
W1.shape, H1[:, ind].shape

Out[142]:
((27, 10), (10, 64))

Export to CSVs
In [72]:
from IPython.display import FileLink, FileLinks

In [119]:
np.savetxt("britlit_W.csv", W1, delimiter=",", fmt='%.14f')
FileLink('britlit_W.csv')
Out[119]:
britlit_W.csv
In [120]:
np.savetxt("britlit_H.csv", H1[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_H.csv')
Out[120]:
britlit_H.csv
In [131]:
np.savetxt("britlit_raw.csv", dtm[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_raw.csv')
Out[131]:
britlit_raw.csv
In [121]:
[str(word) for word in vocab[ind]]

Out[121]:
['adams',
'allworthy',
'bounderby',
'brandon',
'catherine',
'cathy',
'corporal',
'crawley',
'darcy',
'dashwood',
'did',
'earnshaw',
'edgar',
'elinor',
'emma',
'father',
'ferrars',
'finn',
'glegg',
'good',
'gradgrind',
'hareton',
'heathcliff',
'jennings',
'jones',
'joseph',
'know',
'lady',
'laura',
'like',
'linton',
'little',
'll',
'lopez',
'louisa',
'lyndon',
'maggie',
'man',
'marianne',
'miss',
'mr',
'mrs',
'old',
'osborne',
'pendennis',
'philip',
'phineas',
'quoth',
'said',
'sissy',
'sophia',
'sparsit',
'stephen',
'thought',
'time',
'tis',
'toby',
'tom',
'trim',
'tulliver',
'uncle',
'wakem',
'wharton',
'willoughby']
SVD
In [143]:
U, s, V = decomposition.randomized_svd(dtm, 10)

In [144]:
ind = get_all_topic_words(V)

In [145]:
len(ind)

Out[145]:
52

In [146]:
vocab[ind]

Out[146]:
array(['adams', 'allworthy', 'bounderby', 'bretton', 'catherine',
'crimsworth', 'darcy', 'dashwood', 'did', 'elinor', 'elton', 'emma',
'finn', 'fleur', 'glegg', 'good', 'gradgrind', 'hareton', 'hath',
'heathcliff', 'hunsden', 'jennings', 'jones', 'joseph', 'knightley',
'know', 'lady', 'linton', 'little', 'lopez', 'louisa', 'lydgate',
'madame', 'maggie', 'man', 'marianne', 'miss', 'monsieur', 'mr',
'mrs', 'pelet', 'philip', 'phineas', 'said', 'sissy', 'sophia',
'sparsit', 'toby', 'tom', 'tulliver', 'uncle', 'weston'],
dtype='<U31')
In [147]:
show_topics(H1)

Out[147]:
['mr said mrs miss emma darcy little know',
'said little like did time know thought good',
'adams jones said lady allworthy sophia joseph mr',
'elinor marianne dashwood jennings willoughby mrs brandon ferrars',
'maggie tulliver said tom glegg philip mr wakem',
'heathcliff linton hareton catherine earnshaw cathy edgar ll',
'toby said uncle father corporal quoth tis trim',
'phineas said mr lopez finn man wharton laura',
'said crawley lyndon pendennis old little osborne lady',
'bounderby gradgrind sparsit said mr sissy louisa stephen']
In [148]:
np.savetxt("britlit_U.csv", U, delimiter=",", fmt='%.14f')
FileLink('britlit_U.csv')
Out[148]:
britlit_U.csv
In [149]:
np.savetxt("britlit_V.csv", V[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_V.csv')
Out[149]:
britlit_V.csv
In [150]:
np.savetxt("britlit_raw_svd.csv", dtm[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_raw_svd.csv')
Out[150]:
britlit_raw_svd.csv
In [151]:
np.savetxt("britlit_S.csv", np.diag(s), delimiter=",", fmt='%.14f')
FileLink('britlit_S.csv')
Out[151]:
britlit_S.csv
In [152]:
[str(word) for word in vocab[ind]]

Out[152]:
['adams',
'allworthy',
'bounderby',
'bretton',
'catherine',
'crimsworth',
'darcy',
'dashwood',
'did',
'elinor',
'elton',
'emma',
'finn',
'fleur',
'glegg',
'good',
'gradgrind',
'hareton',
'hath',
'heathcliff',
'hunsden',
'jennings',
'jones',
'joseph',
'knightley',
'know',
'lady',
'linton',
'little',
'lopez',
'louisa',
'lydgate',
'madame',
'maggie',
'man',
'marianne',
'miss',
'monsieur',
'mr',
'mrs',
'pelet',
'philip',
'phineas',
'said',
'sissy',
'sophia',
'sparsit',
'toby',
'tom',
'tulliver',
'uncle',
'weston']
Randomized SVD offers a speed up
Computing the full SVD of a large matrix is slow. One way to address this is to use randomized SVD. In the chart below, the error is the difference A − U S V, that is, what you've failed to capture in your decomposition:
For more on randomized SVD, check out my PyBay 2017 talk. For significantly more on randomized SVD, check out the Computational Linear Algebra course.
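A rough timing sketch (my addition; the exact numbers depend on your machine and matrix size):

import time
import numpy as np
from sklearn.utils.extmath import randomized_svd

M = np.random.rand(2000, 1000)

t0 = time.time()
np.linalg.svd(M, full_matrices=False)   # exact SVD of the whole matrix
print("full SVD:       %.2fs" % (time.time() - t0))

t0 = time.time()
randomized_svd(M, n_components=10)      # approximate, only the top 10 components
print("randomized SVD: %.2fs" % (time.time() - t0))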
Full vs Reduced SVD
Remember how we were calling np.linalg.svd(vectors, full_matrices=False)? We set full_matrices=False to calculate the reduced SVD. For the full SVD, both U and V are square matrices, where the extra columns in U form an orthonormal basis (but zero out when multiplied by extra rows of zeros in S).
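To make the shapes concrete (my sketch):

import numpy as np

A = np.random.rand(7, 3)  # more rows than columns

U, s, Vt = np.linalg.svd(A)                       # full SVD
print(U.shape, s.shape, Vt.shape)                 # (7, 7) (3,) (3, 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # reduced SVD
print(U.shape, s.shape, Vt.shape)                 # (7, 3) (3,) (3, 3)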
Diagrams from Trefethen:
End
Credits:
https://www.fast.ai/
Translated from: https://medium.com/ai-in-plain-english/topics-covered-7feba459180f