GPT-3: The Latest in the NLP Town
What is GPT-3?
The launch of OpenAI's third-generation pre-trained language model, GPT-3 (Generative Pre-trained Transformer), has the data science fraternity buzzing with excitement!
The world of Language Models (LMs) is quite fascinating. To give a brief introduction, these models learn the probabilities of sequences of words as they occur in a commonly spoken language (say, English) and predict the next possible word in a given sequence (a toy sketch follows the list below). They are essential for numerous NLP tasks like:
- Language Translation
- Text Classification
- Sentiment Extraction
- Reading Comprehension
- Named Entity Recognition
- Question Answering Systems
- News Article Generation, etc.
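To make that definition concrete, here is a toy sketch of a bigram language model; the miniature corpus and the probabilities it yields are purely illustrative and are, of course, nothing like how GPT-3 is actually trained:

```python
# A minimal sketch of what a language model does: estimate the probability
# of the next word from counts of adjacent word pairs (a bigram model).
# The corpus is a made-up toy example.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(prev):
    """Probability distribution over the words that can follow `prev`."""
    counts = follows[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```

Models like GPT-3 do the same job, predicting the next token, but with a neural network of billions of parameters instead of raw counts.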
They’ve become immensely popular since the release of BERT by Google, with a host of companies competing to build the next big thing in the NLP domain!
OpenAI's GPT-3 is the largest language model, with 175 BN parameters, 10x more than Microsoft's Turing NLG
OpenAI has been in this race for a long time now. The capabilities, features and limitations of its latest edition, GPT-3, have been described in a detailed research paper. Its predecessor, GPT-2 (released in February 2019), was trained on 40GB of text data and had 1.5 BN parameters. In comparison, GPT-3 has a whopping 175 BN parameters, 10 times more than the next largest LM, the Turing NLG, developed by Microsoft with 17 BN parameters!
Fig-1: Comparison of all available language models (LMs), parameter-wise

GPT-3 is based on the same transformer and attention concepts as GPT-2 (a minimal sketch of the attention operation follows). It was trained on a large and varied collection of data, including Common Crawl, web texts, books and Wikipedia, sampled in proportion to the tokens drawn from each dataset. Prior to training the model, the average quality of the datasets was improved in three steps.
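To give a feel for those concepts, here is a minimal sketch of scaled dot-product attention, the core operation of the transformer architecture; the shapes and random inputs are illustrative only, not GPT-3's actual dimensions:

```python
# A minimal sketch of scaled dot-product attention, the building block of
# transformer models such as GPT-2 and GPT-3. Shapes are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Softmax over the keys, with the usual max-subtraction for stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output is a weighted mix of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 token positions, 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

GPT-3 stacks up to 96 such attention layers (with learned projections, multiple heads and masking) on top of one another.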
The following table shows the training corpus of GPT-3, with token counts and sampling weights as reported in the paper:

| Dataset | Quantity (tokens) | Weight in training mix |
| --- | --- | --- |
| Common Crawl (filtered) | 410 BN | 60% |
| WebText2 | 19 BN | 22% |
| Books1 | 12 BN | 8% |
| Books2 | 55 BN | 8% |
| Wikipedia | 3 BN | 3% |
GPT-3 has variants, ranging from 125 MN to 175 BN parameters, in terms of:
- Sizes (parameters and layers)
- Architectures
- Learning hyper-parameters (batch size in tokens and learning rate)
“The largest version of GPT-3 has 175 BN parameters, 96 attention layers and a 3.2 MN batch size”
Here are the details of the different variants of the GPT-3 model:
Fig-2: Details of variants of the GPT-3 model

What can it do?
Many of the NLP tasks discussed in this blog can be performed by GPT-3 without any gradient or parameter updates, or fine-tuning. This makes it a task-agnostic model: it can perform tasks with very few, or no, prompts, examples or demonstrations, called shots.
The following image displays a zero-/one-/few-shot task accuracy comparison across model sizes (in terms of parameters) on a simple task, removing random symbols from a word, with the number of in-context examples ranging from 10 to 100. A sketch of what such prompts look like follows.
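To make the shot terminology concrete, here is a sketch of zero-, one- and few-shot prompts for that symbol-removal task. The instruction wording and example words are invented for illustration; the point is that only the prompt text changes between settings, never the model's weights:

```python
# Illustrative zero-/one-/few-shot prompts for the "remove random symbols
# from a word" task. All instructions and example words are hypothetical.
task = "Remove the extra symbols from the word."

zero_shot = f"{task}\nInput: s.u!c/c!e.s,s-i-o-n\nOutput:"

one_shot = (
    f"{task}\n"
    "Input: i,n.f/o!r-m,a.t/i!o,n\nOutput: information\n"
    "Input: s.u!c/c!e.s,s-i-o-n\nOutput:"
)

few_shot = (
    f"{task}\n"
    "Input: i,n.f/o!r-m,a.t/i!o,n\nOutput: information\n"
    "Input: c!o,m.p/l-e!t,e.l/y\nOutput: completely\n"
    "Input: s.u!c/c!e.s,s-i-o-n\nOutput:"
)

print(few_shot)
```

In every setting the model is expected to complete the prompt with "succession"; as the figure below shows, accuracy improves with both model size and the number of in-context examples.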
Fig-3: Zero / One / Few-Shot task accuracy comparison for models of different sizes

The “Fake News” Conundrum
Earlier, the release of the largest model of GPT-2 was briefly stalled due to a controversial debate over its capability to generate fake news. It was later published on Colab notebooks. In recent times, however, fake news has become quite common, and the real news itself has been hard to believe!
The fake news generated by GPT-3 is so difficult to distinguish from the real thing that, in one of the experiments, only about 50% of the fake articles could actually be detected, which is no better than random guessing!
Fig-4: Accuracy comparison of manual fake-news detection for models of different sizes

In a task to predict the last word of a sentence, GPT-3 outperformed the current SOTA (state-of-the-art) algorithm by 8%, scoring 76% accuracy in the zero-shot setting. In the few-shot setting, it achieved an accuracy of 86.4%!
In closed-book question answering tasks, GPT-3 outperformed a fine-tuned SOTA model that uses an information retrieval component, in both the one-shot and few-shot settings.
Fig-5: Performance of GPT-3 on TriviaQA for models of different sizes

Access to the GPT-3 API is currently wait-listed, but the folks who have had a chance to try it have shared interesting findings and amazing results from this powerful model. Here are a few things observed while experimenting with the API's interface, called the Playground.
Summary of the OpenAI GPT-3 API Playground:
Settings and Presets: Upon clicking the settings icon, one can configure various parameters like the text length, temperature (from low/boring to standard to chaotic/creative), start and stop sequences for the generated text, etc. There are also multiple presets to choose from and play around with, like Chat, Q&A, Parsing Unstructured Data, and Summarize for a 2nd Grader (a sketch of these settings in code follows).
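For anyone who clears the waiting list, the same knobs are exposed through the Python client. The sketch below is hedged: it assumes the completion endpoint of the openai package, a placeholder API key, and that the "davinci" engine is available on the account:

```python
# A hedged sketch of driving the GPT-3 API with the Playground's settings.
# The API key is a placeholder and the engine name is an assumption.
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"

response = openai.Completion.create(
    engine="davinci",                        # model variant to query
    prompt="Q: What is a language model?\nA:",
    max_tokens=64,                           # the Playground's "text length"
    temperature=0.7,                         # low = boring, high = chaotic/creative
    stop=["\nQ:"],                           # where to stop the generated text
)
print(response.choices[0].text.strip())
```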
Chat:
The Chat preset looks more like a chatbot. You can set the character of the AI to friendly, creative, clever and helpful, and it provides informative answers in a very polite manner; set the character to brutal, and it responds exactly as that character suggests!
Q&A:
The Q&A preset needs some training, in the form of a few example question-and-answer pairs, before it starts answering new questions, and people had no complaints about the kind of answers they received.
Parsing Unstructured Data:
This is an interesting preset which can comprehend unstructured text and extract structured information from it, as in the sketch below.
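Here is a sketch of the kind of prompt this preset builds on; the sentences, field names and company names below are all made up for illustration:

```python
# Hypothetical prompt for the "Parsing Unstructured Data" preset: worked
# examples establish the target structure, then the model is asked to
# parse a new sentence. Every name and value here is invented.
prompt = """Extract the company, product and year from each sentence.

Sentence: Acme Corp shipped its WidgetPro line in 2018.
Company: Acme Corp | Product: WidgetPro | Year: 2018

Sentence: In 2020, Globex launched the HoverBoard X.
Company:"""
# The model is expected to continue along the lines of:
# " Globex | Product: HoverBoard X | Year: 2020"
```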
Summarize for 2nd Grader:
This preset demonstrates another level of text compression, rephrasing difficult sentences and concepts into simpler words and sentences that can be easily understood by a kid.
Multilingual text processing: GPT-3 can handle languages other than English better than GPT-2. People have tried tasks in various languages, including German, Russian and Japanese; it performed well and looks very much ready for multilingual text processing.
Text Generation: It can generate poems on demand, in a particular style if required, and can write stories and essays with some fine-tuning, even in other languages.
Code Generation: People have claimed that this API can generate code with minimal prompting.
Here is an article which showcases all its capabilities, with excerpts from social media.
And this is how the AI interface looks (the image below shows the Q&A preset):
Fig-6: Preview of the AI Playground page for a Q&A preset

How can we use it?
Unlike a lot of language models, GPT-3 does not need transfer learning, where the model is fine-tuned on task-specific datasets for specific tasks. The authors of the GPT-3 research paper mention the following advantages of having a task-agnostic model:
- Collecting task-specific data is difficult
- Fine-tuning might yield out-of-distribution performance
- The need for an adaptable NLP system, similar to humans, which can understand natural language (English) and perform tasks with few or no prompts
The applications of GPT-3 rely on in-context learning: the model is fed a task description, prompt, shot or example, and it responds on the basis of the skills and pattern-recognition abilities learnt during training, adapting them to the specific task at hand, as in the sketch below.
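A classic illustration of in-context learning is few-shot translation, along the lines of the example in the GPT-3 paper; the exact pairs below are assumptions for illustration:

```python
# In-context learning: the "training" for the task lives entirely in the
# prompt. Two English -> French pairs establish the pattern, and the model
# is expected to complete the last line (here, with "fromage"). No gradient
# or parameter updates take place.
prompt = """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
```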
Despite its tremendous usability, the huge model size is the biggest factor hindering usage for most people, except those with the necessary resources available. However, there are discussions in the fraternity that distillation might come to the rescue!
What are the limitations?
The OpenAI founder himself has said that “GPT-3 has weaknesses and it makes silly mistakes”. It is weak at sentence comparison, where it has to judge how a word is used in two different sentences.
As per the researchers, it still faces some problems in the following tasks:
- Repetitions
- Loss of coherence
- Contradictions
- Drawing real conclusions
- Multi-digit addition and subtraction
結(jié)論 (Conclusion)
It is great to have an NLP system that doesn't require large task-specific datasets and custom model architectures to solve specific NLP tasks. The experiments conducted so far show its power, its potential and its likely impact on the future of NLP advancement.
Though GPT-3 doesn't do well at everything, and its size makes it difficult for everyone to use, it marks the threshold of many new improvements to come in the field of NLP!
Translated from: https://medium.com/quick-bites/gpt-3-the-latest-in-the-nlp-town-961259a0930f