AI: "DEEP LEARNING'S DIMINISHING RETURNS" - Translation and Commentary
Overview: Deep learning's diminishing returns. Neil Thompson of MIT and several collaborators take the top spot with a thoughtful feature article on the computational and energy costs of training deep-learning systems. Analyzing improvements in image classifiers, they find that "to halve the error rate, you can expect to need more than 500 times the computational resources." They write: "Faced with skyrocketing costs, researchers will either have to come up with more efficient ways to solve these problems, or they will abandon working on these problems and progress will languish." Their article is not entirely discouraging, however: it closes with some promising ideas about the way forward.
Contents
"DEEP LEARNING'S DIMINISHING RETURNS": Translation and Commentary
The First Artificial Neural Network
The Rise of Deep Learning
Deep Learning's Dilemma: High Computational Cost
Expert Systems: Selecting Features
Quantifying Deep Learning's Computing Requirements and Costs
To Halve the Error Rate, You May Need More Than 500 Times the Computational Resources
How to Tackle the High Cost of Deep Learning
Several Approaches to Circumventing Deep Learning's Computational Limits
"DEEP LEARNING'S DIMINISHING RETURNS": Translation and Commentary
Article: Deep Learning's Diminishing Returns - IEEE Spectrum
Published: September 24, 2021
DEEP LEARNING'S DIMINISHING RETURNS
The cost of improvement is becoming unsustainable
The First Artificial Neural Network
DEEP LEARNING IS NOW being used to translate between languages, predict how proteins fold, analyze medical scans, and play games as complex as Go, to name just a few applications of a technique that is now becoming pervasive. Success in those and other realms has brought this machine-learning technique from obscurity in the early 2000s to dominance today.

Although deep learning's rise to fame is relatively recent, its origins are not. In 1958, back when mainframe computers filled rooms and ran on vacuum tubes, knowledge of the interconnections between neurons in the brain inspired Frank Rosenblatt at Cornell to design the first artificial neural network, which he presciently described as a "pattern-recognizing device." But Rosenblatt's ambitions outpaced the capabilities of his era—and he knew it. Even his inaugural paper was forced to acknowledge the voracious appetite of neural networks for computational power, bemoaning that "as the number of connections in the network increases...the burden on a conventional digital computer soon becomes excessive."
Fortunately for such artificial neural networks—later rechristened "deep learning" when they included extra layers of neurons—decades of Moore's Law and other improvements in computer hardware yielded a roughly 10-million-fold increase in the number of computations that a computer could do in a second. So when researchers returned to deep learning in the late 2000s, they wielded tools equal to the challenge.
The Rise of Deep Learning
These more-powerful computers made it possible to construct networks with vastly more connections and neurons and hence greater ability to model complex phenomena. Researchers used that ability to break record after record as they applied deep learning to new tasks.

While deep learning's rise may have been meteoric, its future may be bumpy. Like Rosenblatt before them, today's deep-learning researchers are nearing the frontier of what their tools can achieve. To understand why this will reshape machine learning, you must first understand why deep learning has been so successful and what it costs to keep it that way.
Deep learning is a modern incarnation of the long-running trend in artificial intelligence that has been moving from streamlined systems based on expert knowledge toward flexible statistical models. Early AI systems were rule based, applying logic and expert knowledge to derive results. Later systems incorporated learning to set their adjustable parameters, but these were usually few in number.

Today's neural networks also learn parameter values, but those parameters are part of such flexible computer models that—if they are big enough—they become universal function approximators, meaning they can fit any type of data. This unlimited flexibility is the reason why deep learning can be applied to so many different domains.

The flexibility of neural networks comes from taking the many inputs to the model and having the network combine them in myriad ways. This means the outputs won't be the result of applying simple formulas but instead immensely complicated ones.

For example, when the cutting-edge image-recognition system Noisy Student converts the pixel values of an image into probabilities for what the object in that image is, it does so using a network with 480 million parameters. The training to ascertain the values of such a large number of parameters is even more remarkable because it was done with only 1.2 million labeled images—which may understandably confuse those of us who remember from high school algebra that we are supposed to have more equations than unknowns. Breaking that rule turns out to be the key.
Deep-learning models are overparameterized, which is to say they have more parameters than there are data points available for training. Classically, this would lead to overfitting, where the model not only learns general trends but also the random vagaries of the data it was trained on. Deep learning avoids this trap by initializing the parameters randomly and then iteratively adjusting sets of them to better fit the data using a method called stochastic gradient descent. Surprisingly, this procedure has been proven to ensure that the learned model generalizes well.

The success of flexible deep-learning models can be seen in machine translation. For decades, software has been used to translate text from one language to another. Early approaches to this problem used rules designed by grammar experts. But as more textual data became available in specific languages, statistical approaches—ones that go by such esoteric names as maximum entropy, hidden Markov models, and conditional random fields—could be applied.

Initially, the approaches that worked best for each language differed based on data availability and grammatical properties. For example, rule-based approaches to translating languages such as Urdu, Arabic, and Malay outperformed statistical ones—at first. Today, all these approaches have been outpaced by deep learning, which has proven itself superior almost everywhere it's applied.
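To make the overparameterization described above concrete, here is a minimal, hypothetical sketch (assuming PyTorch; the network sizes and data are invented for illustration and are not from the article) that fits a model with roughly 10,000 parameters to only 20 training points, starting from random weights and adjusting them with stochastic gradient descent:

```python
# Minimal sketch (assumes PyTorch): an overparameterized network trained
# with stochastic gradient descent, as described in the passage above.
import torch
import torch.nn as nn

torch.manual_seed(0)

# 20 noisy samples of a simple 1-D function.
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = torch.sin(3 * x) + 0.05 * torch.randn_like(x)

# Roughly 10,000 parameters for only 20 data points.
model = nn.Sequential(
    nn.Linear(1, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 1),
)
print(sum(p.numel() for p in model.parameters()), "parameters for", len(x), "points")

# Parameters start random and are adjusted iteratively by SGD.
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()
for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print("final training loss:", loss.item())
```

Despite having far more unknowns than equations, models trained this way tend to generalize, which is the surprise the article points to.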
Deep Learning's Dilemma: High Computational Cost
So the good news is that deep learning provides enormous flexibility. The bad news is that this flexibility comes at an enormous computational cost. This unfortunate reality has two parts.

The first part is true of all statistical models: To improve performance by a factor of k, at least k² more data points must be used to train the model. The second part of the computational cost comes explicitly from overparameterization. Once accounted for, this yields a total computational cost for improvement of at least k⁴. That little 4 in the exponent is very expensive: A 10-fold improvement, for example, would require at least a 10,000-fold increase in computation.

To make the flexibility-computation trade-off more vivid, consider a scenario where you are trying to predict whether a patient's X-ray reveals cancer. Suppose further that the true answer can be found if you measure 100 details in the X-ray (often called variables or features). The challenge is that we don't know ahead of time which variables are important, and there could be a very large pool of candidate variables to consider.
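One way to read the two parts of that argument, stated as a back-of-the-envelope formula (our paraphrase, under the assumption that training compute scales roughly as the number of data points times the number of parameters, and that an overparameterized model's parameter count grows along with its data):

```latex
\underbrace{n \propto k^{2}}_{\text{data needed}}, \qquad
\underbrace{p \propto n \propto k^{2}}_{\text{parameters (overparameterized)}}, \qquad
\text{compute} \;\propto\; n \cdot p \;\propto\; k^{2}\cdot k^{2} \;=\; k^{4}.
```

Setting k = 10 recovers the 10,000-fold figure quoted above (10² · 10² = 10⁴).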
Expert Systems: Selecting Features
The expert-system approach to this problem would be to have people who are knowledgeable in radiology and oncology specify the variables they think are important, allowing the system to examine only those. The flexible-system approach is to test as many of the variables as possible and let the system figure out on its own which are important, requiring more data and incurring much higher computational costs in the process.

Models for which experts have established the relevant variables are able to learn quickly what values work best for those variables, doing so with limited amounts of computation—which is why they were so popular early on. But their ability to learn stalls if an expert hasn't correctly specified all the variables that should be included in the model. In contrast, flexible models like deep learning are less efficient, taking vastly more computation to match the performance of expert models. But, with enough computation (and data), flexible models can outperform ones for which experts have attempted to specify the relevant variables.
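A toy numerical sketch of that X-ray scenario (entirely made-up numbers, using NumPy and plain least squares rather than a real deep network) shows the trade-off: the expert route fits a small model on the 100 hand-picked variables, while the flexible route hands every candidate variable to a much larger model and pays for it in computation:

```python
# Toy sketch (assumes NumPy; the numbers are invented for illustration):
# expert-chosen features versus letting a flexible model sort them out.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_candidates, n_relevant = 2_000, 5_000, 100

X = rng.normal(size=(n_samples, n_candidates))
true_w = np.zeros(n_candidates)
true_w[:n_relevant] = rng.normal(size=n_relevant)   # only 100 variables matter
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Expert-system route: a radiologist/oncologist tells us which 100
# variables to measure, so the model stays small and cheap to fit.
w_expert, *_ = np.linalg.lstsq(X[:, :n_relevant], y, rcond=None)

# Flexible route: use all 5,000 candidate variables and let the fit
# discover what matters; far more parameters, far more computation.
w_flexible, *_ = np.linalg.lstsq(X, y, rcond=None)

print("expert model parameters:  ", w_expert.size)    # 100
print("flexible model parameters:", w_flexible.size)  # 5000
```

If the expert's list is wrong or incomplete, though, the small model's accuracy stalls, which is exactly the failure mode described above.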
Quantifying Deep Learning's Computing Requirements and Costs
[Figure] Extrapolating the gains of recent years might suggest that by 2025 the error level in the best deep-learning systems designed for recognizing objects in the ImageNet data set should be reduced to just 5 percent [top]. But the computing resources and energy required to train such a future system would be enormous, leading to the emission of as much carbon dioxide as New York City generates in one month [bottom]. SOURCE: N.C. THOMPSON, K. GREENEWALD, K. LEE, G.F. MANSO
Clearly, you can get improved performance from deep learning if you use more computing power to build bigger models and train them with more data. But how expensive will this computational burden become? Will costs become sufficiently high that they hinder progress?

To answer these questions in a concrete way, we recently gathered data from more than 1,000 research papers on deep learning, spanning the areas of image classification, object detection, question answering, named-entity recognition, and machine translation. Here, we will only discuss image classification in detail, but the lessons apply broadly.
Over the years, reducing image-classification errors has come with an enormous expansion in computational burden. For example, in 2012 AlexNet, the model that first showed the power of training deep-learning systems on graphics processing units (GPUs), was trained for five to six days using two GPUs. By 2018, another model, NASNet-A, had cut the error rate of AlexNet in half, but it used more than 1,000 times as much computing to achieve this.

Our analysis of this phenomenon also allowed us to compare what's actually happened with theoretical expectations. Theory tells us that computing needs to scale with at least the fourth power of the improvement in performance. In practice, the actual requirements have scaled with at least the ninth power.

This ninth power means that to halve the error rate, you can expect to need more than 500 times the computational resources. That's a devastatingly high price. There may be a silver lining here, however. The gap between what's happened in practice and what theory predicts might mean that there are still undiscovered algorithmic improvements that could greatly improve the efficiency of deep learning.
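The "more than 500 times" figure follows directly from that empirical exponent: halving the error rate is a 2-fold improvement, so

```latex
2^{9} = 512 \;(>500) \qquad \text{versus the theoretical} \qquad 2^{4} = 16.
```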
To Halve the Error Rate, You May Need More Than 500 Times the Computational Resources
To halve the error rate, you can expect to need more than 500 times the computational resources.

As we noted, Moore's Law and other hardware advances have provided massive increases in chip performance. Does this mean that the escalation in computing requirements doesn't matter? Unfortunately, no. Of the 1,000-fold difference in the computing used by AlexNet and NASNet-A, only a six-fold improvement came from better hardware; the rest came from using more processors or running them longer, incurring higher costs.
Having estimated the computational cost-performance curve for image recognition, we can use it to estimate how much computation would be needed to reach even more impressive performance benchmarks in the future. For example, achieving a 5 percent error rate would require 10¹⁹ billion floating-point operations.

Important work by scholars at the University of Massachusetts Amherst allows us to understand the economic cost and carbon emissions implied by this computational burden. The answers are grim: Training such a model would cost US $100 billion and would produce as much carbon emissions as New York City does in a month. And if we estimate the computational burden of a 1 percent error rate, the results are considerably worse.

Is extrapolating out so many orders of magnitude a reasonable thing to do? Yes and no. Certainly, it is important to understand that the predictions aren't precise, although with such eye-watering results, they don't need to be to convey the overall message of unsustainability. Extrapolating this way would be unreasonable if we assumed that researchers would follow this trajectory all the way to such an extreme outcome. We don't. Faced with skyrocketing costs, researchers will either have to come up with more efficient ways to solve these problems, or they will abandon working on these problems and progress will languish.
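To get a feel for that scale, here is a rough sanity check of our own (not the authors' calculation; the sustained per-GPU throughput of about 10¹⁴ FLOP/s and a price of a few dollars per GPU-hour are loose assumptions):

```latex
10^{19}\ \text{billion FLOPs} = 10^{28}\ \text{FLOPs}, \qquad
\frac{10^{28}\ \text{FLOPs}}{\,10^{14}\ \text{FLOP/s} \times 3\times 10^{7}\ \text{s/yr}\,}
\;\approx\; 3\times 10^{6}\ \text{GPU-years}.
```

At a few U.S. dollars per GPU-hour, that lands within an order of magnitude of the $100 billion estimate cited above.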
On the other hand, extrapolating our results is not only reasonable but also important, because it conveys the magnitude of the challenge ahead. The leading edge of this problem is already becoming apparent. When Google subsidiary DeepMind trained its system to play Go, it was estimated to have cost $35 million. When DeepMind's researchers designed a system to play the StarCraft II video game, they purposefully didn't try multiple ways of architecting an important component, because the training cost would have been too high.

At OpenAI, an important machine-learning think tank, researchers recently designed and trained a much-lauded deep-learning language system called GPT-3 at the cost of more than $4 million. Even though they made a mistake when they implemented the system, they didn't fix it, explaining simply in a supplement to their scholarly publication that "due to the cost of training, it wasn't feasible to retrain the model."
Even businesses outside the tech industry are now starting to shy away from the computational expense of deep learning. A large European supermarket chain recently abandoned a deep-learning-based system that markedly improved its ability to predict which products would be purchased. The company executives dropped that attempt because they judged that the cost of training and running the system would be too high.

Faced with rising economic and environmental costs, the deep-learning community will need to find ways to increase performance without causing computing demands to go through the roof. If they don't, progress will stagnate. But don't despair yet: Plenty is being done to address this challenge.
How to Tackle the High Cost of Deep Learning
One strategy is to use processors designed specifically to be efficient for deep-learning calculations. This approach was widely used over the last decade, as CPUs gave way to GPUs and, in some cases, field-programmable gate arrays and application-specific ICs (including Google's Tensor Processing Unit). Fundamentally, all of these approaches sacrifice the generality of the computing platform for the efficiency of increased specialization. But such specialization faces diminishing returns. So longer-term gains will require adopting wholly different hardware frameworks—perhaps hardware that is based on analog, neuromorphic, optical, or quantum systems. Thus far, however, these wholly different hardware frameworks have yet to have much impact.
We must either adapt how we do deep learning or face a future of much slower progress.

Another approach to reducing the computational burden focuses on generating neural networks that, when implemented, are smaller. This tactic lowers the cost each time you use them, but it often increases the training cost (what we've described so far in this article). Which of these costs matters most depends on the situation. For a widely used model, running costs are the biggest component of the total sum invested. For other models—for example, those that frequently need to be retrained—training costs may dominate. In either case, the total cost must be larger than just the training on its own. So if the training costs are too high, as we've shown, then the total costs will be, too.

And that's the challenge with the various tactics that have been used to make implementation smaller: They don't reduce training costs enough. For example, one allows for training a large network but penalizes complexity during training. Another involves training a large network and then "prunes" away unimportant connections. Yet another finds as efficient an architecture as possible by optimizing across many models—something called neural-architecture search. While each of these techniques can offer significant benefits for implementation, the effects on training are muted—certainly not enough to address the concerns we see in our data. And in many cases they make the training costs higher.
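For concreteness, here is a minimal sketch of two of those tactics, assuming PyTorch (the layer sizes, penalty weight, and pruning fraction are arbitrary); note that, as argued above, both mainly shrink the deployed model rather than the training bill:

```python
# Minimal sketch (assumes PyTorch) of two tactics mentioned above:
# penalizing complexity during training, and pruning a trained network.
# Neither reduces training cost; the penalty even adds a little work.
import torch
import torch.nn as nn
from torch.nn.utils import prune

model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 64)           # stand-in batch
y = torch.randint(0, 10, (32,))   # stand-in labels

# 1) Train a large network but penalize complexity (L1 weight penalty).
for step in range(100):
    opt.zero_grad()
    task_loss = loss_fn(model(x), y)
    l1 = sum(p.abs().sum() for p in model.parameters())
    (task_loss + 1e-4 * l1).backward()
    opt.step()

# 2) "Prune" away unimportant connections: zero out the 90 percent of
#    weights with the smallest magnitude in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # make the pruning permanent

zeros = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"{zeros}/{total} parameters are now zero")
```

Neural-architecture search, the third tactic mentioned, is costlier still, because it repeats training across many candidate models.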
Several Approaches to Circumventing Deep Learning's Computational Limits
One up-and-coming technique that could reduce training costs goes by the name meta-learning. The idea is that the system learns on a variety of data and then can be applied in many areas. For example, rather than building separate systems to recognize dogs in images, cats in images, and cars in images, a single system could be trained on all of them and used multiple times.

Unfortunately, recent work by Andrei Barbu of MIT has revealed how hard meta-learning can be. He and his coauthors showed that even small differences between the original data and where you want to use it can severely degrade performance. They demonstrated that current image-recognition systems depend heavily on things like whether the object is photographed at a particular angle or in a particular pose. So even the simple task of recognizing the same objects in different poses causes the accuracy of the system to be nearly halved.

Benjamin Recht of the University of California, Berkeley, and others made this point even more starkly, showing that even with novel data sets purposely constructed to mimic the original training data, performance drops by more than 10 percent. If even small changes in data cause large performance drops, the data needed for a comprehensive meta-learning system might be enormous. So the great promise of meta-learning remains far from being realized.
Another possible strategy to evade the computational limits of deep learning would be to move to other, perhaps as-yet-undiscovered or underappreciated types of machine learning. As we described, machine-learning systems constructed around the insight of experts can be much more computationally efficient, but their performance can't reach the same heights as deep-learning systems if those experts cannot distinguish all the contributing factors. Neuro-symbolic methods and other techniques are being developed to combine the power of expert knowledge and reasoning with the flexibility often found in neural networks.

Like the situation that Rosenblatt faced at the dawn of neural networks, deep learning is today becoming constrained by the available computational tools. Faced with computational scaling that would be economically and environmentally ruinous, we must either adapt how we do deep learning or face a future of much slower progress. Clearly, adaptation is preferable. A clever breakthrough might find a way to make deep learning more efficient or computer hardware more powerful, which would allow us to continue to use these extraordinarily flexible models. If not, the pendulum will likely swing back toward relying more on experts to identify what needs to be learned.