Meta-Learning
Contents
1. Background
2. Meta-Learning
3. Applications
3.1 Event Extraction (Zero-shot Transfer Learning for Event Extraction)
1. Background
Artificial Intelligence --> Machine Learning --> Deep Learning --> Deep Reinforcement Learning --> Deep Meta Learning
In the Machine Learning era, performance degraded as soon as a classification problem became even slightly complex. Deep Learning then essentially solved the one-to-one mapping problem, for example image classification (one input, one output), producing milestone results like AlexNet. But what if an output also affects the next input? That is the sequential decision making problem, which deep learning alone cannot solve, and this is where Reinforcement Learning comes in: Deep Learning + Reinforcement Learning = Deep Reinforcement Learning. With deep reinforcement learning, sequential decision making made its first real progress, producing milestone results like AlphaGo. However,
- Deep reinforcement learning depends on enormous amounts of training and on a precise reward signal. For many real-world problems, such as robot learning, there is no good reward and no way to train without limit. What then?
- Or make the board a little bigger: can AlphaGo still play? Current methods clearly cannot; AlphaGo would immediately be lost, while we humans, having seen far more, can adapt to the new board in minutes.
Another example is face recognition: a human can remember and recognize a face from a single glance, while today's deep learning needs thousands of images to do the same.
The fast learning ability that humans possess is exactly what today's AI lacks. The key is that humans know how to learn: we make full use of past knowledge and experience to guide learning on new tasks. How to endow AI with this fast learning ability has therefore become a frontier research problem, namely Meta-Learning.
Problem: deep learning depends on large amounts of high-quality annotated training data and on heavy computing resources; it has poor portability; models are task-specific, applied independently to particular tasks, while new concepts and things keep emerging.
References:
[1] https://zhuanlan.zhihu.com/p/27629294 ===> the author expresses human intuition through a weighted "value network"; a clever and interesting idea!
[2] https://blog.csdn.net/langb2014/article/details/84953307
2. Meta-Learning
Solution: fast learning; using past knowledge and experience to guide learning on new tasks; learning to learn; inference and reasoning.
Concept: meta-learning, also known as learning to learn (Schmidhuber, 1987; Bengio et al., 1991; Thrun and Pratt, 1998), is an alternative paradigm that draws on past experience in order to learn and adapt to new tasks quickly: the model is trained on a number of related tasks such that it can solve unseen tasks using only a small number of training examples.
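The "small number of training examples" setting can be illustrated with a toy episode (a minimal sketch in the spirit of metric-based meta-learning such as Prototypical Networks; the 2-d "embeddings" here are hand-made toy data, not learned by any model):

```python
import numpy as np

def prototypes(support_x, support_y, n_classes):
    """Mean embedding per class from a few labeled support examples."""
    return np.stack([support_x[support_y == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_x, protos):
    """Assign each query to the class with the nearest prototype."""
    # squared Euclidean distance from every query to every prototype
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# a 2-way, 2-shot episode with toy 2-d "embeddings"
support_x = np.array([[0.0, 0.1], [0.1, 0.0],    # class 0
                      [1.0, 0.9], [0.9, 1.0]])   # class 1
support_y = np.array([0, 0, 1, 1])
protos = prototypes(support_x, support_y, 2)
pred = classify(np.array([[0.05, 0.05], [0.95, 0.95]]), protos)
print(pred)  # -> [0 1]
```

In a real meta-learner the embedding function itself is trained across many such episodes, so that nearest-prototype classification works on tasks it has never seen.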
3. Applications
3.1 Event Extraction (Zero-shot Transfer Learning for Event Extraction)
- Problem: Most previous event extraction studies have relied heavily on features derived from annotated event mentions, and thus cannot be applied to new event types without annotation effort.
- Solution: We designed a transferable neural architecture, mapping event mentions and types jointly into a shared semantic space using structural and compositional neural networks, where the type of each event mention can be determined by the closest of all candidate types.
- Scheme: By leveraging (1) available manual annotation for a small set of existing event types and (2) existing event ontologies, our framework applies to new event types without requiring additional annotation.
(1) Goal of event extraction: extract event triggers and event arguments from unstructured data.
---> Poor portability of traditional supervised methods and the limited coverage of available event annotations.
---> Problem: handling new event types means starting from scratch, without being able to re-use annotations for old event types.
        Reasons: these approaches modeled event extraction as a classification problem, encoding features only by measuring the similarity between rich features encoded for test event mentions and annotated event mentions.
--->We observed that both event mentions and types can be represented with structures.
        event mention structure <--- constructed from trigger and candidate arguments
        event type structure <--- consists of event type and predefined roles
---> Figure 2.
Figure 2: Examples of Event Mention and Type Structures from ERE.
        AMR --> Abstract Meaning Representation, used to identify candidate arguments and construct event mention structures.
        ERE --> Entity Relation Event; event types can also be represented with structures from ERE.
                Besides the lexical semantics that relates a trigger to its type, their structures also tend to be similar.
                This observation is similar to the theory that the semantics of an event structure can be generalized and mapped to event mention structures in a semantic and predictable way.
                Event extraction task --> solved by mapping each mention to its semantically closest event type in the ontology.
---> One possible implementation: Zero-Shot Learning (ZSL), which has been successfully exploited in visual object classification.
        Main idea of ZSL for vision tasks: represent both images and type labels in a multi-dimensional vector space separately, then learn a regression model that maps from the image semantic space to the type-label semantic space, based on annotated images for seen labels. This regression model can then be used to predict the unseen labels of any given image.
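The regression idea can be sketched in a few lines (a toy illustration, not the paper's implementation; the label embeddings and "image" features below are synthetic assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy label embeddings (e.g. word vectors) for seen and unseen classes
seen_labels = np.array([[1.0, 0.0], [0.0, 1.0]])
unseen_labels = np.array([[1.0, 1.0], [-1.0, 0.0]])

# toy "image" features for seen classes: label embedding plus small noise
X = np.vstack([l + 0.01 * rng.normal(size=(20, 2)) for l in seen_labels])
Y = np.repeat(seen_labels, 20, axis=0)

# least-squares regression from image space to label space (W is ~identity here)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict_unseen(x):
    """Map an image into label space, return the nearest unseen label index."""
    z = x @ W
    d = ((unseen_labels - z) ** 2).sum(axis=1)
    return int(d.argmin())

print(predict_unseen(np.array([1.0, 1.0])))  # -> 0
```

The same map trained only on seen labels transfers to unseen labels because both live in the same label-embedding space.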
---> one goal is to effectively transfer the knowledge of events from seen types to unseen types, so we can extract event mentions of any types defined in the ontology.
        We design a transferable neural architecture, which jointly learns and maps the structural representations of both event mentions and types into a shared semantic space by minimizing the distance between each event mention and its corresponding type.
        For event mentions of unseen types, their structures will be projected into the same semantic space using the same framework and assigned the types with top-ranked similarity values.
(2) Approach
Event Extraction: triggers; arguments
Figure 3: Architecture Overview
  1) Given a sentence S, we start by identifying candidate triggers and arguments based on AMR parsing.
    e.g. dispatching is the trigger of a Transport_Person event with four arguments (0, China; 1, troops; 2, Himalayas; 3, time)
    We build a structure St using AMR as shown in Figure 3, e.g. dispatch-01.
  2) Each structure is composed of a set of tuples, e.g. <dispatch-01, :ARG0, China>
    We use a matrix to represent each AMR relation, composing its semantics with the two concepts of each tuple, and feed all tuple representations into a CNN to generate the event mention structure representation Vst for the candidate trigger.
    Shared CNN: St --> structure composition layer --> convolution layer --> pooling & concatenation --> Vst
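The compose-then-shared-CNN pipeline of step 2 might be sketched as follows (a hedged sketch: all dimensions, relation matrices, and convolution weights are hypothetical random values, whereas the real model learns them):

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)

# one composition matrix per AMR relation (a hypothetical two-relation set)
relations = {":ARG0": rng.normal(size=(d, 2 * d)),
             ":ARG1": rng.normal(size=(d, 2 * d))}

def compose(tuples):
    """Compose each <head, relation, argument> tuple into one d-dim vector."""
    return np.stack([np.tanh(relations[rel] @ np.concatenate([h, a]))
                     for h, rel, a in tuples])

def shared_cnn(tuple_vecs, conv_w):
    """A width-2 1-D convolution over the tuple sequence plus max-pooling.
    The same weights would be applied to mention and type structures."""
    n = len(tuple_vecs)
    feats = [np.tanh(conv_w @ np.concatenate([tuple_vecs[i], tuple_vecs[i + 1]]))
             for i in range(n - 1)]
    return np.max(np.stack(feats), axis=0)  # max-pool over windows

conv_w = rng.normal(size=(d, 2 * d))
head = rng.normal(size=d)           # e.g. embedding of "dispatch-01"
args = [rng.normal(size=d) for _ in range(3)]
tuples = [(head, ":ARG0", args[0]), (head, ":ARG1", args[1]),
          (head, ":ARG0", args[2])]
v_st = shared_cnn(compose(tuples), conv_w)  # the structure representation Vst
print(v_st.shape)  # -> (4,)
```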
  3) Given a target event ontology, for each type y, e.g. Transport_Person, we construct a type structure Sy by incorporating its predefined roles, and use a tensor to denote the implicit relation between any type and its arguments.
    We compose the semantics of type and argument role with the tensor for each tuple, e.g. <Transport_Person, Destination>,
    and generate the event type structure representation Vsy using the same CNN.
  4) By minimizing the semantic distance between dispatch-01 and Transport_Person, i.e. between Vst and Vsy, we jointly map the representations of event mentions and event types into a shared semantic space, where each mention is closest to its annotated type.
  5) After training, the compositional functions and CNNs can be further used to project any new event mention (e.g. donate-01) into the semantic space and find its closest event type.
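The "minimize distance to the annotated type" signal in step 4 could be written, for illustration, as a margin-based ranking loss (a sketch only; the paper's exact objective may differ, and the vectors here are toy values):

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def ranking_loss(v_mention, v_true_type, v_neg_types, margin=0.5):
    """Hinge ranking loss: the annotated type should score higher than
    every other candidate type by at least `margin`."""
    pos = cos(v_mention, v_true_type)
    return sum(max(0.0, margin - pos + cos(v_mention, v_neg))
               for v_neg in v_neg_types)

v_m = np.array([1.0, 0.0])                      # toy mention representation
loss_good = ranking_loss(v_m, np.array([1.0, 0.1]), [np.array([-1.0, 0.0])])
loss_bad = ranking_loss(v_m, np.array([-1.0, 0.0]), [np.array([1.0, 0.1])])
print(loss_good < loss_bad)  # -> True
```

When the mention already sits next to its annotated type the hinge is inactive (loss 0), so gradients only push apart mentions that are closer to a wrong type.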
(3) Joint Event Mention and Type Label Embedding
    CNNs are good at capturing sentence-level information in various NLP tasks,
    --> so we use them to generate structure-label representations.
        For each event mention structure St = (u1, u2, ..., uh) and each event type structure Sy = (u1', u2', ..., up'), which contain h and p tuples respectively:
    --> We apply a weight-sharing CNN to each input structure to jointly learn event mention and type structural representations, which will later be used to learn the ranking function for zero-shot event extraction.
    --> Input layer: a sequence of tuples, where each tuple is represented by a d x 2 dimensional vector; thus each mention structure and each type structure are represented as feature maps of dimensionality d x 2h and d x 2p respectively.
    --> Convolution Layer
    --> Max-Pooling
    --> Learning
(4) Joint Event Argument and Role Embedding
(5) Zero-Shot Classification
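Once mention and type structures live in the shared space, zero-shot classification reduces to a nearest-neighbor lookup over type representations (a minimal sketch with made-up vectors; the type names reuse the running example, plus a hypothetical "Donation" type for donate-01):

```python
import numpy as np

def zero_shot_classify(v_mention, type_vecs, type_names):
    """Pick the event type whose structure representation is closest
    (by cosine similarity) to the mention representation."""
    sims = [v_mention @ v / (np.linalg.norm(v_mention) * np.linalg.norm(v))
            for v in type_vecs]
    return type_names[int(np.argmax(sims))]

# hypothetical representations already projected into the shared space
types = {"Transport_Person": np.array([0.9, 0.1]),
         "Donation": np.array([0.1, 0.9])}
v_donate = np.array([0.2, 0.8])   # e.g. the mention "donate-01"
print(zero_shot_classify(v_donate, list(types.values()), list(types.keys())))
# -> Donation
```

Because "Donation" never needs annotated mentions, only a type structure in the ontology, this lookup is what makes the framework zero-shot.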