Dialogue Datasets
A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context
Task: given the dialogue, identify the entities that both speakers can see.
The corpus contains 6,760 dialogues.
https://arxiv.org/abs/1907.03399
RadioTalk: a large-scale corpus of talk radio transcripts
A large-scale dialogue dataset of talk radio transcripts.
It covers 284,000 hours of automatically transcribed radio speech.
https://arxiv.org/abs/1907.07073
- Large Scale Question Answering using Tourism Data
Authors: Danish Contractor, Parag Singla
Link: https://arxiv.org/abs/1909.03527
The authors collect a QA dataset of 48,147 paragraph-sized real user questions from travelers seeking recommendations for hotels, attractions, and restaurants. Each candidate answer is associated with a set of unstructured reviews.
- Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset
Authors: Bill Byrne, Kyu-Young Kim
Comments: To appear at EMNLP 2019
Link: https://arxiv.org/abs/1909.05358
The dataset includes 13,215 task-based dialogs spanning six domains.
- Generating Challenge Datasets for Task-Oriented Conversational Agents through Self-Play
Authors: Sourabh Majumdar, Serra Sinem Tekiroglu
Comments: Proceedings of Recent Advances in Natural Language Processing (RANLP) Conference, 2019
Link: https://arxiv.org/abs/1910.07357
- The Eighth Dialog System Technology Challenge
Authors: Seokhwan Kim, Raghav Gupta
Comments: Submitted to NeurIPS 2019 3rd Conversational AI Workshop
Link: https://arxiv.org/abs/1911.06394
- The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service
Authors: Meng Chen, Bowen Zhou
Link: https://arxiv.org/abs/1911.09969
- Filling Conversation Ellipsis for Better Social Dialog Understanding
Authors: Xiyuan Zhang, Zhou Yu
Comments: Accepted to AAAI 2020
Link: https://arxiv.org/abs/1911.10776
To address ellipsis, the authors also provide an open-domain human-machine dialogue dataset containing manually completed user utterances and semantic role labels annotated on the completed utterances.
- SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
Authors: Bogdan Gliwa, Aleksander Wawer
Link: https://arxiv.org/abs/1911.12237
- Introducing MANtIS: a novel Multi-Domain Information Seeking Dialogues Dataset
Authors: Gustavo Penha, Claudia Hauff
Link: https://arxiv.org/abs/1912.04639
- Characterizing the dynamics of learning in repeated reference games
Authors: Robert D. Hawkins, Noah D. Goodman
Link: https://arxiv.org/abs/1912.07199
- I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-orientated dialogue agents
Authors: Shrimai Prabhumoye, Arthur Szlam
Link: https://arxiv.org/abs/2002.02878
- WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection
Authors: Noé Cecillon (LIA), Georges Linares (LIA)
Link: https://arxiv.org/abs/2003.06190
- MedDialog: A Large-scale Medical Dialogue Dataset
Authors: Shu Chen, Pengtao Xie
Link: https://arxiv.org/abs/2004.03329
- KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation
Authors: Hao Zhou, Xiaoyan Zhu
Link: https://arxiv.org/abs/2004.04100
- Molweni: A Challenge Multiparty Dialogues-based Machine Reading Comprehension Dataset with Discourse Structure
Authors: Jiaqi Li, Bing Qin
Link: https://arxiv.org/abs/2004.05080
- A New Dataset for Natural Language Inference from Code-mixed Conversations
Authors: Simran Khanuja, Monojit Choudhury
Comments: To appear in CALCS, LREC 2020
Link: https://arxiv.org/abs/2004.05051
- Dialogue-Based Relation Extraction
Authors: Dian Yu, Dong Yu
Comments: To appear in ACL 2020
Link: https://arxiv.org/abs/2004.08056
- Grounding Conversations with Improvised Dialogues
Authors: Hyundong Cho, Jonathan May
Comments: ACL2020; 9 pages + 1 page appendix
Link: https://arxiv.org/abs/2004.09544