计算机科学必读书籍_5篇关于数据科学家的产品分类必读文章
計算機科學必讀書籍
Product categorization/product classification is the organization of products into their respective departments or categories. As well, a large part of the process is the design of the product taxonomy as a whole.
產(chǎn)品分類/產(chǎn)品分類是將產(chǎn)品組織到各自部門或類別中。 同樣,整個過程的很大一部分是整個產(chǎn)品分類的設計。
Product categorization was initially a text classification task that analyzed the product’s title to choose the appropriate category. However, numerous methods have been developed which take into account the product title, description, images, and other available metadata. The following papers on product categorization represent essential reading in the field and offer novel approaches to product classification tasks.
產(chǎn)品分類最初是一個文本分類任務,用于分析產(chǎn)品標題以選擇適當?shù)念悇e。 但是,已經(jīng)開發(fā)出許多方法來考慮產(chǎn)品標題,描述,圖像和其他可用的元數(shù)據(jù)。 以下有關產(chǎn)品分類的論文代表了該領域的重要閱讀內容,并為產(chǎn)品分類任務提供了新穎的方法。
1.不要分類,翻譯 (1. Don’t Classify, Translate)
In this paper, researchers from the National University of Singapore and the Rakuten Institute of Technology propose and explain a novel machine translation approach to product categorization. The experiment uses the Rakuten Data Challenge and Rakuten Ichiba datasets. Their method translates or converts a product’s description into a sequence of tokens which represent a root-to-leaf path to the correct category. Using this method, they are also able to propose meaningful new paths in the taxonomy.
在本文中,新加坡國立大學和樂天技術學院的研究人員提出并解釋了一種新穎的機器翻譯方法來進行產(chǎn)品分類。 該實驗使用了Rakuten Data Challenge和Rakuten Ichiba數(shù)據(jù)集。 他們的方法將產(chǎn)品的描述轉換或轉換為一系列標記,這些標記代表從根到葉的正確類別路徑。 使用這種方法,他們還能夠在分類法中提出有意義的新路徑。
The researchers state that their method outperforms many of the existing classification algorithms commonly used in machine learning today.
研究人員指出,他們的方法優(yōu)于當今機器學習中常用的許多現(xiàn)有分類算法。
Published/Last Updated — Dec. 14, 2018
發(fā)布/最新更新— 2018年12月14日
Authors and Contributors — Maggie Yundi Li (National University of Singapore), Stanley Kok (National University of Singapore), and Liling Tan (Rakuten Institute of Technology)
作者和撰稿人:李Mag(新加坡國立大學),斯坦利·科克(新加坡國立大學)和譚麗玲(樂天技術學院)
Read Now
現(xiàn)在讀
2.使用神經(jīng)注意模型對日本商品名稱進行大規(guī)模分類 (2. Large-Scale Categorization of Japanese Product Titles Using Neural Attention Models)
The authors of this paper propose attention convolutional neural network (ACNN) models over baseline convolutional neural network (CNN) models and gradient boosted tree (GBT) classifiers. The study uses Japanese product titles taken from Rakuten Ichiba as training data. Using this data, the authors compare the performance of the three methods (ACNN, CNN, and GBT) for large-scale product categorization. While differences in accuracy can be less than 5%, even minor improvements in accuracy can result in millions of additional correct categorizations.
本文的作者提出了關注卷積神經(jīng)網(wǎng)絡(ACNN)模型,而不是基線卷積神經(jīng)網(wǎng)絡(CNN)模型和梯度提升樹(GBT)分類器。 該研究使用從Rakuten Ichiba獲得的日語產(chǎn)品標題作為培訓數(shù)據(jù)。 利用這些數(shù)據(jù),作者比較了三種方法(ACNN,CNN和GBT)用于大規(guī)模產(chǎn)品分類的性能。 盡管精度差異可以小于5%,但即使精度略有提高,也可以導致數(shù)百萬種其他正確的分類。
Lastly, the authors explain how an ensemble of ACNN and GBT models can further minimize false categorizations.
最后,作者解釋了ACNN和GBT模型的集成如何進一步減少錯誤分類。
Published/Last Updated — April, 2017 for EACL 2017
已發(fā)布/最新更新— 2017年4月,適用于EACL 2017
Authors and Contributors — From the Rakuten Institute of Technology: Yandi Xia, Aaron Levine, Pradipto Das Giuseppe Di Fabbrizio, Keiji Shinzato and Ankur Datta
作者和撰稿人—來自樂天技術學院:夏彥迪,亞倫·萊文,Pradipto Das Giuseppe Di Fabbrizio,京急新zato和安庫·達塔
Read Now
現(xiàn)在讀
3.地圖集:電子商務服裝產(chǎn)品分類的數(shù)據(jù)集和基準 (3. Atlas: A Dataset and Benchmark for Ecommerce Clothing Product Classification)
Researchers at the University of Colorado and Ericsson Research (Chennai, India) have created a large product dataset known as Atlas. In this paper, the team presents their dataset which includes over 186,000 images of clothing products along with their product titles. Furthermore, they introduce related work in the field that has influenced their study. Finally, they test their dataset using a Resnet34 classification model and a Seq to Seq model to categorize the products. The data is taken from Indian ecommerce stores, so some of the categories used may not be applicable to Western markets. However, the dataset has been open-sourced and is available on Github.
科羅拉多大學和愛立信研究公司(印度金奈)的研究人員創(chuàng)建了一個名為Atlas的大型產(chǎn)品數(shù)據(jù)集。 在本文中,研究小組展示了他們的數(shù)據(jù)集,其中包括超過186,000種服裝產(chǎn)品的圖像以及產(chǎn)品標題。 此外,他們介紹了影響他們的研究領域的相關工作。 最后,他們使用Resnet34分類模型和Seq to Seq模型對產(chǎn)品進行測試,以對產(chǎn)品進行分類。 數(shù)據(jù)來自印度的電子商務商店,因此使用的某些類別可能不適用于西方市場。 但是,該數(shù)據(jù)集已經(jīng)開源,可以在Github上使用。
Published/Last Updated — Aug. 19, 2019
發(fā)布/最后更新— 2019年8月19日
Authors and Contributors — Venkatesh Umaashankar (Ericsson Research), Girish Shanmugam (Ericsson Research), and Aditi Prakash (University of Colorado)
作者和撰稿人— Venkatesh Umaashankar(愛立信研究中心),Girish Shanmugam(愛立信研究中心)和Aditi Prakash(科羅拉多大學)
Read Now
現(xiàn)在讀
4.使用結構化和非結構化屬性的大規(guī)模產(chǎn)品分類 (4. Large Scale Product Categorization using Structured and Unstructured Attributes)
In this study, a team at WalmartLabs compares hierarchical models to flat models for product categorization.
在這項研究中,沃爾瑪實驗室的一個團隊將層次模型與平面模型進行了比較,以進行產(chǎn)品分類。
The researchers employ deep-learning based models which extract features from each product to create a product signature. In the paper, the researchers describe a multi-LSTM and multi-CNN based approach to this extreme classification task. Furthermore, they present a novel way to use structured attributes. The team states that their methods can be scaled to take into account any number of product attributes during categorization.
研究人員采用了基于深度學習的模型,該模型從每個產(chǎn)品中提取功能以創(chuàng)建產(chǎn)品簽名。 在論文中,研究人員描述了一種基于多LSTM和多CNN的方法來完成這種極端分類任務。 此外,它們提供了一種使用結構化屬性的新穎方法。 該團隊指出,他們的方法可以擴展,以在分類過程中考慮任何數(shù)量的產(chǎn)品屬性。
Published/Last Updated — Mar. 1, 2019
已發(fā)布/最新更新— 2019年3月1日
Authors and Contributors — From WalmartLabs: Abhinandan Krishnan and Abilash Amarthaluri
作者和貢獻者—來自沃爾瑪實驗室:Abhinandan Krishnan和Abilash Amarthaluri
Read Now
現(xiàn)在讀
5.使用多模式融合模型進行多標簽產(chǎn)品分類 (5. Multi-Label Product Categorization Using Multi-Modal Fusion Models)
In this paper, researchers from New York University and U.S. Bank investigate multi-modal approaches to categorize products on Amazon. Their approach utilizes multiple classifiers trained on each type of input data from the product listings. Using a dataset of 9.4 million Amazon products, they developed a tri-modal model for product classification based on product images, titles, and descriptions. Their tri-modal late fusion model retains an F1 score of 88.2%.
在本文中,來自紐約大學和美國銀行的研究人員研究了多模式方法來對亞馬遜上的產(chǎn)品進行分類。 他們的方法利用了針對產(chǎn)品列表中每種輸入數(shù)據(jù)類型進行訓練的多個分類器。 他們使用940萬個Amazon產(chǎn)品的數(shù)據(jù)集,開發(fā)了一種基于產(chǎn)品圖像,標題和描述的產(chǎn)品分類的三峰模型。 他們的三峰后期融合模型保留了88.2%的F1分數(shù)。
The findings of their study demonstrate that increasing the number of modalities could improve performance in multi-label product categorization.
他們研究的結果表明,增加模式數(shù)量可以改善多標簽產(chǎn)品分類的性能。
Published/Last Updated — June 30, 2019
發(fā)布/最新更新— 2019年6月30日
Authors and Contributors — Pasawee Wirojwatanakul (New York University) and Artit Wangperawong (U.S. Bank)
作者和貢獻者— Pasawee Wirojwatanakul(紐約大學)和Artit Wangperawong(美國銀行)
Read Now
現(xiàn)在讀
In the papers on product categorization above, the researchers trained their models on open datasets which included millions of products. However, if you are building a product categorization model for commercial use, many open datasets may not be available to you.
在上面有關產(chǎn)品分類的論文中,研究人員在包含數(shù)百萬種產(chǎn)品的開放數(shù)據(jù)集上訓練了他們的模型。 但是,如果您要構建用于商業(yè)用途的產(chǎn)品分類模型,則可能無法使用許多開放數(shù)據(jù)集。
Looking for training data for your product classification model? Check out this training data guide and these open datasets.
尋找針對您的產(chǎn)品分類模型的培訓數(shù)據(jù)? 查閱本培訓數(shù)據(jù)指南和這些開放的數(shù)據(jù)集 。
翻譯自: https://medium.com/analytics-vidhya/5-must-read-papers-on-product-categorization-for-data-scientists-19c98421cef3
計算機科學必讀書籍
總結
以上是生活随笔為你收集整理的计算机科学必读书籍_5篇关于数据科学家的产品分类必读文章的全部內容,希望文章能夠幫你解決所遇到的問題。
 
                            
                        - 上一篇: 做梦梦到家里地震预示着什么
- 下一篇: 梦到捡鹅卵石是什么意思
