Simple Product Recommendation Using a Data-Driven Approach
Background
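When a customer buys a product, the retailer can take the opportunity to learn what else they want to buy, so that products most customers are willing to buy together can be sold side by side to increase sales. Once the retailer has collected enough data, it can run an affinity analysis on it to determine which products are suited to being sold together.
What is affinity? Simply put, it is the similarity, or correlation, between items. For example, suppose a shopper buys apples and also buys bananas. If many people buy both apples and bananas, then displaying the two together will often increase sales. The idea behind this is that products people frequently buy together are likely to be bought together again. Simple as it seems, this idea does underpin many online and offline product recommendation services.
In the past, product recommendation was usually done by hand and offline, which was time-consuming, labor-intensive, and not very accurate. Now we can do it automatically in a data-driven way, which saves cost and improves efficiency. Let's see how.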
Data Preparation and Overview
import numpy as np

dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))

Output:

This dataset has 100 samples and 5 features
Let's look at the data and see what was bought in the first five transactions:
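print(X[:5])

[[ 0. 0. 1. 1. 1.]
 [ 1. 1. 0. 1. 0.]
 [ 1. 0. 1. 1. 0.]
 [ 0. 0. 1. 1. 1.]
 [ 0. 1. 0. 0. 1.]]

Reading down, each column records whether one product was purchased. The columns are, in order, bread, milk, cheese, apples, and bananas. Each row represents one purchase by a customer. For example, the first row is a customer who bought cheese, apples, and bananas, and nothing else.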
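To make the rows easier to read, each one can be decoded into product names. The sketch below is only an illustration: it hard-codes the five rows printed above instead of loading the file, and `column_names`, `first_rows`, and `basket_items` are names introduced here, not part of the original script.

```python
import numpy as np

# Column order as described in the text: bread, milk, cheese, apples, bananas.
column_names = ["bread", "milk", "cheese", "apples", "bananas"]

# The five rows printed above, hard-coded so the sketch runs without the data file.
first_rows = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)


def basket_items(row, names):
    """Return the names of the products whose column is 1 in this row."""
    return [name for name, bought in zip(names, row) if bought == 1]


for i, row in enumerate(first_rows):
    print("Transaction {0}: {1}".format(i, ", ".join(basket_items(row, column_names))))
```

The first line printed lists cheese, apples, and bananas, matching the reading of the first row given above.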
Data Processing
We give the features names to make later processing easier:
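# The names of the features, for your reference.
features = ["bread", "milk", "cheese", "apples", "bananas"]

Next we compute the support and confidence of the rule "a customer who buys apples also buys bananas". Support is the number of samples in the whole dataset for which the rule holds. Confidence is the support divided by the number of samples to which the rule applies. For this rule, that denominator is the number of customers who bought apples and bananas plus the number who bought apples but not bananas. In other words, anyone who bought apples counts toward the denominator.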
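As a small worked illustration of these two definitions, the sketch below applies them to only the five transactions printed earlier (hard-coded here), using NumPy boolean masks instead of an explicit loop. The variable names are illustrative assumptions, and the numbers describe just these five rows, not the full dataset.

```python
import numpy as np

# The five example transactions shown earlier (bread, milk, cheese, apples, bananas).
rows = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

APPLES, BANANAS = 3, 4  # column indices of the premise and the conclusion

bought_apples = rows[:, APPLES] == 1
bought_bananas = rows[:, BANANAS] == 1

# Number of transactions the rule applies to (the premise holds).
n_premise = int(bought_apples.sum())
# Support: transactions where both the premise and the conclusion hold.
support = int((bought_apples & bought_bananas).sum())
# Confidence: the fraction of premise transactions where the conclusion also holds.
confidence = support / n_premise

print("premise count:", n_premise)
print("support:", support)
print("confidence: {0:.3f}".format(confidence))
```

In these five rows, four customers bought apples and two of them also bought bananas, so the support is 2 and the confidence is 2 / 4 = 0.5.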
# How many of the cases that a person bought Apples involved the people purchasing Bananas too?
# Record both cases where the rule is valid and is invalid.
rule_valid = 0
rule_invalid = 0
for sample in X:
    if sample[3] == 1:  # This person bought Apples
        if sample[4] == 1:
            # This person bought both Apples and Bananas
            rule_valid += 1
        else:
            # This person bought Apples, but not Bananas
            rule_invalid += 1
print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))

Output:
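21 cases of the rule being valid were discovered
15 cases of the rule being invalid were discovered

So the rule has a support of 21 and a confidence of 21 / (21 + 15) ≈ 0.583.

From combinatorics, we know that pairing up 5 products two at a time gives 10 combinations ($C_5^2 = 10$). We compute the confidence for every combination and then print the three highest.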
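The count of 10 can be checked directly, and it is worth seeing next to the number of directional rules. The sketch below uses only the standard library (`math.comb` needs Python 3.8 or later) and enumerates column-index pairs without touching the data. Because confidence divides by the count of the premise, "A then B" and "B then A" generally have different confidence values, so the 10 unordered pairs correspond to 20 possible directional rules.

```python
from itertools import combinations, permutations
from math import comb

features = ["bread", "milk", "cheese", "apples", "bananas"]
n_features = len(features)

# Unordered pairs: (0, 1) and (1, 0) count as the same combination.
unordered_pairs = list(combinations(range(n_features), 2))
# Ordered pairs: (premise, conclusion) with premise != conclusion.
ordered_pairs = list(permutations(range(n_features), 2))

print(comb(n_features, 2), len(unordered_pairs))  # 10 10
print(len(ordered_pairs))                         # 20

# The unordered pairs follow the order 0 with 1..4, then 1 with 2..4, and so on.
for premise, conclusion in unordered_pairs:
    print("{0} & {1}".format(features[premise], features[conclusion]))
```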
import numpy as np
from collections import defaultdict

dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))

# The names of the features, for your reference.
features = ["bread", "milk", "cheese", "apples", "bananas"]

# Now compute for all possible rules
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
# num_occurences[premise] counts the transactions that contain the premise item
num_occurences = defaultdict(int)

for sample in X:  # each sample is one purchase record
    for premise in range(n_features):
        if sample[premise] == 0:
            continue
        # Record that the premise was bought in another transaction
        num_occurences[premise] += 1
        # Following the combinations idea, each premise is paired only with the
        # products after it: 0 with 1,2,3,4; 1 with 2,3,4; 2 with 3,4; 3 with 4.
        # Ten comparisons cover every combination, the index never runs out of
        # range, and the case X -> X (which makes little sense to measure)
        # never occurs.
        for conclusion in range(premise + 1, n_features):
            if sample[conclusion] == 1:
                # This person also bought the conclusion item
                valid_rules[(premise, conclusion)] += 1
            else:
                # This person bought the premise, but not the conclusion
                invalid_rules[(premise, conclusion)] += 1

support = valid_rules
confidence = defaultdict(float)
for premise, conclusion in valid_rules.keys():
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurences[premise]

Finally we sort the rules and print the top three. First, let's look at what the processed results look like:
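# pprint pretty-prints Python data structures. It is handy on the command line
# because the output is neatly aligned and easy to read.
from pprint import pprint
pprint(list(support.items()))

[((0, 1), 14),
 ((1, 2), 7),
 ((3, 2), 25),
 ((1, 3), 9),
 ((0, 2), 4),
 ((3, 0), 5),
 ((4, 1), 19),
 ((3, 1), 9),
 ((1, 4), 19),
 ((2, 4), 27),
 ((2, 0), 4),
 ((2, 3), 25),
 ((2, 1), 7),
 ((4, 3), 21),
 ((0, 4), 17),
 ((4, 2), 27),
 ((1, 0), 14),
 ((3, 4), 21),
 ((0, 3), 5),
 ((4, 0), 17)]

Each entry is a (premise, conclusion) index pair with its support. Support is symmetric, so (a, b) and (b, a) share the same count. Note that this listing contains all 20 ordered pairs, so it appears to come from a run over every (premise, conclusion) pair. With the loop above restricted to conclusions that come after the premise, only the 10 keys with premise < conclusion are produced.

We define a function for printing a rule, to make later output easier: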
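Because we defined the features list earlier, a product name can be looked up directly by its index. A single list is enough and no dictionary is needed (a handy approach).
Example output: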
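The definition of `print_rule` is not shown in the text above. A minimal sketch that is consistent with how it is called and with the output format used in the examples might look like the following. The three-decimal formatting is inferred from the printed confidence values, `format_rule` is a helper introduced only for this sketch, and the `demo_*` values are placeholders that are not taken from the dataset.

```python
def format_rule(premise, conclusion, support, confidence, features):
    """Build the three output lines for one rule (helper introduced for this sketch)."""
    premise_name = features[premise]        # look up names by column index
    conclusion_name = features[conclusion]
    return [
        "Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name),
        " - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]),
        " - Support: {0}".format(support[(premise, conclusion)]),
    ]


def print_rule(premise, conclusion, support, confidence, features):
    """Print one rule; premise and conclusion are column indices into features."""
    for line in format_rule(premise, conclusion, support, confidence, features):
        print(line)


# Placeholder values purely to show the call; they are NOT results from the dataset.
demo_features = ["bread", "milk", "cheese", "apples", "bananas"]
demo_support = {(1, 3): 1}
demo_confidence = {(1, 3): 0.5}
print_rule(1, 3, demo_support, demo_confidence, demo_features)
```

Passing `support`, `confidence`, and `features` as arguments keeps the function free of global state, which matches the five-argument call used in the article.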
premise = 1
conclusion = 3
print_rule(premise, conclusion, support, confidence, features)

Rule: If a person buys milk they will also buy apples
- Confidence: 0.196
- Support: 9
Then we sort the rules by confidence, in descending order:
# Sort by confidence and print the first three results
from operator import itemgetter

sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)
for index in range(3):
    print("Rule #{0}".format(index + 1))
    (premise, conclusion) = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)

The results are:
Rule #1
Rule: If a person buys cheese they will also buy bananas
- Confidence: 0.659
- Support: 27
Rule #2
Rule: If a person buys bread they will also buy bananas
- Confidence: 0.630
- Support: 17
Rule #3
Rule: If a person buys cheese they will also buy apples
- Confidence: 0.610
- Support: 25
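Judging from the ranking, the rules "customers who buy cheese also buy bananas" and "customers who buy cheese also buy apples" both have high support and confidence. A store manager can use rules like these to adjust product placement. For example, if cheese is on promotion this week, put apples next to it, which may well lift the store's sales.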
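One caveat is that confidence depends on direction, and the loop above only scores rules whose premise comes before the conclusion in the column order. As a rough check that uses only numbers already printed in this article (apples appear in 21 + 15 = 36 transactions, and the apples and cheese pair has a support of 25), the reverse rule "apples then cheese" can be estimated by hand. This sketch assumes those printed counts all come from the same dataset.

```python
# Counts taken from the outputs printed earlier in the article.
apples_total = 21 + 15             # transactions that contain apples
support_apples_cheese = 25         # transactions that contain both apples and cheese
confidence_cheese_to_apples = 0.610  # as printed for Rule #3

# Same support, different denominator: divide by the premise count (apples).
confidence_apples_to_cheese = support_apples_cheese / apples_total

print("apples -> cheese: {0:.3f}".format(confidence_apples_to_cheese))
print("cheese -> apples: {0:.3f}".format(confidence_cheese_to_apples))
```

Under that assumption, the reverse rule comes out at about 0.694, higher than the three rules listed, so it may be worth scoring both directions of each pair before deciding on placement.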
References:
《Python数据挖掘入门与实践》
Dataset