Association rule mining: a first try at data mining, using Weka for association rules

Published: 2023/11/27

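I signed up for a data mining course this semester and happened to miss the first two lectures, so I worked through a tutorial to practice the course material...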

prepare

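Download Weka ("Data Mining with Open Source Machine Learning Software in Java") and pick the build that matches your system. I used "a disk image for OS X that contains a Mac application including Oracle's Java 1.8 JVM".

Note: on the Mac, installation is not done by dragging the app into Applications; double-click the weka.jar file instead.

Install Python. The terminal already comes with python2 and python3; I used python3.
Install mlxtend and jupyter by running the following pip commands in the terminal: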

pip3 install mlxtend -i https://pypi.tuna.tsinghua.edu.cn/simple  # install mlxtend
pip3 install jupyter -i https://pypi.tuna.tsinghua.edu.cn/simple  # install jupyter

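Experiment 1: association rules with Weka

Step 1: open the Explorer, click Open file, and pick the supermarket dataset from the data folder inside the Weka installation directory.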

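We use the supermarket dataset that ships with Weka. It comes from real supermarket purchase data and holds 4627 transactions described by 217 attributes. Every attribute except total is boolean: 't' stands for true and '?' stands for false. In the total field, 'low' marks a purchase below $100 and 'high' marks a purchase above $100. Besides products, the attributes also include the department each product belongs to: if a basket contains a product from a given department, that department's attribute is 't', otherwise it is '?'.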
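To make the 't' / '?' convention concrete, here is a minimal pandas sketch that turns such flags into booleans. The three rows and the column names are illustrative stand-ins chosen for this example, not an excerpt of the real file.

```python
import pandas as pd

# Toy rows shaped like supermarket.arff; names and values are made up for illustration.
raw = pd.DataFrame({
    "department1": ["t", "?", "t"],
    "bread and cake": ["t", "t", "?"],
    "milk-cream": ["?", "t", "t"],
    "total": ["low", "high", "low"],
})

# Every column except 'total' is a purchase flag: 't' means bought, '?' means not bought.
flags = raw.drop(columns="total").eq("t")

# Keep 'total' as a plain label column next to the boolean flags.
encoded = flags.assign(total=raw["total"])
print(encoded)
```

Comparing against 't' avoids writing a per-cell mapping function and leaves the total column untouched.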

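Step 2: choose an algorithm and filter with its parameters

On the Associate tab, choose the algorithm and set its parameters, then click Start to run the analysis.

Experiment 2: association rules with Python

We use the mlxtend API for association rules: Mlxtend.frequent_patterns - mlxtend

Main steps: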

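
Read the data and preprocess it, converting it to one-hot encoding.
Mine frequent itemsets with apriori.
Use association_rules to generate the rules that satisfy the chosen threshold (support, confidence, lift, leverage, or conviction).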
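The five thresholds named in the last step can be worked out by hand on a tiny example. The sketch below uses only the standard library and the usual textbook definitions; the baskets and the helper names (support, rule_metrics) are hypothetical and are not part of mlxtend.

```python
# Five toy baskets; the item names are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Metrics for the rule antecedent -> consequent."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(antecedent | consequent)
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,
        "leverage": s_ac - s_a * s_c,
        # Conviction is infinite when the rule never fails (confidence == 1).
        "conviction": float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence),
    }


metrics = rule_metrics({"bread"}, {"milk"})
print(metrics)
```

For the rule bread -> milk in this toy data the lift comes out below 1, which means bread buyers are slightly less likely than the average customer to buy milk, even though the confidence looks fairly high on its own.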

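Task: Supermarket.arff / Weather.nominal.arff

Step 1: split the records by the low and high values of the total field and mine association rules on each group separately. Remember to drop the total field after splitting.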

df_low=df[df['total']=='low'] 
df_high=df[df['total']=='high']
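The two filters above keep the total column, so it still has to be dropped before mining. A minimal sketch of doing both in one pass, on a toy frame whose column names are assumptions for illustration:

```python
import pandas as pd

# Toy data; only the 'total' column name is taken from the dataset description.
df = pd.DataFrame({
    "bread and cake": [True, True, False, True],
    "milk-cream": [False, True, True, True],
    "total": ["low", "high", "low", "high"],
})

# Split by the value of 'total', then drop the column so it cannot appear in any rule.
groups = {
    label: part.drop(columns="total")
    for label, part in df.groupby("total")
}
df_low, df_high = groups["low"], groups["high"]
print(df_low)
print(df_high)
```

Dropping the column matters because within each group total is constant, so any rule involving it would be trivially true.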

Step 2: drop every department attribute and mine association rules on the data that remains.

# Drop the department columns
departments=[x for x in df.columns if x.find('department')==0] 
df_without_department=df.drop(labels=departments,axis=1)
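The test x.find('department')==0 is another way of saying "the name starts with department". An equivalent sketch using a boolean mask over the column index, on a toy frame with assumed column names:

```python
import pandas as pd

# Toy data; the column names are illustrative.
df = pd.DataFrame({
    "department1": [1, 0],
    "department2": [0, 1],
    "bread and cake": [1, 1],
    "total": ["low", "high"],
})

# True where the column name begins with 'department'.
is_department = df.columns.str.startswith("department")

# Keep only the columns that are not departments.
df_without_department = df.loc[:, ~is_department]
print(list(df_without_department.columns))
```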

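Step 3: mine association rules on the weather.nominal.arff dataset. If you use Weka for this, you must use the FPGrowth algorithm.

# FPGrowth requires the input to be a matrix of 0/1 nominal values.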
df = pd.read_csv(path) 
df = pd.get_dummies(df)
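To see what get_dummies does to nominal columns, here is a small sketch on three made-up rows. The column names follow the well-known weather dataset, but the rows themselves are invented for this example.

```python
import pandas as pd

# Three invented rows with nominal (string) values.
weather = pd.DataFrame({
    "outlook": ["sunny", "overcast", "rainy"],
    "humidity": ["high", "high", "normal"],
    "play": ["no", "yes", "yes"],
})

# One column per (attribute, value) pair; cast to bool so every cell is a clean flag
# regardless of which dtype this pandas version produces by default.
onehot = pd.get_dummies(weather).astype(bool)
print(list(onehot.columns))
```

Each original column becomes one indicator column per distinct value, named attribute_value, and every row has exactly one True per original attribute.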

Python version:

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules


def encode_units(x):
    # Map 't' to 1 and '?' to 0; leave any other value (such as 'low'/'high') unchanged
    if x == 't':
        return 1
    if x == '?':
        return 0
    return x


# Return the top n rules by confidence among those that meet the minimum support
def get_rules(df, support, confidence, n):
    # Frequent itemsets with support >= the given threshold
    frequent_itemsets = apriori(df, min_support=support, use_colnames=True)
    # Association rules with confidence >= the given threshold
    rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=confidence)
    # Sort by confidence, highest first; sort_values returns a new DataFrame, so reassign
    rules = rules.sort_values(by='confidence', ascending=False)
    # Keep at most the top n rules
    return rules.head(n)


if __name__ == "__main__":
    # Mine association rules on supermarket.csv
    path = r'C:\Users\pc\Desktop\supermarket.csv'
    df = pd.read_csv(path)
    # Convert the data to a 0/1 matrix
    df = df.applymap(encode_units)
    # Drop the department columns
    departments = [x for x in df.columns if x.find('department') == 0]
    df_without_department = df.drop(labels=departments, axis=1)
    df_without_department = pd.get_dummies(df_without_department)
    # Split the records by total == 'low' / 'high', then drop the total column
    df_low = df[df['total'] == 'low'].drop(labels='total', axis=1)
    df_high = df[df['total'] == 'high'].drop(labels='total', axis=1)
    # With support 0.1 on df_high there are many rules and the run takes about a minute, so use 0.3
    print(get_rules(df=df_high, support=0.3, confidence=0.9, n=10))
    print(get_rules(df=df_low, support=0.1, confidence=0.8, n=10))
    print(get_rules(df=df_without_department, support=0.1, confidence=0.9, n=10))
    # Mine association rules on weather.nominal.csv
    path = r'C:\Users\pc\Desktop\weather.nominal.csv'
    df = pd.read_csv(path)
    df = pd.get_dummies(df)
    print(get_rules(df=df, support=0.1, confidence=0.9, n=10))

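Summary

Comparing Python with Weka, Python turns out to be much more convenient for data preprocessing. For more on using pandas and Python for data analysis, see the book "Python for Data Analysis".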

Introduction to the mlxtend association API

DataFrame - pandas 0.24.1 documentation (pandas.pydata.org)

Introduction to the pandas DataFrame API

DataFrame - pandas 0.24.1 documentation (pandas.pydata.org)

Python syntax

Python tutorial (www.liaoxuefeng.com)
