mysql求回购率_用户行为分析——回购率、复购率(SQL、Python求解)
有一個多月沒有用Python了,有些生疏o(╥﹏╥)o。通過秦路老師的一道題目,分別使用sql和python求解,順便復(fù)習(xí)下python點,重點關(guān)注【復(fù)購率】、【回購率】的解法
?秦路老師視頻講解(使用SQL來求解)
已知信息:所需數(shù)據(jù):order_info.csv、user_info.csv
【表userinfo】字段為:userid,sex,birth
【表order_info.csv】字段為:orderid,userid,ispaid,price,paidtime
數(shù)據(jù)連接,提取碼:b4ca
求出以下:統(tǒng)計用戶不同月份的回購率和復(fù)購率
統(tǒng)計不同月份的下單人數(shù)
統(tǒng)計不同性別的消費頻次是否有差異
統(tǒng)計多次消費的用戶,第一次和最后一次消費時間的間隔(按天計算間隔)
統(tǒng)計不同年齡段的用戶人均消費頻次(年齡段按[0,10),[10,20),[20,30),[30,40),[40,50),[50.60),[60,70),[70,80),[80,90)來劃分,統(tǒng)計時間截止為2017年,保留兩位小數(shù))
統(tǒng)計消費的二八法則,消費金額的top20%用戶,貢獻了多少額度
從拿到數(shù)據(jù)到形成報告,無外乎以下步驟理解數(shù)據(jù)及背景
清洗數(shù)據(jù)(重復(fù)值、空值、異常值、數(shù)據(jù)一致化)
明確問題
分析問題(借助工具對數(shù)據(jù)進行分析)
解決問題
直接進入分析問題步驟:
【使用SQL進行求解】
1.統(tǒng)計用戶不同月份的回購率和復(fù)購率【復(fù)購率】當(dāng)月消費多次
【回購率】當(dāng)月消費,次月仍消費【復(fù)購率】先求出每月每個用戶的消費次數(shù),再篩選出消費多次的用戶。注意兩位數(shù)百分?jǐn)?shù)的求法concat(round(數(shù)值,2),'%')
SELECT t.`月份`,count(DISTINCT t.userid) 當(dāng)月消費人數(shù),
count(DISTINCT case when t.`消費次數(shù)`>1 then t.userid else null end) as 當(dāng)月復(fù)購人數(shù),
concat(round(count(DISTINCT case when t.`消費次數(shù)`>1 then t.userid else null end)/count(DISTINCT t.userid)*100,2),'%')as 復(fù)購率 FROM
(SELECT DATE_FORMAT(paidtime,'%Y-%m') as 月份,userid,
count(orderid) 消費次數(shù)
from 03order_info
where ispaid='已支付'
GROUP BY DATE_FORMAT(paidtime,'%Y-%m'),userid) t
GROUP BY t.`月份`
;【回購率】將order_info連接,求出diff,進而求解。類似留存率的計算
SELECT t.date1,count(DISTINCT t.userid)as 當(dāng)月購買人數(shù),
count(DISTINCT case when t.diff=1 then t.userid else null end)as 次月購買人數(shù),
count(DISTINCT case when t.diff=1 then t.userid else null end)/count(DISTINCT t.userid) as 回購率
FROM
(SELECT t1.userid,
DATE_FORMAT(t1.paidtime,'%Y-%m') as date1,
DATE_FORMAT(t2.paidtime,'%Y-%m') as date2,
TIMESTAMPDIFF(month,DATE_FORMAT(t2.paidtime,'%Y-%m-01'),DATE_FORMAT(t1.paidtime,'%Y-%m-01')) as diff
from 03order_info t1
LEFT JOIN 03order_info t2
on t1.userid=t2.userid
where t1.ispaid='已支付' and t2.ispaid='已支付')t
GROUP BY t.date1
;
2.統(tǒng)計不同月份的下單人數(shù)
SELECT DATE_FORMAT(paidtime,'%Y-%m') as 月份,
COUNT(DISTINCT userid) as 下單人數(shù) ,
count(userid) as 下單數(shù) from 03order_info
where ispaid='已支付'
GROUP BY DATE_FORMAT(paidtime,'%Y-%m');
3.不同性別的消費頻次
SELECT u.sex,
count(o.orderid)/count(DISTINCT o.userid) as 消費頻次
from 03order_info o
LEFT JOIN 03user_info u
on o.userid=u.userid
where o.ispaid='已支付'
GROUP BY u.sex;
4.統(tǒng)計多次消費的用戶,第一次和最后一次消費時間的間隔(按天計算間隔)
SELECT userid,
DATEDIFF(max(date(paidtime)),min(date(paidtime))) as 時間間隔
from 03order_info
where ispaid='已支付'
group by userid
having count(orderid)>1;
5.統(tǒng)計不同年齡段的用戶人均消費頻次
在剔除birth為空的數(shù)據(jù)后,按照birth升序排列,user_info表格順序如下。需要剔除年齡極值,規(guī)定year(birth)>1900
#年齡段劃分用【case when】函數(shù)
SELECT t.`年齡段`,COUNT(t.orderid) as 消費次數(shù),
count(DISTINCT t.userid) as 消費人數(shù),
round(COUNT(t.orderid)/count(DISTINCT t.userid),2) as 人均消費頻次 from
(SELECT o.userid,o.orderid,
case
WHEN 2017-year(u.birth)<10 then '[0,10)'
WHEN 2017-year(u.birth)<20 then '[10,20)'
WHEN 2017-year(u.birth)<30 then '[20,30)'
WHEN 2017-year(u.birth)<40 then '[30,40)'
WHEN 2017-year(u.birth)<50 then '[40,50)'
WHEN 2017-year(u.birth)<60 then '[50,60)'
WHEN 2017-year(u.birth)<70 then '[60,70)'
WHEN 2017-year(u.birth)<80 then '[70,80)'
WHEN 2017-year(u.birth)<90 then '[80,90)'
else '90+'
end as 年齡段
from 03order_info o
LEFT JOIN 03user_info u
on o.userid=u.userid
where o.ispaid='已支付' and year(u.birth)>1900) t
GROUP BY t.`年齡段`
6.統(tǒng)計消費的二八法則,消費金額的top20%用戶,貢獻了多少額度求前百分比的數(shù)值時,利用窗口函數(shù)排序,然后讓排序小于百分比與總數(shù)的乘積即可
SELECT sum(t.sum) FROM
(SELECT userid,round(sum(price),2) as 'sum',
row_number() over (order by sum(price) desc) as 'ranking'
from 03order_info
where ispaid='已支付'
GROUP BY userid) t
where t.ranking
where ispaid='已支付')
;
【使用python求解】
1.統(tǒng)計用戶不同月份的回購率和復(fù)購率
復(fù)購率:
df=order.groupby(['month','userid']).count().reset_index()
s1=df.groupby('month')['userid'].count() #當(dāng)月消費人數(shù)
s2=df[df['orderid']>1].groupby('month')['userid'].count() #當(dāng)月多次消費人數(shù)
df_new= pd.concat([s1,s2],axis=1)
df_new.columns=['當(dāng)月消費人數(shù)','當(dāng)月多次消費人數(shù)']
df_new['復(fù)購率']=(df_new['當(dāng)月多次消費人數(shù)']/df_new['當(dāng)月消費人數(shù)']).apply(lambda x: format(x,'.2%')) #兩位小數(shù)百分?jǐn)?shù)
print(df_new)
回購率:
order['month']=order['paidtime'].dt.month
order1=order.groupby(['userid','month']).count().reset_index()[['userid','month']] #統(tǒng)計出各userid再各月消費情況
order2=pd.merge(order1,order1,on='userid',how='left') #兩表連接
order2['diff']=order2['month_y']-order2['month_x']
order3=order2.groupby(['month_x','userid']).count().reset_index()['month_x'].value_counts() #當(dāng)月消費人數(shù)
order4=order2[order2['diff']==1].groupby(['month_x','userid']).count().reset_index()['month_x'].value_counts() #次月消費人數(shù)
df=pd.concat([order3,order4],axis=1)
df.columns=['當(dāng)月購買人數(shù)','次月回購人數(shù)']
df['回購率']=df['次月回購人數(shù)']/df['當(dāng)月購買人數(shù)']
print(df)
2.統(tǒng)計不同月份的下單人數(shù)【知識點】①時間函數(shù)用法(見圖,參考)②如何去重計數(shù)
order['month']=order['paidtime'].dt.strftime('%Y-%m')
order=order[order['ispaid']=='已支付']
df=order.groupby(['month','userid']).count().reset_index().groupby('month').agg({'userid':'count','orderid':'sum'})
df=df.rename(columns={'userid':'下單人數(shù)','orderid':'下單次數(shù)'})
print(df)
3.統(tǒng)計不同性別的消費頻次表連接pd.merge(table1,table2,left_on,right_on,how)
df=pd.merge(order,user,left_on='userid',right_on='userId',how='left')
df=df.groupby(['sex','userId']).count().reset_index().groupby('sex').agg({'userId':'count','orderid':'sum'})
df=df.rename(columns={'userId':'消費人數(shù)','orderid':'消費次數(shù)'})
df['消費頻次']=round(df['消費次數(shù)']/df['消費人數(shù)'],2)
print(df)
4.統(tǒng)計多次消費的用戶,第一次和最后一次消費時間的間隔(按天計算間隔)
df=order.pivot_table(index='userid',values='paidtime',aggfunc=['min','max'])
df['diff']=df['max']-df['min']
print(df.head())
5.統(tǒng)計不同年齡段的用戶人均消費頻次(年齡段按[0,10),[10,20),[20,30),[30,40),[40,50),[50.60),[60,70),[70,80),[80,90)來劃分,統(tǒng)計時間截止為2017年,保留兩位小數(shù))df['label']=pd.cut(列,bins,right,labels),數(shù)據(jù)類型為category,是無法使用groupby的,必須先使用.astype(str)將category轉(zhuǎn)換類型
#提取年份,計算年齡
#使用pd.cut(data,bins,right,label)分組
user=user[user['birth']>datetime(1900,1,1)]
user['birth']=user['birth'].dt.year
user['age']=datetime(2017,1,1).year-user['birth']
user['label']=pd.cut(user['age'],bins=[0,10,20,30,40,50,60,70,80,90,100],right=False).astype(str)#要把category轉(zhuǎn)換成str類型,才能groupby
#表連接
df=pd.merge(order,user,left_on='userid',right_on='userId',how='left').dropna() #刪除缺失值所在行
#數(shù)據(jù)分組求解
df=df[['userid','orderid','label']]
df=df.groupby(['label','userid']).count().reset_index().groupby('label').agg({'userid':'count','orderid':'sum'})
df=df.rename(columns={'userid':'消費人數(shù)','orderid':'消費次數(shù)'})
df['消費頻次']=df['消費次數(shù)']/df['消費人數(shù)']
print(df)
6.統(tǒng)計消費的二八法則,消費金額的top20%用戶,貢獻了多少額度
#計算每個用戶的消費金額
df=order.groupby('userid')['price'].sum().reset_index()
#計算貢獻排在前20%的分位數(shù)
data=df['price'].quantile(0.8,interpolation='nearest') #分位數(shù)為1314
#篩選出前20%貢獻數(shù)據(jù),并計算求和
df=df[df['price']>=data]
print(df['price'].sum())
結(jié)果:272222167.4584
(~ ̄▽ ̄)~
總結(jié)
以上是生活随笔為你收集整理的mysql求回购率_用户行为分析——回购率、复购率(SQL、Python求解)的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
 
                            
                        - 上一篇: Deeplink(深度链接)拉起App,
- 下一篇: php中运算符的作用,在php中,双反位
