Predicting Customer Churn with Machine Learning
Introduction
This article is part of a project for the Udacity “Become a Data Scientist” Nanodegree. The Jupyter Notebook with the code for this project can be downloaded from GitHub.
I will write a series of articles about this project, following the CRISP-DM process. This part covers the business understanding and data understanding steps.
Business Understanding
Let’s imagine for a moment that we are freshly hired data scientists working for a startup called “Sparkify”, which offers a music streaming service through its website and app.
Our first job is to prepare a presentation for the management meeting on business strategy. The meeting starts in a few hours, and we will have about 10 minutes for our presentation.
Clearly we want to impress our managers with our machine learning skills, but there is simply no time to clean all the data, let alone run machine learning on the huge 12 GB log covering the last two months of user activity.
We decide to take about 1% of users from the log and prepare some statistical analysis and visualisations to answer the questions we expect our managers to be most interested in, such as:
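One way to take roughly 1% of users, rather than 1% of log lines, is to hash each user ID into 100 buckets and keep a single bucket, so that every sampled user keeps their complete history. The following is only a minimal sketch: it assumes each log record is a dictionary with a `userId` field, and both helper functions are illustrative names rather than code from the project notebook.

```python
import hashlib


def in_sample(user_id, percent=1):
    """Return True if this user falls into the sampled share of users."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    # The hash is stable across runs, so a user is always in or always out.
    return int(digest, 16) % 100 < percent


def sample_events(events, percent=1):
    """Keep every event of the sampled users and drop all other users."""
    # `userId` is an assumed field name for the user identifier.
    return [e for e in events if in_sample(e["userId"], percent)]
```

Sampling by user instead of by event matters here because the later questions (upgrades, downgrades, cancellations) are about what individual users do over time.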
1. Usage patterns
As a streaming service, we of course want to know how many songs are played every day:
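A daily count like this could be computed along the following lines. This is a sketch under assumptions, not the notebook's code: it supposes that each log record carries a `ts` timestamp in epoch milliseconds and a `page` field whose value is `"NextSong"` whenever a song is played, and it buckets days in UTC.

```python
from collections import Counter
from datetime import datetime, timezone


def songs_per_day(events):
    """Count song plays per calendar day (UTC), keyed by ISO date string."""
    counts = Counter()
    for e in events:
        # "NextSong" is the assumed marker for a played song.
        if e["page"] == "NextSong":
            moment = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc)
            counts[moment.date().isoformat()] += 1
    # Sort by date so the result can be plotted directly as a time series.
    return dict(sorted(counts.items()))
```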
We can see that only about half as many songs are played around weekends and, unsurprisingly, there is a large spike around Halloween. To get a better feeling for the usage frequency, let’s look at the average number of unique users per weekday:
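The per-weekday average can be obtained in two steps: first count distinct users on each calendar date, then average those daily counts by day of the week. The sketch below assumes records with a `ts` field in epoch milliseconds and a `userId` field; these names, and the use of UTC for day boundaries, are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def avg_unique_users_per_weekday(events):
    """Average the number of distinct users per date, grouped by weekday."""
    users_by_date = defaultdict(set)
    for e in events:
        day = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc).date()
        users_by_date[day].add(e["userId"])

    counts_by_weekday = defaultdict(list)
    for day, users in users_by_date.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        counts_by_weekday[WEEKDAYS[day.weekday()]].append(len(users))

    return {name: mean(counts) for name, counts in counts_by_weekday.items()}
```

Averaging daily counts, instead of counting distinct users per weekday over the whole period, keeps a user who returns every Monday from being counted only once.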
Another interesting question is the distribution of user activity throughout the day. Let’s have a look at the average number of songs played by the hour:
And at the user activity by the hour:
Summary of usage statistics
Let’s formulate the key insights from our analysis:
- We have seen that usage follows a weekly pattern, with fewer users on Sparkify at weekends.
- Unsurprisingly, there is a spike in streams around Halloween.
- Throughout the day the number of users remains almost constant, with a slight increase between 1 and 7 p.m.
- The number of songs played per user follows the rhythm of daily activities: getting up, the commute to work, the start of work, the lunch break, etc.
More important is knowing what we can do with these insights:
- We can optimise licence costs, knowing how many songs will be played.
- We can optimise the number of servers running throughout the day and week, based on user activity, to save electricity and networking costs.
- We can time our user communication for the periods when users are most likely to be using our service.
2. Business development
The main revenue source for Sparkify is the periodic subscription fee from paying users. We would like to know how many users have actually used the “paid” option and how many the “free” option:
Another source of revenue is playing advertising clips for free users. How many clips are played every week?
Let’s also see how many ads on average are displayed to each user:
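The total number of adverts and the number of adverts per user can move in opposite directions, so it helps to compute them side by side. The sketch below divides the weekly advert count by the number of distinct users active in that week. It assumes records with `ts` (epoch milliseconds), `userId` and `page` fields, with `page == "Roll Advert"` marking an advert; both the field names and this definition of "per user" are assumptions, not taken from the notebook.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def adverts_per_user_by_week(events):
    """Return {(iso_year, iso_week): adverts / active users} per week."""
    adverts = Counter()
    active_users = defaultdict(set)
    for e in events:
        iso = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc).isocalendar()
        week = (iso[0], iso[1])  # the year keeps week numbers unambiguous
        active_users[week].add(e["userId"])
        if e["page"] == "Roll Advert":  # assumed marker for an advert
            adverts[week] += 1
    return {w: adverts[w] / len(active_users[w]) for w in sorted(active_users)}
```

If free users leave faster than adverts are cut, this ratio rises even while the weekly advert total falls.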
Summary of business development
Let’s formulate the key insights and takeaways for our business.
Key insights
- The number of paying customers is increasing over the observation period.
- The number of adverts is decreasing.
- The number of free customers is decreasing.
Takeaways for business
- The number of paying customers does not change much after the first week. We probably need to motivate people to switch to a paid account with a limited-time offer or a free trial.
- The number of free customers is decreasing at quite a high rate. It seems that the free account is not very attractive. We have to look at the reasons more closely. Are the adverts too frequent? Do free users have limited access to the music titles?
- Although the number of adverts is falling, the number of adverts per user is increasing. Perhaps we have taken the wrong road here, given that free users are probably choosing to leave the service rather than upgrade their account?
3. Threats to the business
Finally, let’s look at the account level upgrades, downgrades and cancellations:
To get a clearer picture, let’s see which account level users have at the moment they cancel their account:
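Splitting cancellations by account level only needs the level recorded on the cancellation event itself. A minimal sketch, assuming each record has a `level` field (`"paid"` or `"free"`) and that a cancellation appears as `page == "Cancellation Confirmation"`; both are assumed names used for illustration:

```python
from collections import Counter

CANCEL_PAGE = "Cancellation Confirmation"  # assumed marker for a cancellation


def cancellations_by_level(events):
    """Count cancellation events by the account level at that moment."""
    return Counter(e["level"] for e in events if e["page"] == CANCEL_PAGE)


def cancellation_share(events):
    """Return each level's share of all cancellations, as fractions of 1."""
    counts = cancellations_by_level(events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: n / total for level, n in counts.items()}
```

A share says which group dominates the cancellations, not which group is more likely to cancel; for that, the counts would need to be divided by the number of users at each level.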
Summary of business threats
Let’s formulate the key insights and takeaways for our business.
Key insights
- The number of upgrades spiked in the first week of observation.
- The number of upgrades declines over the period of observation.
- The number of downgrades has a small spike in week 41 and is otherwise almost steady, with a decline near the end.
- The number of cancellations is almost steady, with a small spike around week 42 and a decline near the end.
- Paying users cancel their accounts more often than free users.
Takeaways for business
- Whatever we did in week 40, we must keep doing it!
- We need to understand why fewer and fewer customers choose to upgrade their accounts.
- Although the downgrade and cancellation rates are falling, we need to pay more attention to them.
- The fact that paying users are choosing to cancel their accounts rather than downgrade them is alarming. What have we done wrong to make them angry?
Conclusion: can we identify reasons for churn?
The presentation went well. Most of the people in the room did not have a technical background. They were impressed by the comprehensive visualisations and the clearly formulated statements about the current situation.
The consequence is that the management is now worried about churn. They ask us to find the reasons why customers, especially paying ones, are cancelling their accounts.
We will have to run machine learning on our data. It will take some days to find the right techniques on the small subset of data, and then maybe some weeks to run the algorithms on the full dataset.
Using our intuition, we can try to find a quick fix that may help our company at short notice. Let’s look at the statistics of rolling adverts:
It turns out that paying customers may still see or hear an advert. Could this be the reason they choose to quit? Perhaps our web developers should look into that issue.
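A check of this kind amounts to counting advert events by the account level of the user at the time. The sketch below assumes records with `userId`, `level` (`"paid"` or `"free"`) and `page` fields, with `page == "Roll Advert"` marking an advert; these names are assumptions for illustration, and the functions are not taken from the project notebook.

```python
from collections import Counter

ADVERT_PAGE = "Roll Advert"  # assumed marker for an advert being played


def adverts_by_level(events):
    """Count advert events by the account level of the user at that time."""
    return Counter(e["level"] for e in events if e["page"] == ADVERT_PAGE)


def paid_users_with_adverts(events):
    """List the users who were served an advert while on a paid account."""
    return sorted(
        {
            e["userId"]
            for e in events
            if e["page"] == ADVERT_PAGE and e["level"] == "paid"
        }
    )
```

Any non-zero count for the paid level would be worth handing to the web developers together with the affected user IDs.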
In my next article I will focus on machine learning techniques and how they can be applied to predict churn based on usage statistics.
Original article: https://medium.com/@viovioviovioviovio/predict-churn-with-machine-learning-ea00b8a42011