Facebook Advertising Analysis
Is our company’s Facebook advertising even worth the effort?
QUESTION:
A company would like to know whether its advertising is effective. Before you start: yes, Facebook does have analytics for users who actually utilize its advertising platform. Our client does not. Their “advertisements” are posts on their feed and are not promoted by Facebook.
DATA:
Data is from the client’s POS system and their Facebook feed.
MODEL:
KISS. A simple linear model will suffice.
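Concretely, the model we'll end up fitting is just a two-group comparison (the notation below is mine, not the original article's):

Sales_t = β0 + β1 · Post_t + ε_t

where Post_t is a 0/1 indicator for whether the company posted on Facebook on day t, and β1 is the average dollar effect of a post on daily sales.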
First, we need to obtain our data. We can use a nice Facebook scraper to pull the most recent posts in a usable format.
#install & load scraper !pip install facebook_scraper from facebook_scraper import get_posts import pandas as pdLets first scrape the posts from the first 200 posts of their Facebook page.
首先讓我們從其Facebook頁面的前200個(gè)帖子中抓取這些帖子。
#scrape
post_list = []
for post in get_posts('clients_facebook_page', pages=200):
    post_list.append(post)

#View the data
print(post_list[0].keys())
print("Number of Posts: {}".format(len(post_list)))

## dict_keys(['post_id', 'text', 'post_text', 'shared_text', 'time', 'image', 'video', 'video_thumbnail', 'likes', 'comments', 'shares', 'post_url', 'link'])
## Number of Posts: 38

Let's clean up the list, keeping only time, image, likes, comments, and shares.
post_list_cleaned = []
for post in post_list:
    #create a list of indexes to keep
    temp = []
    indexes_to_keep = ['time', 'image', 'likes', 'comments', 'shares']
    for key in indexes_to_keep:
        temp.append(post[key])
    post_list_cleaned.append(temp)

#Remove image hyperlink, replace with 0/1 flag & recast datetime to date
for post in post_list_cleaned:
    if post[1] is None:
        post[1] = 0
    else:
        post[1] = 1
    post[0] = post[0].date()   #the 'time' field is a datetime; .date() keeps just the date

We now need to combine the Facebook data with data from the company's POS system.
#turn into a DataFrame
fb_posts_df = pd.DataFrame(post_list_cleaned)
fb_posts_df.columns = ['Date', 'image', 'likes', 'comments', 'shares']

#import our POS data
daily_sales_df = pd.read_csv('daily_sales.csv')

#merge both sets of data
combined_df = pd.merge(daily_sales_df, fb_posts_df, on='Date', how='outer')

Finally, let's export the data to a csv. We'll do our modeling in R.
combined_df.to_csv('data.csv')

R Analysis
First, let's import our data from Python. We then need to ensure the variables are cast appropriately (i.e., dates are actual date fields and not just strings). Finally, we're only concerned with data since the start of 2019.
library(readr)
library(ggplot2)
library(gvlma)

data <- read.table("data.csv", header = TRUE, sep = ",")
data <- as.data.frame(data)

#rename 'i..Date' to 'Date'
names(data)[1] <- c("Date")

#set data types
data$Sales <- as.numeric(data$Sales)
data$Date <- as.Date(data$Date, "%m/%d/%Y")
data$Image <- as.factor(data$Image)
data$Post <- as.factor(data$Post)

#create a set of only 2019+ data
data_PY = data[data$Date >= '2019-01-01',]

head(data)
head(data_PY)
summary(data_PY)

##       Date                 Sales      Post    Image       Likes
##  Min.   :2019-01-02   Min.   :3181   0:281   0:287   Min.   :  0.000
##  1st Qu.:2019-04-12   1st Qu.:3370   1: 64   1: 58   1st Qu.:  0.000
##  Median :2019-07-24   Median :3456                   Median :  0.000
##  Mean   :2019-07-24   Mean   :3495                   Mean   :  3.983
##  3rd Qu.:2019-11-02   3rd Qu.:3606                   3rd Qu.:  0.000
##  Max.   :2020-02-15   Max.   :4432                   Max.   :115.000
##     Comments          Shares
##  Min.   : 0.0000   Min.   :0
##  1st Qu.: 0.0000   1st Qu.:0
##  Median : 0.0000   Median :0
##  Mean   : 0.3101   Mean   :0
##  3rd Qu.: 0.0000   3rd Qu.:0
##  Max.   :19.0000   Max.   :0

Now that our data's in, let's review our summary. We can see our data starts on Jan. 2, 2019 (as we hoped), but we do see one slight problem. When we look at the Post variable, we see it's highly imbalanced: we have 281 days with no posts and only 64 days with posts. We should re-balance our dataset before doing more analysis to ensure our results aren't skewed. I'll rebalance our data by sampling from the larger group (days with no posts), a technique known as undersampling. I'll also set a random seed so that our numbers are reproducible.
set.seed(15)
zeros = data_PY[data_PY$Post == 0,]
samples = sample(281, size = (345-281), replace = FALSE)
zeros = zeros[samples, ]
balanced = rbind(zeros, data_PY[data_PY$Post == 1,])
summary(balanced$Post)

##  0  1
## 64 64

Perfect, now our data is balanced. We should also do some EDA on our dependent variable (daily sales). It's a good idea to know what our distribution looks like and whether we have outliers to address.
hist(balanced$Sales)
boxplot(balanced$Sales)

We can see our data is slightly skewed. Sadly, real-world data is never a perfect normal distribution. Luckily, though, we appear to have no outliers in our boxplot. Now we can begin modeling. Since we're interested in understanding the dynamics of the system, not in classifying or predicting, we'll use a standard regression model.
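One quick numeric check before we fit anything: we can quantify the skew seen in the histogram with the standard third-standardized-moment formula. This is a small base-R sketch (no extra packages; the exact value depends on the seed set above):

#sample skewness: average cubed z-score
s_mean <- mean(balanced$Sales)
s_sd   <- sd(balanced$Sales)
mean((balanced$Sales - s_mean)^3) / s_sd^3   #values above 0 indicate a right skew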
model1 <- lm(data=balanced, Sales ~ Post) summary(model1)## ## Call: ## lm(formula = Sales ~ Post, data = balanced) ## ## Residuals: ## Min 1Q Median 3Q Max ## -316.22 -114.73 -29.78 111.17 476.49 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3467.1 20.5 169.095 < 2e-16 *** ## Post1 77.9 29.0 2.687 0.00819 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 164 on 126 degrees of freedom ## Multiple R-squared: 0.05418, Adjusted R-squared: 0.04667 ## F-statistic: 7.218 on 1 and 126 DF, p-value: 0.008193gvlma(model1)## ## Call: ## lm(formula = Sales ~ Post, data = balanced) ## ## Coefficients: ## (Intercept) Post1 ## 3467.1 77.9 ## ## ## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS ## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM: ## Level of Significance = 0.05 ## ## Call: ## gvlma(x = model1) ## ## Value p-value Decision ## Global Stat 8.351e+00 0.07952 Assumptions acceptable. ## Skewness 6.187e+00 0.01287 Assumptions NOT satisfied! ## Kurtosis 7.499e-01 0.38651 Assumptions acceptable. ## Link Function -1.198e-13 1.00000 Assumptions acceptable. ## Heteroscedasticity 1.414e+00 0.23435 Assumptions acceptable.Using a standard linear model, we obtain a result that says, on average, a FaceBook post increases daily sales by $77.90. We can see, based on the t-statistic and p-value that this result is highly statistically significant. We can use the GVLMA feature to ensure our model passes the underlying OLS assumptions. Here, we see we pass on all levels except skewness. We already identified earlier that skewness may be a problem with our data. A common correction for skewness is a log transformation. Let’s transform our dependent variable and see if it helps. Note that this model (a log-lin model) will produce coefficients with different interpretations than our last model.
使用標(biāo)準(zhǔn)線性模型,我們得出的結(jié)果表明,平均而言,FaceBook發(fā)布使每日銷售額增加77.90美元。 根據(jù)t統(tǒng)計(jì)量和p值,我們可以看到此結(jié)果在統(tǒng)計(jì)上非常重要。 我們可以使用GVLMA功能來確保我們的模型通過基本的OLS假設(shè)。 在這里,我們看到除了偏斜度之外,我們都通過了所有級(jí)別。 前面我們已經(jīng)確定偏斜可能是我們數(shù)據(jù)的問題。 偏度的常見校正是對(duì)數(shù)變換。 讓我們轉(zhuǎn)換我們的因變量,看看是否有幫助。 請(qǐng)注意,此模型(對(duì)數(shù)線性模型)將產(chǎn)生與上一個(gè)模型具有不同解釋的系數(shù)。
model2 <- lm(data=balanced, log(Sales) ~ Post) summary(model2)## ## Call: ## lm(formula = log(Sales) ~ Post, data = balanced) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.092228 -0.032271 -0.007508 0.032085 0.129686 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 8.150154 0.005777 1410.673 < 2e-16 *** ## Post1 0.021925 0.008171 2.683 0.00827 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.04622 on 126 degrees of freedom ## Multiple R-squared: 0.05406, Adjusted R-squared: 0.04655 ## F-statistic: 7.201 on 1 and 126 DF, p-value: 0.008266gvlma(model2)## ## Call: ## lm(formula = log(Sales) ~ Post, data = balanced) ## ## Coefficients: ## (Intercept) Post1 ## 8.15015 0.02193 ## ## ## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS ## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM: ## Level of Significance = 0.05 ## ## Call: ## gvlma(x = model2) ## ## Value p-value Decision ## Global Stat 7.101e+00 0.13063 Assumptions acceptable. ## Skewness 4.541e+00 0.03309 Assumptions NOT satisfied! ## Kurtosis 1.215e+00 0.27030 Assumptions acceptable. ## Link Function -6.240e-14 1.00000 Assumptions acceptable. ## Heteroscedasticity 1.345e+00 0.24614 Assumptions acceptable.plot(model2)hist(log(balanced$Sales))Our second model produces another highly significant coefficient for the FaceBook post variable. Here we see that each post is associated with an average 2.19% increase in daily sales. Unfortunately, even our log transformation was unable to correct for the skewness in our data. We’ll have to note this when presenting our findings later. Let’s now determine how much the 2.19% is actually worth (since we saw in model 1 that a post was worth $77.90).
我們的第二個(gè)模型為FaceBook post變量生成了另一個(gè)非常重要的系數(shù)。 在這里,我們看到每個(gè)帖子都與日均銷售額平均增長2.19%相關(guān)。 不幸的是,即使我們的對(duì)數(shù)轉(zhuǎn)換也無法糾正數(shù)據(jù)中的偏斜。 稍后介紹我們的發(fā)現(xiàn)時(shí),我們必須注意這一點(diǎn)。 現(xiàn)在讓我們確定2.19%的實(shí)際價(jià)值是多少(因?yàn)槲覀冊谀P?中看到,某帖子的價(jià)值為77.90美元)。
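A quick aside before we do: reading a log-lin coefficient directly as a percentage is only an approximation that works for small coefficients; the exact effect is 100 * (exp(beta) - 1). At this magnitude the two are essentially identical, as a short check confirms:

#approximate percent effect (what we quoted above)
100 * coef(model2)['Post1']
## ~2.19

#exact percent effect
100 * (exp(coef(model2)['Post1']) - 1)
## ~2.22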
mean_sales_no_post <- mean(balanced$Sales[balanced$Post == 0]) mean_sales_with_post <- mean(balanced$Sales[balanced$Post == 1]) mean_sales_no_post * model1$coefficients['Post1']## Post1 ## 270086.1Very close. Model 2’s coefficient equates to $76.02, which is very similar to our 77.90. Let’s now run another test to see if we get similar results. In analytics, it’s always helpful to arrive at the same conclusion via different means, if possible. This helps solidify our results. Here we can run a standard T-test. Yes, yes, for those other analyst reading, a t-test is the same metric used in the OLS (hence the t-statistic it produces). Here, however, let’s run it on the unbalanced dataset to ensure we didn’t miss anything in sampling our data (perhaps we sampled really good or really bad data that will skew our results).
很接近。 模型2的系數(shù)等于$ 76.02,與我們的77.90非常相似。 現(xiàn)在讓我們運(yùn)行另一個(gè)測試,看看是否獲得相似的結(jié)果。 在分析中,如果可能的話,通過不同的方法得出相同的結(jié)論總是有幫助的。 這有助于鞏固我們的結(jié)果。 在這里,我們可以運(yùn)行標(biāo)準(zhǔn)的T檢驗(yàn)。 是的,是的,對(duì)于其他分析師而言,t檢驗(yàn)與OLS中使用的度量相同(因此它會(huì)產(chǎn)生t統(tǒng)計(jì)量)。 但是,在這里,讓我們在不平衡的數(shù)據(jù)集上運(yùn)行它,以確保我們在采樣數(shù)據(jù)時(shí)不會(huì)遺漏任何東西(也許我們采樣的是好數(shù)據(jù)還是壞數(shù)據(jù)會(huì)歪曲我們的結(jié)果)。
t_test <- t.test(data_PY$Sales[data_PY$Post == 1],data_PY$Sales[data_PY$Post == 0] ) t_test## ## Welch Two Sample t-test ## ## data: data_PY$Sales[data_PY$Post == 1] and data_PY$Sales[data_PY$Post == 0] ## t = 2.5407, df = 89.593, p-value = 0.01278 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## 13.3264 108.9259 ## sample estimates: ## mean of x mean of y ## 3544.970 3483.844summary(t_test)## Length Class Mode ## statistic 1 -none- numeric ## parameter 1 -none- numeric ## p.value 1 -none- numeric ## conf.int 2 -none- numeric ## estimate 2 -none- numeric ## null.value 1 -none- numeric ## stderr 1 -none- numeric ## alternative 1 -none- character ## method 1 -none- character ## data.name 1 -none- characterggplot(data = data_PY, aes(Post, Sales, color = Post)) + geom_boxplot() + geom_jitter()Again, we receive a promising result. On all of the data, our t-statistic was 2.54, meaning we reject the null hypothesis that the difference in the means between the two groups is 0. Our T-test produces a confidence interval of [13.33, 89.59]. This interval includes our previous findings, once again giving us some added confidence. So now we know our FaceBook posts are actually benefiting the business by generating additional daily sales. However, we also saw earlier that our company hasn’t been posting reguarly (which is why we had to rebalance the data).
再次,我們收到了可喜的結(jié)果。 在所有數(shù)據(jù)上,我們的t統(tǒng)計(jì)量為2.54,這意味著我們拒絕零假設(shè),即兩組之間的均值差為0。我們的T檢驗(yàn)得出的置信區(qū)間為[13.33,89.59]。 這個(gè)間隔包括我們以前的發(fā)現(xiàn),再次給了我們更多的信心。 因此,現(xiàn)在我們知道我們的FaceBook帖子實(shí)際上通過產(chǎn)生額外的每日銷售額而使業(yè)務(wù)受益。 但是,我們還早些時(shí)候看到我們的公司并沒有進(jìn)行過定期過賬(這就是我們必須重新平衡數(shù)據(jù)的原因)。
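(Side note for the analysts: if you want to see that OLS/t-test connection explicitly, the same comparison can be run as a regression on the full unbalanced data. This is a sketch, with the hypothetical name model_unbalanced my own; note that plain OLS pools the two groups' variances while the Welch test above does not, so the t values will be close but not identical.)

#OLS on the full, unbalanced 2019+ data
model_unbalanced <- lm(Sales ~ Post, data = data_PY)
summary(model_unbalanced)   #Post1's t value is the regression analogue of the t-test above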
length(data_PY$Sales[data_PY$Post == 0])

## [1] 281

ggplot(data = data.frame(post = as.factor(c('No Post', 'Post')), m = c(281, 64)),
       aes(x = post, y = m)) +
  geom_bar(stat = 'identity', fill = 'dodgerblue3')

Let's create a function that takes two inputs: 1) the percentage of additional days advertised, and 2) the percentage of those advertisements that were effective. The reason for the second argument is that it's likely unrealistic to assume all of our ads are effective. Indeed, we likely face diminishing returns with more posts (people probably tire of seeing them, or block them in their feed if they become too frequent). Limiting effectiveness will give us a more reasonable estimate of lost revenue. Another benefit of creating a custom function is that we can quickly re-run the calculations if management desires.
#construct a 90% confidence interval around the Post1 coefficient
conf_interval = confint(model2, "Post1", .90)

missed_revenue <- function(pct_addlt_adv, pct_effective){
  #note: min/mean/max below are local numerics; R still resolves max() in the
  #sprintf call to the base function
  min  = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[1]
  mean = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * model2$coefficients['Post1']
  max  = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[2]
  print(paste(pct_addlt_adv * 280, "additional days of advertising"))
  sprintf("$%.2f -- $%.2f -- $%.2f", max(min, 0), mean, max)
}

#missed_revenue(% of additional days advertised, % of advertisements that were effective)
missed_revenue(.5, .7)

## [1] "140 additional days of advertising"
## [1] "$2849.38 -- $7449.57 -- $12049.75"

So if our company had advertised on half of the days it didn't, and only 70% of those ads were effective, we'd have missed out on an average of $7,449.57.
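Since both inputs to missed_revenue are guesses, it's worth sweeping a few scenarios before presenting a single figure to management. For example (hypothetical inputs of my choosing; outputs omitted since they depend on the fitted model):

#conservative scenario: 25% more days advertised, half of them effective
missed_revenue(.25, .5)

#aggressive scenario: post on every missed day, 90% effective
missed_revenue(1, .9)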
Originally published at http://lowhangingfruitanalytics.com on August 21, 2020.
Translated from: https://medium.com/the-innovation/facebook-advertising-analysis-3bedca07d7fe