Capital One TPS Notes

Published: 2025/5/22

Credit Card Fraud Detection (asked 7 times from 2015 to 2017)

What machine learning model would you use to classify fraudulent transactions on credit cards?

feature selection

How would you use a classification method, and which one is a good choice? Later there is also a follow-up on which method is the least useful.

Bias-variance trade-off: what does regularization do?

target missing

False positive/false negative: are false positives or false negatives more important? What is the effect of each?
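
On the false positive vs. false negative question, one way to make the trade-off concrete is to attach a cost to each error type and pick the score threshold that minimizes total cost. Below is a minimal Python sketch; the scores, labels, and both cost figures are made-up assumptions for illustration, not values from any real portfolio.

```python
# Hypothetical model scores and true labels (1 = fraud); all values are made up.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

COST_FP = 5.0    # assumed cost of blocking a legitimate transaction
COST_FN = 100.0  # assumed cost of letting a fraudulent transaction through


def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(scores, labels, threshold):
    """Total misclassification cost at a given threshold."""
    _, fp, fn, _ = confusion_counts(scores, labels, threshold)
    return fp * COST_FP + fn * COST_FN


def best_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(scores, labels, t))


if __name__ == "__main__":
    for t in (0.3, 0.5, 0.9):
        print(t, confusion_counts(scores, labels, t), total_cost(scores, labels, t))
```

When a missed fraud is assumed to cost far more than a false alarm, as here, the cost-minimizing threshold tends to be low; with a different cost ratio the answer changes, which is usually the point the interviewer is probing.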

What is VIF (in regression output)?
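
For reference, the variance inflation factor of predictor $j$ is usually defined as below, where $R_j^2$ is the coefficient of determination from regressing predictor $j$ on all the other predictors:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A VIF of 1 means predictor $j$ is uncorrelated with the others. Values above roughly 5 or 10 are a common rule of thumb (not a hard rule) for problematic multicollinearity.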

potential issues

exploratory analysis and data cleaning

How would you handle missing or garbage data?

How would you use existing features to add new features?
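
As one possible answer to the feature question, raw transaction dates can be rolled up into recency and frequency features per customer. This is a rough sketch using only the standard library; the function name `date_features`, the 30-day window, and the sample dates are all hypothetical choices.

```python
from datetime import date


def date_features(transaction_dates, as_of):
    """Summarize one customer's transaction dates relative to a snapshot date."""
    # Only use transactions on or before the snapshot, to avoid leaking the future.
    past = sorted(d for d in transaction_dates if d <= as_of)
    if not past:
        return {"n_txn": 0, "days_since_last": None,
                "txn_last_30d": 0, "mean_gap_days": None}
    gaps = [(later - earlier).days for earlier, later in zip(past, past[1:])]
    return {
        "n_txn": len(past),                                   # frequency
        "days_since_last": (as_of - past[-1]).days,           # recency
        "txn_last_30d": sum(1 for d in past if (as_of - d).days < 30),
        "mean_gap_days": sum(gaps) / len(gaps) if gaps else None,
    }


if __name__ == "__main__":
    # Made-up transaction dates for a single customer.
    dates = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 3, 10), date(2018, 3, 27)]
    print(date_features(dates, as_of=date(2018, 3, 31)))
```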

Logistic regression, random forests

Difference between random forest and gradient boosted tree.

Anomaly detection/novelty detection techniques might also be helpful because of the huge class imbalance that normally exists in such scenarios.

The interviewer asked about many possible problems with the model and how you would deal with them when time is limited.

Couple things to keep in mind regarding fraud:
1) You're dealing with an imbalanced data set (your fraud cases may be 3-5% of all your data). So, consider either oversampling or giving higher weight to your fraud cases.
2) Your data may not contain all the true fraud cases; in other words, there may be actual fraud cases not captured in your data. So, some form of anomaly detection may be needed.
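
The two options in point 1 can be sketched in plain Python. The weighting formula below, n_samples / (n_classes * class_count), is the common "balanced" heuristic; the function names and the 96/4 class split are illustrative assumptions.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate random minority rows until the two classes have equal counts.

    Assumes a binary problem and at least one minority row.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority_n = sum(1 for y in labels if y != minority_label)
    extra = [rng.choice(minority) for _ in range(majority_n - len(minority))]
    return rows + extra, labels + [minority_label] * len(extra)


if __name__ == "__main__":
    labels = [0] * 96 + [1] * 4          # made-up data: 4% fraud
    rows = list(range(len(labels)))      # stand-ins for feature rows
    print(balanced_class_weights(labels))
    new_rows, new_labels = random_oversample(rows, labels, minority_label=1)
    print(Counter(new_labels))
```

If oversampling is used, it should be applied to the training split only, so that duplicated fraud rows do not leak into validation data.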

Predicting whether a customer will cancel their credit card (asked 3 times in 2018)

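Suppose you are given several datasets, such as a year of credit card transaction records and customer profile information. The bank wants to predict whether a customer will close their account within one month; if so, it plans to offer those customers some cashback rewards to retain them. You are asked to build a model that predicts account closure. The interviewer's questions were: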

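1. Which features would you choose? (It seemed that anything relevant was acceptable. Follow-up: if the data is a set of transaction dates and the like, how would you rebuild features from them?)
2. How would you do data cleaning?
   a. How do you detect outliers?
   b. How do you fill in missing data? (I said you can fill in a constant such as the mean; he then asked when filling with the mean is inappropriate and what would work better.)
   c. What if the target value is also missing?
3. Which model would you choose? (I said decision tree; he then asked whether there were other models, what their pros and cons are, and what the target is. The target should be a binary value, whether the customer will close the account in one month; if a regression yields a value between 0 and 1, it represents how likely that is.)
4. How do you evaluate the model's performance, and with which package?
5. If the data is very large, say 1 TB, how would you sample it, and with which package?
6. If the model is inaccurate, what losses would that cause the bank?
7. Once the model has predicted a set of target values, how should rewards be allocated based on them? (I said to plot the distribution and give rewards to the top few percent of customers most likely to close. He asked what other approaches there are; I wasn't sure whether this was testing modeling or business sense.)
8. The last one was an open question identical to one I had seen on the forum: two people both have a 5000 limit, but one uses 100% and the other only 2%. Could both of them close their accounts within a month? The interviewer probably wants to see whether your first reaction is to think about the model or about other aspects.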
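
For the data cleaning questions in item 2, here is a small sketch of one common approach: flag outliers with the 1.5 x IQR rule, and fill missing values with the median rather than the mean when the distribution is skewed. The helper names and the spend figures are made up, and the IQR rule is only one of several reasonable choices.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


# Made-up monthly spend figures with one extreme value.
spend = [20, 25, 30, 35, 40, 45, 5000]
spend_with_gap = [20, 25, None, 35, 40, 45, 5000]

if __name__ == "__main__":
    low, high = iqr_bounds(spend)
    print("outliers:", [v for v in spend if v < low or v > high])

    observed = [v for v in spend_with_gap if v is not None]
    # The mean is dragged upward by the single extreme value; the median is not.
    print("mean:", statistics.mean(observed), "median:", statistics.median(observed))
    print(fill_missing_with_median(spend_with_gap))
```

This illustrates the follow-up in 2b: on skewed data the mean can be far from any typical value, so a median (or a model-based imputation) is often a safer fill.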

All the steps from feature engineering through final model tuning and validation.

How you built the model, which parameters you used, what the results were, and why you chose that model.

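Credit card churn model
   1. Feature engineering, e.g. computing tenure from the start date
   2. Missing values
   3. Which model to use, and why
   4. The data volume has now grown; what do you do? Spark. If you had to choose, would you use RSpark or PySpark, and why?
   5. The model output now shows that a customer with 0% credit limit utilization and one with 95% utilization are both high risk and likely to close their card soon. How would you handle this? I answered that the churn model is the starting point; typically the marketing department designs a retention program based on the churn model's results. These two types of at-risk customers need different incentive plans.
      1) Customers with 0% utilization are basically hard to win back.
      2) Customers with 95% utilization can most likely be retained: lower the interest rate, increase cashback, and so on.
      3) Based on test results, you can build an uplift model to see which high-churn users can be retained, and focus the treatment on them.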
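
The uplift idea in the last point can be illustrated with the simplest possible estimate: within each segment, compare the retention rate of customers who received an offer against those who did not. This is only a sketch under the assumption that the offer was assigned at random in a test; the segment names and counts are invented, and a real uplift model would work at the individual level with more features.

```python
def segment_uplift(records):
    """Estimate uplift per segment as retention(treated) - retention(control).

    `records` is an iterable of (segment, treated, retained) tuples,
    where `treated` and `retained` are booleans.
    """
    stats = {}
    for segment, treated, retained in records:
        # For each segment, keep [count, retained_count] for treated and control.
        groups = stats.setdefault(segment, {True: [0, 0], False: [0, 0]})
        groups[treated][0] += 1
        groups[treated][1] += int(retained)

    uplift = {}
    for segment, groups in stats.items():
        (n_treated, kept_treated) = groups[True]
        (n_control, kept_control) = groups[False]
        if n_treated and n_control:  # need both groups to compare
            uplift[segment] = kept_treated / n_treated - kept_control / n_control
    return uplift


def make_group(segment, treated, n, kept):
    """Build n made-up records for a segment, of which `kept` were retained."""
    return [(segment, treated, i < kept) for i in range(n)]


if __name__ == "__main__":
    # Invented test results for two hypothetical high-churn segments.
    records = (
        make_group("utilization_0pct", True, 10, 3)
        + make_group("utilization_0pct", False, 10, 2)
        + make_group("utilization_95pct", True, 10, 8)
        + make_group("utilization_95pct", False, 10, 4)
    )
    print(segment_uplift(records))
```

Segments with larger estimated uplift are the ones where the retention treatment appears to change behavior, which is where the incentive budget would be concentrated.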

Tell me some useful packages you use in R/Python.
How do you detect multicollinearity?
How do you join two data sets?
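
For the join question, the underlying idea can be shown without any library: index one data set by the key, then look up each row of the other. The sketch below is a left join over lists of dicts; the function name `left_join` and the sample rows are hypothetical. In practice a data frame merge or a SQL JOIN would usually do this.

```python
def left_join(left_rows, right_rows, key):
    """Left join two lists of dicts on `key`.

    Every left row is kept; a left row with several matches yields several rows.
    """
    # Build a lookup from key value to all matching right-hand rows.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left_rows:
        matches = index.get(row[key])
        if not matches:
            joined.append(dict(row))             # no match: keep left columns only
        else:
            for match in matches:
                joined.append({**row, **match})  # merge columns from both sides
    return joined


if __name__ == "__main__":
    # Made-up customer and transaction tables.
    customers = [{"customer_id": 1, "tenure_months": 12},
                 {"customer_id": 2, "tenure_months": 3}]
    transactions = [{"customer_id": 1, "amount": 20.0},
                    {"customer_id": 1, "amount": 35.5}]
    for row in left_join(customers, transactions, "customer_id"):
        print(row)
```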
