Building a Neural Network to Predict Loan Risk
In-Depth Analysis
Introduction
Data cleaning
Building the neural networks
Saving the final model
Building the API
Introduction
LendingClub is the world’s largest peer-to-peer lending platform. Until recently (through the end of 2018), LendingClub published a public dataset of all loans issued since the company’s launch in 2007. I’m accessing the dataset via Kaggle.
(2260701, 151)
With 2,260,701 loans to look at and 151 potential variables, my goal is to create a neural network model with TensorFlow and Keras to predict the fraction of an expected loan return that a prospective borrower will pay back. This will require a lot of data cleaning given the state of the dataset, and I’ll walk through that entire process here. After building and training the network, I’ll create a public API to serve that model.
Also, as you may have guessed from the preceding code block, this post is adapted from a Jupyter Notebook. If you’d like to follow along in your own notebook, go ahead and fork mine on Kaggle or GitHub.
Data cleaning
I’ll first look at the data dictionary (downloaded directly from LendingClub’s website) to get an idea of how to create the desired output variable and which remaining features are available at the point of loan application (to avoid data leakage).
- id: A unique LC assigned ID for the loan listing.
- member_id: A unique LC assigned Id for the borrower member.
- loan_amnt: The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
- funded_amnt: The total amount committed to that loan at that point in time.
- funded_amnt_inv: The total amount committed by investors for that loan at that point in time.
- term: The number of payments on the loan. Values are in months and can be either 36 or 60.
- int_rate: Interest Rate on the loan
- installment: The monthly payment owed by the borrower if the loan originates.
- grade: LC assigned loan grade
- sub_grade: LC assigned loan subgrade
- emp_title: The job title supplied by the Borrower when applying for the loan.*
- emp_length: Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
- home_ownership: The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER
- annual_inc: The self-reported annual income provided by the borrower during registration.
- verification_status: Indicates if income was verified by LC, not verified, or if the income source was verified
- issue_d: The month which the loan was funded
- loan_status: Current status of the loan
- pymnt_plan: Indicates if a payment plan has been put in place for the loan
- url: URL for the LC page with listing data.
- desc: Loan description provided by the borrower
- purpose: A category provided by the borrower for the loan request.
- title: The loan title provided by the borrower
- zip_code: The first 3 numbers of the zip code provided by the borrower in the loan application.
- addr_state: The state provided by the borrower in the loan application
- dti: A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
- delinq_2yrs: The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years
- earliest_cr_line: The month the borrower's earliest reported credit line was opened
- fico_range_low: The lower boundary range the borrower’s FICO at loan origination belongs to.
- fico_range_high: The upper boundary range the borrower’s FICO at loan origination belongs to.
- inq_last_6mths: The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
- mths_since_last_delinq: The number of months since the borrower's last delinquency.
- mths_since_last_record: The number of months since the last public record.
- open_acc: The number of open credit lines in the borrower's credit file.
- pub_rec: Number of derogatory public records
- revol_bal: Total credit revolving balance
- revol_util: Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
- total_acc: The total number of credit lines currently in the borrower's credit file
- initial_list_status: The initial listing status of the loan. Possible values are – W, F
- out_prncp: Remaining outstanding principal for total amount funded
- out_prncp_inv: Remaining outstanding principal for portion of total amount funded by investors
- total_pymnt: Payments received to date for total amount funded
- total_pymnt_inv: Payments received to date for portion of total amount funded by investors
- total_rec_prncp: Principal received to date
- total_rec_int: Interest received to date
- total_rec_late_fee: Late fees received to date
- recoveries: post charge off gross recovery
- collection_recovery_fee: post charge off collection fee
- last_pymnt_d: Last month payment was received
- last_pymnt_amnt: Last total payment amount received
- next_pymnt_d: Next scheduled payment date
- last_credit_pull_d: The most recent month LC pulled credit for this loan
- last_fico_range_high: The upper boundary range the borrower’s last FICO pulled belongs to.
- last_fico_range_low: The lower boundary range the borrower’s last FICO pulled belongs to.
- collections_12_mths_ex_med: Number of collections in 12 months excluding medical collections
- mths_since_last_major_derog: Months since most recent 90-day or worse rating
- policy_code: publicly available policy_code=1, new products not publicly available policy_code=2
- application_type: Indicates whether the loan is an individual application or a joint application with two co-borrowers
- annual_inc_joint: The combined self-reported annual income provided by the co-borrowers during registration
- dti_joint: A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
- verification_status_joint: Indicates if the co-borrowers' joint income was verified by LC, not verified, or if the income source was verified
- acc_now_delinq: The number of accounts on which the borrower is now delinquent.
- tot_coll_amt: Total collection amounts ever owed
- tot_cur_bal: Total current balance of all accounts
- open_acc_6m: Number of open trades in last 6 months
- open_act_il: Number of currently active installment trades
- open_il_12m: Number of installment accounts opened in past 12 months
- open_il_24m: Number of installment accounts opened in past 24 months
- mths_since_rcnt_il: Months since most recent installment accounts opened
- total_bal_il: Total current balance of all installment accounts
- il_util: Ratio of total current balance to high credit/credit limit on all install acct
- open_rv_12m: Number of revolving trades opened in past 12 months
- open_rv_24m: Number of revolving trades opened in past 24 months
- max_bal_bc: Maximum current balance owed on all revolving accounts
- all_util: Balance to credit limit on all trades
- total_rev_hi_lim: Total revolving high credit/credit limit
- inq_fi: Number of personal finance inquiries
- total_cu_tl: Number of finance trades
- inq_last_12m: Number of credit inquiries in past 12 months
- acc_open_past_24mths: Number of trades opened in past 24 months.
- avg_cur_bal: Average current balance of all accounts
- bc_open_to_buy: Total open to buy on revolving bankcards.
- bc_util: Ratio of total current balance to high credit/credit limit for all bankcard accounts.
- chargeoff_within_12_mths: Number of charge-offs within 12 months
- delinq_amnt: The past-due amount owed for the accounts on which the borrower is now delinquent.
- mo_sin_old_il_acct: Months since oldest bank installment account opened
- mo_sin_old_rev_tl_op: Months since oldest revolving account opened
- mo_sin_rcnt_rev_tl_op: Months since most recent revolving account opened
- mo_sin_rcnt_tl: Months since most recent account opened
- mort_acc: Number of mortgage accounts.
- mths_since_recent_bc: Months since most recent bankcard account opened.
- mths_since_recent_bc_dlq: Months since most recent bankcard delinquency
- mths_since_recent_inq: Months since most recent inquiry.
- mths_since_recent_revol_delinq: Months since most recent revolving delinquency.
- num_accts_ever_120_pd: Number of accounts ever 120 or more days past due
- num_actv_bc_tl: Number of currently active bankcard accounts
- num_actv_rev_tl: Number of currently active revolving trades
- num_bc_sats: Number of satisfactory bankcard accounts
- num_bc_tl: Number of bankcard accounts
- num_il_tl: Number of installment accounts
- num_op_rev_tl: Number of open revolving accounts
- num_rev_accts: Number of revolving accounts
- num_rev_tl_bal_gt_0: Number of revolving trades with balance >0
- num_sats: Number of satisfactory accounts
- num_tl_120dpd_2m: Number of accounts currently 120 days past due (updated in past 2 months)
- num_tl_30dpd: Number of accounts currently 30 days past due (updated in past 2 months)
- num_tl_90g_dpd_24m: Number of accounts 90 or more days past due in last 24 months
- num_tl_op_past_12m: Number of accounts opened in past 12 months
- pct_tl_nvr_dlq: Percent of trades never delinquent
- percent_bc_gt_75: Percentage of all bankcard accounts > 75% of limit.
- pub_rec_bankruptcies: Number of public record bankruptcies
- tax_liens: Number of tax liens
- tot_hi_cred_lim: Total high credit/credit limit
- total_bal_ex_mort: Total credit balance excluding mortgage
- total_bc_limit: Total bankcard high credit/credit limit
- total_il_high_credit_limit: Total installment high credit/credit limit
- revol_bal_joint: Sum of revolving credit balance of the co-borrowers, net of duplicate balances
- sec_app_fico_range_low: FICO range (low) for the secondary applicant
- sec_app_fico_range_high: FICO range (high) for the secondary applicant
- sec_app_earliest_cr_line: Earliest credit line at time of application for the secondary applicant
- sec_app_inq_last_6mths: Credit inquiries in the last 6 months at time of application for the secondary applicant
- sec_app_mort_acc: Number of mortgage accounts at time of application for the secondary applicant
- sec_app_open_acc: Number of open trades at time of application for the secondary applicant
- sec_app_revol_util: Ratio of total current balance to high credit/credit limit for all revolving accounts
- sec_app_open_act_il: Number of currently active installment trades at time of application for the secondary applicant
- sec_app_num_rev_accts: Number of revolving accounts at time of application for the secondary applicant
- sec_app_chargeoff_within_12_mths: Number of charge-offs within last 12 months at time of application for the secondary applicant
- sec_app_collections_12_mths_ex_med: Number of collections within last 12 months excluding medical collections at time of application for the secondary applicant
- sec_app_mths_since_last_major_derog: Months since most recent 90-day or worse rating at time of application for the secondary applicant
- hardship_flag: Flags whether or not the borrower is on a hardship plan
- hardship_type: Describes the hardship plan offering
- hardship_reason: Describes the reason the hardship plan was offered
- hardship_status: Describes if the hardship plan is active, pending, canceled, completed, or broken
- deferral_term: Amount of months that the borrower is expected to pay less than the contractual monthly payment amount due to a hardship plan
- hardship_amount: The interest payment that the borrower has committed to make each month while they are on a hardship plan
- hardship_start_date: The start date of the hardship plan period
- hardship_end_date: The end date of the hardship plan period
- payment_plan_start_date: The day the first hardship plan payment is due. For example, if a borrower has a hardship plan period of 3 months, the start date is the start of the three-month period in which the borrower is allowed to make interest-only payments.
- hardship_length: The number of months the borrower will make smaller payments than normally obligated due to a hardship plan
- hardship_dpd: Account days past due as of the hardship plan start date
- hardship_loan_status: Loan Status as of the hardship plan start date
- orig_projected_additional_accrued_interest: The original projected additional interest amount that will accrue for the given hardship payment plan as of the Hardship Start Date. This field will be null if the borrower has broken their hardship payment plan.
- hardship_payoff_balance_amount: The payoff balance amount as of the hardship plan start date
- hardship_last_payment_amount: The last payment amount as of the hardship plan start date
- disbursement_method: The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY
- debt_settlement_flag: Flags whether or not the borrower, who has charged-off, is working with a debt-settlement company.
- debt_settlement_flag_date: The most recent date that the Debt_Settlement_Flag has been set
- settlement_status: The status of the borrower’s settlement plan. Possible values are: COMPLETE, ACTIVE, BROKEN, CANCELLED, DENIED, DRAFT
- settlement_date: The date that the borrower agrees to the settlement plan
- settlement_amount: The loan amount that the borrower has agreed to settle for
- settlement_percentage: The settlement amount as a percentage of the payoff balance amount on the loan
- settlement_term: The number of months that the borrower will be on the settlement plan
For the output variable (the fraction of expected return that was recovered), I’ll calculate the expected return by multiplying the monthly payment amount (installment) by the number of payments on the loan (term), and I’ll calculate the amount actually received by summing the total principal, interest, late fees, and post-chargeoff gross recovery received (total_rec_prncp, total_rec_int, total_rec_late_fee, recoveries) and subtracting any collection fee (collection_recovery_fee).
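Here’s a minimal sketch of that calculation on two made-up rows, not the notebook’s actual code. It assumes term has already been turned into a number (the term_num column name is hypothetical), and the toy values are invented purely for illustration.

```python
import pandas as pd

# Two made-up loans standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "installment": [100.0, 250.0],
    "term_num": [36, 60],  # hypothetical numeric version of `term`
    "total_rec_prncp": [3000.0, 5000.0],
    "total_rec_int": [600.0, 1200.0],
    "total_rec_late_fee": [0.0, 15.0],
    "recoveries": [0.0, 500.0],
    "collection_recovery_fee": [0.0, 90.0],
})

# Expected return: monthly payment times the number of payments.
expected_return = toy["installment"] * toy["term_num"]

# Amount actually received: everything collected, minus the collection fee.
received = (
    toy["total_rec_prncp"]
    + toy["total_rec_int"]
    + toy["total_rec_late_fee"]
    + toy["recoveries"]
    - toy["collection_recovery_fee"]
)

toy["fraction_recovered"] = received / expected_return
print(toy["fraction_recovered"])
```

The first toy loan was repaid in full, so its fraction is 1.0; the second was charged off partway through and recovers well under half of its expected return.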
Several other columns contain either irrelevant demographic data or data not created until after a loan is accepted, so those will need to be removed. I’ll hold onto issue_d (the month and year the loan was funded) for now, though, in case I want to compare variables to the date of the loan.
emp_title (the applicant’s job title) does seem relevant in the context of a loan, but it may have too many unique values to be useful.
512694
Too many unique values indeed. In a future version of this model, I could perhaps try to generate a feature from this column by aggregating job titles into categories, but that effort may have a low return on investment, since there are already columns for annual income and length of employment.
Two other interesting columns that I’ll also remove are title and desc (“description”), which are both freeform text entries written by the borrower. These could be fascinating subjects for natural language processing, but that’s outside the scope of the current project. Perhaps in the future, I could generate additional features from these fields using measures like syntactic complexity, word count, or keyword inclusion.
Before creating the output variable, however, I must take a closer look at loan_status, to see if any loans in the dataset are still open.
loan_status
Charged Off                                             268559
Current                                                 878317
Default                                                     40
Does not meet the credit policy. Status:Charged Off        761
Does not meet the credit policy. Status:Fully Paid        1988
Fully Paid                                             1076751
In Grace Period                                           8436
Late (16-30 days)                                         4349
Late (31-120 days)                                       21467
Name: loan_status, dtype: int64
For practical purposes, I’ll consider loans with statuses that don’t contain “Fully Paid” or “Charged Off” to still be open, so I’ll remove those from the dataset. I’ll also merge the “credit policy” statuses with their matching status.
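One way to do that filtering and merging, sketched on a handful of made-up rows. The status strings are copied from the counts above; the variable names are my own, and this is an illustration rather than the notebook’s exact code.

```python
import pandas as pd

# A few made-up rows covering the statuses seen in the dataset.
toy = pd.DataFrame({"loan_status": [
    "Fully Paid",
    "Current",
    "Charged Off",
    "Does not meet the credit policy. Status:Fully Paid",
    "Does not meet the credit policy. Status:Charged Off",
    "Late (31-120 days)",
    "Default",
]})

# Keep only closed loans: statuses containing "Fully Paid" or "Charged Off".
is_closed = toy["loan_status"].str.contains("Fully Paid|Charged Off")
closed = toy[is_closed].copy()

# Fold the "credit policy" statuses into their plain counterparts.
prefix = "Does not meet the credit policy. Status:"
closed["loan_status"] = closed["loan_status"].str.replace(prefix, "", regex=False)

counts = closed["loan_status"].value_counts()
print(counts)
```

Passing regex=False to str.replace keeps the period in the prefix from being read as a wildcard.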
loan_status
Charged Off     269320
Fully Paid     1078739
Name: loan_status, dtype: int64
Now to create the output variable. I’ll start by checking the null counts of the variables involved.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1348059 entries, 0 to 2260697
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 term 1348059 non-null object
1 installment 1348059 non-null float64
2 total_rec_prncp 1348059 non-null float64
3 total_rec_int 1348059 non-null float64
4 total_rec_late_fee 1348059 non-null float64
5 recoveries 1348059 non-null float64
6 collection_recovery_fee 1348059 non-null float64
dtypes: float64(6), object(1)
memory usage: 82.3+ MB
Every remaining row has each of these seven variables, but term’s data type is object, so that needs to be fixed first.
term
 36 months    1023181
 60 months     324878
Name: term, dtype: int64
Ah, so term is a categorical feature with two options. I’ll treat it as such when I use it as an input to the model, but to calculate the output variable I’ll create a numerical column from it.
Also, I need to trim the whitespace from the beginning of those values — that’s no good.
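Both fixes are short in pandas. This is a sketch on made-up values that mimic the leading space seen above; term_num is a hypothetical name for the new numeric column.

```python
import pandas as pd

# Made-up values with the same leading whitespace as the real column.
toy = pd.DataFrame({"term": [" 36 months", " 60 months", " 36 months"]})

# Trim the stray whitespace but keep `term` as a categorical string.
toy["term"] = toy["term"].str.strip()

# Pull out the digits into a separate numeric column for the output calculation.
toy["term_num"] = toy["term"].str.extract(r"(\d+)", expand=False).astype(int)

print(toy)
```

Keeping both columns lets the string version serve as a categorical input later while the numeric one feeds the expected-return calculation.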
Now I can create the output variable.
There is at least one odd outlier on the right in both categories. But also, many of the “fully paid” loans do not quite reach 1. One potential explanation is that when the last payment comes in, the system just flips loan_status to “Fully Paid” without adding the payment amount to the system itself, or perhaps simply multiplying installment by the term number leaves off a few cents in the actual total. If I were performing this analysis for LendingClub themselves, I’d ask them, but this is just a personal project. I’ll consider every loan marked “Fully Paid” to have fully recovered the expected return.
For that matter, I’ll cap my fraction_recovered values for charged off loans at 1.0 as well, since at least one value is above that for some reason.
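Both adjustments can be expressed in one step. A minimal sketch on made-up values, assuming the loan_status and fraction_recovered columns described above:

```python
import numpy as np
import pandas as pd

# Made-up rows: a fully paid loan just short of 1, plus two charged-off loans.
toy = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Charged Off"],
    "fraction_recovered": [0.9997, 1.2, 0.35],
})

# Fully paid loans count as fully recovered; charged-off loans are capped at 1.0.
toy["fraction_recovered"] = np.where(
    toy["loan_status"] == "Fully Paid",
    1.0,
    toy["fraction_recovered"].clip(upper=1.0),
)

print(toy)
```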
For the sake of curiosity, I’ll plot the distribution of fraction recovered for charged-off loans.
Now that the output is formatted, it’s time to clean up the inputs. I’ll check the null counts of each variable.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1348059 entries, 0 to 2260697
Data columns (total 97 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1348059 non-null float64
1 term 1348059 non-null object
2 emp_length 1269514 non-null object
3 home_ownership 1348059 non-null object
4 annual_inc 1348055 non-null float64
5 verification_status 1348059 non-null object
6 issue_d 1348059 non-null object
7 loan_status 1348059 non-null object
8 purpose 1348059 non-null object
9 dti 1347685 non-null float64
10 delinq_2yrs 1348030 non-null float64
11 earliest_cr_line 1348030 non-null object
12 fico_range_low 1348059 non-null float64
13 fico_range_high 1348059 non-null float64
14 inq_last_6mths 1348029 non-null float64
15 mths_since_last_delinq 668117 non-null float64
16 mths_since_last_record 229415 non-null float64
17 open_acc 1348030 non-null float64
18 pub_rec 1348030 non-null float64
19 revol_bal 1348059 non-null float64
20 revol_util 1347162 non-null float64
21 total_acc 1348030 non-null float64
22 collections_12_mths_ex_med 1347914 non-null float64
23 mths_since_last_major_derog 353750 non-null float64
24 application_type 1348059 non-null object
25 annual_inc_joint 25800 non-null float64
26 dti_joint 25797 non-null float64
27 verification_status_joint 25595 non-null object
28 acc_now_delinq 1348030 non-null float64
29 tot_coll_amt 1277783 non-null float64
30 tot_cur_bal 1277783 non-null float64
31 open_acc_6m 537597 non-null float64
32 open_act_il 537598 non-null float64
33 open_il_12m 537598 non-null float64
34 open_il_24m 537598 non-null float64
35 mths_since_rcnt_il 523382 non-null float64
36 total_bal_il 537598 non-null float64
37 il_util 465016 non-null float64
38 open_rv_12m 537598 non-null float64
39 open_rv_24m 537598 non-null float64
40 max_bal_bc 537598 non-null float64
41 all_util 537545 non-null float64
42 total_rev_hi_lim 1277783 non-null float64
43 inq_fi 537598 non-null float64
44 total_cu_tl 537597 non-null float64
45 inq_last_12m 537597 non-null float64
46 acc_open_past_24mths 1298029 non-null float64
47 avg_cur_bal 1277761 non-null float64
48 bc_open_to_buy 1284167 non-null float64
49 bc_util 1283398 non-null float64
50 chargeoff_within_12_mths 1347914 non-null float64
51 delinq_amnt 1348030 non-null float64
52 mo_sin_old_il_acct 1239735 non-null float64
53 mo_sin_old_rev_tl_op 1277782 non-null float64
54 mo_sin_rcnt_rev_tl_op 1277782 non-null float64
55 mo_sin_rcnt_tl 1277783 non-null float64
56 mort_acc 1298029 non-null float64
57 mths_since_recent_bc 1285089 non-null float64
58 mths_since_recent_bc_dlq 319020 non-null float64
59 mths_since_recent_inq 1171239 non-null float64
60 mths_since_recent_revol_delinq 449962 non-null float64
61 num_accts_ever_120_pd 1277783 non-null float64
62 num_actv_bc_tl 1277783 non-null float64
63 num_actv_rev_tl 1277783 non-null float64
64 num_bc_sats 1289469 non-null float64
65 num_bc_tl 1277783 non-null float64
66 num_il_tl 1277783 non-null float64
67 num_op_rev_tl 1277783 non-null float64
68 num_rev_accts 1277782 non-null float64
69 num_rev_tl_bal_gt_0 1277783 non-null float64
70 num_sats 1289469 non-null float64
71 num_tl_120dpd_2m 1227909 non-null float64
72 num_tl_30dpd 1277783 non-null float64
73 num_tl_90g_dpd_24m 1277783 non-null float64
74 num_tl_op_past_12m 1277783 non-null float64
75 pct_tl_nvr_dlq 1277629 non-null float64
76 percent_bc_gt_75 1283755 non-null float64
77 pub_rec_bankruptcies 1346694 non-null float64
78 tax_liens 1347954 non-null float64
79 tot_hi_cred_lim 1277783 non-null float64
80 total_bal_ex_mort 1298029 non-null float64
81 total_bc_limit 1298029 non-null float64
82 total_il_high_credit_limit 1277783 non-null float64
83 revol_bal_joint 18629 non-null float64
84 sec_app_fico_range_low 18630 non-null float64
85 sec_app_fico_range_high 18630 non-null float64
86 sec_app_earliest_cr_line 18630 non-null object
87 sec_app_inq_last_6mths 18630 non-null float64
88 sec_app_mort_acc 18630 non-null float64
89 sec_app_open_acc 18630 non-null float64
90 sec_app_revol_util 18302 non-null float64
91 sec_app_open_act_il 18630 non-null float64
92 sec_app_num_rev_accts 18630 non-null float64
93 sec_app_chargeoff_within_12_mths 18630 non-null float64
94 sec_app_collections_12_mths_ex_med 18630 non-null float64
95 sec_app_mths_since_last_major_derog 6645 non-null float64
96 fraction_recovered 1348059 non-null float64
dtypes: float64(86), object(11)
memory usage: 1007.9+ MB
Remaining columns with lots of null values seem to fall into three categories:
1. Derogatory/delinquency metrics (where null means the borrower doesn’t have any such marks). I’ll also add mths_since_recent_inq to this list, since its non-null count is below what seems to be the threshold for complete data, which is around 1,277,783. I’ll assume a null value here means no recent inquiries.
2. Metrics that only apply to joint applications (where null means it was a single application).
3. An inexplicable series of 14 credit history–related columns that only have around 537,000 entries. Are these newer metrics?
I’ll first look at those more confusing columns to find out whether or not they’re a newer set of metrics. That’ll require converting issue_d to date format first.
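A sketch of that check on made-up rows. It assumes issue_d is stored as a month-year string like "Dec-2015" (an assumption about the raw format), and it uses open_acc_6m as a stand-in for any one of the sparse columns.

```python
import pandas as pd

# Made-up rows; `open_acc_6m` is null for the loan issued before the metric existed.
toy = pd.DataFrame({
    "issue_d": ["Jan-2014", "Dec-2015", "Jun-2017"],
    "open_acc_6m": [None, 1.0, 3.0],
})

# Assumed raw format: abbreviated month name, hyphen, four-digit year.
toy["issue_d"] = pd.to_datetime(toy["issue_d"], format="%b-%Y")

# Look only at the issue dates of rows where the sparse metric is present.
dates_with_metric = toy.loc[toy["open_acc_6m"].notna(), "issue_d"]
n_rows = dates_with_metric.count()
first_seen = dates_with_metric.min()
last_seen = dates_with_metric.max()

print(n_rows, first_seen, last_seen)
```

If the earliest date with a non-null value is well after the dataset’s start, the column is most likely a metric that was introduced later.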
count                  464325
min       2015-12-01 00:00:00
max       2018-12-01 00:00:00
Name: issue_d, dtype: object

count                  557708
min       2015-12-01 00:00:00
max       2018-12-01 00:00:00
Name: issue_d, dtype: object
It appears that these are indeed newer metrics, their use only beginning in December 2015, but even after that point usage is spotty. I’m curious to see if these additional metrics would make a model more accurate, though, so once I’m done cleaning the data I’ll copy the rows with these new metrics into a new dataset and create another model using the new metrics.
As for the derogatory/delinquency metrics, taking a cue from Michael Wurm, I’m going to take the inverse of all the “months since recent/last” fields, which will turn each into a proxy for the frequency of the event and also let me set all the null values (when an event has never happened) to 0. For the “months since oldest” fields, I’ll just set the null values to 0 and leave the rest untouched.
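A minimal sketch of that transformation on made-up values. The invert_months helper is hypothetical, and treating a value of 0 months as 1 (so the inverse stays finite) is an assumption of this sketch rather than something stated above.

```python
import pandas as pd

# Made-up values; None means the event never happened.
toy = pd.DataFrame({
    "mths_since_last_delinq": [2.0, 10.0, None, 0.0],
    "mo_sin_old_il_acct": [120.0, None, 36.0, 5.0],
})

def invert_months(series):
    """Turn 'months since' into a rough frequency; never-happened becomes 0."""
    # Assumption: treat 0 months as 1 month to avoid dividing by zero.
    months = series.clip(lower=1)
    return (1 / months).fillna(0.0)

# "Months since recent/last" field: invert it and rename with an inv_ prefix.
toy["inv_mths_since_last_delinq"] = invert_months(toy["mths_since_last_delinq"])
toy = toy.drop(columns="mths_since_last_delinq")

# "Months since oldest" field: just fill the nulls with 0.
toy["mo_sin_old_il_acct"] = toy["mo_sin_old_il_acct"].fillna(0.0)

print(toy)
```

After inverting, a more recent event yields a larger value, and "never" sits naturally at 0, which is the smallest possible value.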
Now to look closer at joint loans.
application_type
Individual    1322259
Joint App       25800
Name: application_type, dtype: int64

<class 'pandas.core.frame.DataFrame'>
Int64Index: 25800 entries, 2 to 2260663
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 annual_inc_joint 25800 non-null float64
1 dti_joint 25797 non-null float64
2 verification_status_joint 25595 non-null object
3 revol_bal_joint 18629 non-null float64
4 sec_app_fico_range_low 18630 non-null float64
5 sec_app_fico_range_high 18630 non-null float64
6 sec_app_earliest_cr_line 18630 non-null object
7 sec_app_inq_last_6mths 18630 non-null float64
8 sec_app_mort_acc 18630 non-null float64
9 sec_app_open_acc 18630 non-null float64
10 sec_app_revol_util 18302 non-null float64
11 sec_app_open_act_il 18630 non-null float64
12 sec_app_num_rev_accts 18630 non-null float64
13 sec_app_chargeoff_within_12_mths 18630 non-null float64
14 sec_app_collections_12_mths_ex_med 18630 non-null float64
15 sec_app_inv_mths_since_last_major_derog 25800 non-null float64
dtypes: float64(14), object(2)
memory usage: 3.3+ MB
It seems there may be a case of newer metrics for joint applications as well. I’ll investigate.
count                   18301
min       2017-03-01 00:00:00
max       2018-12-01 00:00:00
Name: issue_d, dtype: object

count                   18629
min       2017-03-01 00:00:00
max       2018-12-01 00:00:00
Name: issue_d, dtype: object
Newer than the previous set of new metrics, even — these didn’t start getting used till March 2017. Now I wonder when joint loans were first introduced.
count                   25800
min       2015-10-01 00:00:00
max       2018-12-01 00:00:00
Name: issue_d, dtype: object
2015. I think I’ll save the newer joint metrics for perhaps a third model, but I believe I can include annual_inc_joint, dti_joint, and verification_status_joint in the main model—I’ll just binary-encode application_type, and for individual applications I’ll set annual_inc_joint, dti_joint, and verification_status_joint equal to their non-joint counterparts.
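Here’s how that could look on two made-up applications. This is a sketch under my own naming; it only fills the joint columns for individual applications and leaves any nulls on joint rows alone.

```python
import pandas as pd

# One individual and one joint application (made-up values).
toy = pd.DataFrame({
    "application_type": ["Individual", "Joint App"],
    "annual_inc": [50000.0, 40000.0],
    "annual_inc_joint": [None, 90000.0],
    "dti": [12.5, 20.0],
    "dti_joint": [None, 15.0],
    "verification_status": ["Verified", "Not Verified"],
    "verification_status_joint": [None, "Source Verified"],
})

is_joint = toy["application_type"] == "Joint App"

# For individual applications, copy each non-joint value into its joint column.
pairs = [
    ("annual_inc_joint", "annual_inc"),
    ("dti_joint", "dti"),
    ("verification_status_joint", "verification_status"),
]
for joint_col, single_col in pairs:
    toy[joint_col] = toy[joint_col].where(is_joint, toy[single_col])

# Binary-encode application_type: 1 for joint, 0 for individual.
toy["application_type"] = is_joint.astype(int)

print(toy)
```

With this encoding, the joint columns always describe the combined applicants, and the binary flag tells the model whether that differs from the primary borrower alone.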
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1348059 entries, 0 to 2260697
Data columns (total 97 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1348059 non-null float64
1 term 1348059 non-null object
2 emp_length 1269514 non-null object
3 home_ownership 1348059 non-null object
4 annual_inc 1348055 non-null float64
5 verification_status 1348059 non-null object
6 issue_d 1348059 non-null datetime64[ns]
7 loan_status 1348059 non-null object
8 purpose 1348059 non-null object
9 dti 1347685 non-null float64
10 delinq_2yrs 1348030 non-null float64
11 earliest_cr_line 1348030 non-null object
12 fico_range_low 1348059 non-null float64
13 fico_range_high 1348059 non-null float64
14 inq_last_6mths 1348029 non-null float64
15 inv_mths_since_last_delinq 1348059 non-null float64
16 inv_mths_since_last_record 1348059 non-null float64
17 open_acc 1348030 non-null float64
18 pub_rec 1348030 non-null float64
19 revol_bal 1348059 non-null float64
20 revol_util 1347162 non-null float64
21 total_acc 1348030 non-null float64
22 collections_12_mths_ex_med 1347914 non-null float64
23 inv_mths_since_last_major_derog 1348059 non-null float64
24 application_type 1348059 non-null object
25 annual_inc_joint 1348055 non-null float64
26 dti_joint 1348056 non-null float64
27 verification_status_joint 1347854 non-null object
28 acc_now_delinq 1348030 non-null float64
29 tot_coll_amt 1277783 non-null float64
30 tot_cur_bal 1277783 non-null float64
31 open_acc_6m 537597 non-null float64
32 open_act_il 537598 non-null float64
33 open_il_12m 537598 non-null float64
34 open_il_24m 537598 non-null float64
35 inv_mths_since_rcnt_il 1348059 non-null float64
36 total_bal_il 537598 non-null float64
37 il_util 465016 non-null float64
38 open_rv_12m 537598 non-null float64
39 open_rv_24m 537598 non-null float64
40 max_bal_bc 537598 non-null float64
41 all_util 537545 non-null float64
42 total_rev_hi_lim 1277783 non-null float64
43 inq_fi 537598 non-null float64
44 total_cu_tl 537597 non-null float64
45 inq_last_12m 537597 non-null float64
46 acc_open_past_24mths 1298029 non-null float64
47 avg_cur_bal 1277761 non-null float64
48 bc_open_to_buy 1284167 non-null float64
49 bc_util 1283398 non-null float64
50 chargeoff_within_12_mths 1347914 non-null float64
51 delinq_amnt 1348030 non-null float64
52 mo_sin_old_il_acct 1239735 non-null float64
53 mo_sin_old_rev_tl_op 1277782 non-null float64
54 inv_mo_sin_rcnt_rev_tl_op 1348059 non-null float64
55 inv_mo_sin_rcnt_tl 1348059 non-null float64
56 mort_acc 1298029 non-null float64
57 inv_mths_since_recent_bc 1348059 non-null float64
58 inv_mths_since_recent_bc_dlq 1348059 non-null float64
59 inv_mths_since_recent_inq 1348059 non-null float64
60 inv_mths_since_recent_revol_delinq 1348059 non-null float64
61 num_accts_ever_120_pd 1277783 non-null float64
62 num_actv_bc_tl 1277783 non-null float64
63 num_actv_rev_tl 1277783 non-null float64
64 num_bc_sats 1289469 non-null float64
65 num_bc_tl 1277783 non-null float64
66 num_il_tl 1277783 non-null float64
67 num_op_rev_tl 1277783 non-null float64
68 num_rev_accts 1277782 non-null float64
69 num_rev_tl_bal_gt_0 1277783 non-null float64
70 num_sats 1289469 non-null float64
71 num_tl_120dpd_2m 1227909 non-null float64
72 num_tl_30dpd 1277783 non-null float64
73 num_tl_90g_dpd_24m 1277783 non-null float64
74 num_tl_op_past_12m 1277783 non-null float64
75 pct_tl_nvr_dlq 1277629 non-null float64
76 percent_bc_gt_75 1283755 non-null float64
77 pub_rec_bankruptcies 1346694 non-null float64
78 tax_liens 1347954 non-null float64
79 tot_hi_cred_lim 1277783 non-null float64
80 total_bal_ex_mort 1298029 non-null float64
81 total_bc_limit 1298029 non-null float64
82 total_il_high_credit_limit 1277783 non-null float64
83 revol_bal_joint 18629 non-null float64
84 sec_app_fico_range_low 18630 non-null float64
85 sec_app_fico_range_high 18630 non-null float64
86 sec_app_earliest_cr_line 18630 non-null object
87 sec_app_inq_last_6mths 18630 non-null float64
88 sec_app_mort_acc 18630 non-null float64
89 sec_app_open_acc 18630 non-null float64
90 sec_app_revol_util 18302 non-null float64
91 sec_app_open_act_il 18630 non-null float64
92 sec_app_num_rev_accts 18630 non-null float64
93 sec_app_chargeoff_within_12_mths 18630 non-null float64
94 sec_app_collections_12_mths_ex_med 18630 non-null float64
95 sec_app_inv_mths_since_last_major_derog 1348059 non-null float64
96 fraction_recovered 1348059 non-null float64
dtypes: datetime64[ns](1), float64(86), object(10)
memory usage: 1007.9+ MB
Now the only remaining steps should be removing rows with null values (in columns that aren’t new metrics) and encoding categorical features.
I’m removing rows with null values in those columns because that should still leave the vast majority of rows intact, over 1 million, which is still plenty of data. But I guess I should make sure before I overwrite loans.
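Here's a minimal sketch of that kind of check, previewing the row count before overwriting anything. The toy frame and the `new_metric_cols` list are stand-ins I made up for illustration; the real `loans` frame has 97 columns.

```python
import pandas as pd

# Toy stand-in for `loans` (hypothetical values).
loans = pd.DataFrame({
    "loan_amnt": [1000.0, 2500.0, None, 4000.0],
    "dti": [12.5, None, 8.0, 20.1],
    "open_acc_6m": [None, 1.0, None, 2.0],  # a "new metric" column, allowed to stay null
    "fraction_recovered": [1.0, 0.4, 1.0, 0.9],
})

# Hypothetical list of new-metric columns to exclude from the null check.
new_metric_cols = ["open_acc_6m"]
required_cols = [c for c in loans.columns if c not in new_metric_cols]

# Build a preview instead of assigning back to `loans`.
preview = loans.dropna(subset=required_cols)
print(preview.shape)
```

Only once the preview's shape looks acceptable would `loans` be reassigned to it.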
(1110171, 97)
Yes, still 1,110,171. That’ll do.
Then actually I’ll tackle earliest_cr_line and its joint counterpart first before looking at the categorical features.
1110171 rows × 2 columns
I should convert that to the age of the credit line at the time of application (or the time of loan issuing, more precisely).
0 148
1 192
2 184
4 210
5 338
...
2260688 147
2260690 175
2260691 64
2260692 230
2260697 207
Length: 1110171, dtype: int64
Now a look at those categorical features.
term
36 months 831601
60 months 278570
Name: term, dtype: int64
emp_length
1 year 76868
10+ years 392883
2 years 106124
3 years 93784
4 years 69031
5 years 72421
6 years 54240
7 years 52229
8 years 53826
9 years 45210
< 1 year 93555
Name: emp_length, dtype: int64
home_ownership
ANY 250
MORTGAGE 559035
NONE 39
OTHER 40
OWN 114577
RENT 436230
Name: home_ownership, dtype: int64
verification_status
Not Verified 335350
Source Verified 463153
Verified 311668
Name: verification_status, dtype: int64
purpose
car 10754
credit_card 245942
debt_consolidation 653222
educational 1
home_improvement 71089
house 5720
major_purchase 22901
medical 12302
moving 7464
other 60986
renewable_energy 691
small_business 11137
vacation 7169
wedding 793
Name: purpose, dtype: int64
verification_status_joint
Not Verified 341073
Source Verified 461941
Verified 307157
Name: verification_status_joint, dtype: int64
First, in researching income verification, I learned that LendingClub only tries to verify income on a subset of loan applications based on the content of the application, so this feature is a source of target leakage. I’ll remove the two offending columns (and a couple more I don’t need anymore).
Once I create my pipeline, I’ll binary encode term, one-hot encode home_ownership and purpose, and since emp_length is an ordinal variable, I’ll convert it to the integers 0–10.
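To make those three encodings concrete, here's a small sketch using plain pandas rather than a full preprocessing pipeline, so it's a simplified substitute for whatever the pipeline ends up doing. The toy rows are made up; the category labels come from the value counts above.

```python
import pandas as pd

# Toy rows (hypothetical values) using the category labels listed above.
toy = pd.DataFrame({
    "term": ["36 months", "60 months", "36 months"],
    "emp_length": ["< 1 year", "10+ years", "3 years"],
    "home_ownership": ["RENT", "MORTGAGE", "OWN"],
    "purpose": ["car", "credit_card", "car"],
})

# Ordinal mapping: "< 1 year" -> 0, "1 year" -> 1, ..., "10+ years" -> 10.
emp_length_order = (
    ["< 1 year", "1 year"] + [f"{n} years" for n in range(2, 10)] + ["10+ years"]
)
emp_length_map = {label: rank for rank, label in enumerate(emp_length_order)}

encoded = toy.copy()
# Binary encoding: 1 for 60-month loans, 0 for 36-month loans.
encoded["term"] = (encoded["term"].str.strip() == "60 months").astype(int)
encoded["emp_length"] = encoded["emp_length"].map(emp_length_map)
# One-hot encoding for the unordered categories.
encoded = pd.get_dummies(encoded, columns=["home_ownership", "purpose"])
print(encoded.columns.tolist())
```

Keeping `emp_length` as a single integer column preserves its ordering, which one-hot encoding would throw away.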
That should cover all the cleaning necessary for the first model’s data. I’ll save the columns that’ll be used in the first model to a new DataFrame, and while I’m at it, I’ll start formatting the DataFrames for the two additional models that add the two sets of new metrics.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1110171 entries, 0 to 2260697
Data columns (total 80 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1110171 non-null float64
1 term 1110171 non-null object
2 emp_length 1110171 non-null object
3 home_ownership 1110171 non-null object
4 annual_inc 1110171 non-null float64
5 purpose 1110171 non-null object
6 dti 1110171 non-null float64
7 delinq_2yrs 1110171 non-null float64
8 cr_hist_age_mths 1110171 non-null int64
9 fico_range_low 1110171 non-null float64
10 fico_range_high 1110171 non-null float64
11 inq_last_6mths 1110171 non-null float64
12 inv_mths_since_last_delinq 1110171 non-null float64
13 inv_mths_since_last_record 1110171 non-null float64
14 open_acc 1110171 non-null float64
15 pub_rec 1110171 non-null float64
16 revol_bal 1110171 non-null float64
17 revol_util 1110171 non-null float64
18 total_acc 1110171 non-null float64
19 collections_12_mths_ex_med 1110171 non-null float64
20 inv_mths_since_last_major_derog 1110171 non-null float64
21 application_type 1110171 non-null object
22 annual_inc_joint 1110171 non-null float64
23 dti_joint 1110171 non-null float64
24 acc_now_delinq 1110171 non-null float64
25 tot_coll_amt 1110171 non-null float64
26 tot_cur_bal 1110171 non-null float64
27 open_acc_6m 459541 non-null float64
28 open_act_il 459541 non-null float64
29 open_il_12m 459541 non-null float64
30 open_il_24m 459541 non-null float64
31 inv_mths_since_rcnt_il 1110171 non-null float64
32 total_bal_il 459541 non-null float64
33 il_util 408722 non-null float64
34 open_rv_12m 459541 non-null float64
35 open_rv_24m 459541 non-null float64
36 max_bal_bc 459541 non-null float64
37 all_util 459541 non-null float64
38 total_rev_hi_lim 1110171 non-null float64
39 inq_fi 459541 non-null float64
40 total_cu_tl 459541 non-null float64
41 inq_last_12m 459541 non-null float64
42 acc_open_past_24mths 1110171 non-null float64
43 avg_cur_bal 1110171 non-null float64
44 bc_open_to_buy 1110171 non-null float64
45 bc_util 1110171 non-null float64
46 chargeoff_within_12_mths 1110171 non-null float64
47 delinq_amnt 1110171 non-null float64
48 mo_sin_old_il_acct 1110171 non-null float64
49 mo_sin_old_rev_tl_op 1110171 non-null float64
50 inv_mo_sin_rcnt_rev_tl_op 1110171 non-null float64
51 inv_mo_sin_rcnt_tl 1110171 non-null float64
52 mort_acc 1110171 non-null float64
53 inv_mths_since_recent_bc 1110171 non-null float64
54 inv_mths_since_recent_bc_dlq 1110171 non-null float64
55 inv_mths_since_recent_inq 1110171 non-null float64
56 inv_mths_since_recent_revol_delinq 1110171 non-null float64
57 num_accts_ever_120_pd 1110171 non-null float64
58 num_actv_bc_tl 1110171 non-null float64
59 num_actv_rev_tl 1110171 non-null float64
60 num_bc_sats 1110171 non-null float64
61 num_bc_tl 1110171 non-null float64
62 num_il_tl 1110171 non-null float64
63 num_op_rev_tl 1110171 non-null float64
64 num_rev_accts 1110171 non-null float64
65 num_rev_tl_bal_gt_0 1110171 non-null float64
66 num_sats 1110171 non-null float64
67 num_tl_120dpd_2m 1110171 non-null float64
68 num_tl_30dpd 1110171 non-null float64
69 num_tl_90g_dpd_24m 1110171 non-null float64
70 num_tl_op_past_12m 1110171 non-null float64
71 pct_tl_nvr_dlq 1110171 non-null float64
72 percent_bc_gt_75 1110171 non-null float64
73 pub_rec_bankruptcies 1110171 non-null float64
74 tax_liens 1110171 non-null float64
75 tot_hi_cred_lim 1110171 non-null float64
76 total_bal_ex_mort 1110171 non-null float64
77 total_bc_limit 1110171 non-null float64
78 total_il_high_credit_limit 1110171 non-null float64
79 fraction_recovered 1110171 non-null float64
dtypes: float64(74), int64(1), object(5)
memory usage: 686.1+ MB
Before I drop a bunch of rows with nulls from loans_2, I’m concerned about il_util, as it’s missing values in about 50,000 more rows than the rest of the new metric columns. Why would that be?
count 408722.000000
mean 71.832894
std 22.311439
min 0.000000
25% 59.000000
50% 75.000000
75% 87.000000
max 464.000000
Name: il_util, dtype: float64
Peeking back up to the data dictionary, il_util is the “ratio of total current balance to high credit/credit limit on all install acct”. The relevant balance (total_bal_il) and credit limit (total_il_high_credit_limit) metrics appear to already be in the data, so perhaps this utilization metric doesn’t contain any new information. I’ll compare il_util (where it’s present) to the ratio of the other two variables.
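One way that comparison could look, as a sketch: I'm assuming `il_util` is reported as a whole-number percentage, so the computed ratio is scaled by 100 and rounded before comparing. The toy values and the `il_util_compute` and `compute_equals_il_util` names are made up for illustration.

```python
import pandas as pd

# Toy rows (hypothetical values).
toy = pd.DataFrame({
    "il_util": [50.0, 80.0, 73.0, None],
    "total_bal_il": [5000.0, 8000.0, 6000.0, 1000.0],
    "total_il_high_credit_limit": [10000.0, 10000.0, 10000.0, 4000.0],
})

# Only compare rows where il_util is actually present.
present = toy.dropna(subset=["il_util"]).copy()
present["il_util_compute"] = (
    present["total_bal_il"] / present["total_il_high_credit_limit"] * 100
).round()

# How often does the reported value match the simple ratio?
present["compute_equals_il_util"] = present["il_util"] == present["il_util_compute"]
print(present["compute_equals_il_util"].describe())

# Where they differ, how far apart are they?
compute_diff = (present["il_util"] - present["il_util_compute"]).abs()
print(compute_diff[compute_diff > 0].describe())
```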
408722 rows × 2 columns
count 408722
unique 2
top True
freq 307589
dtype: object
count 101133.000000
mean 14.638684
std 16.409913
min 1.000000
25% 3.000000
50% 10.000000
75% 21.000000
max 1108.000000
Name: compute_diff, dtype: float64
That’s weird. il_util is equal to the computed ratio three-quarters of the time, but when it’s off, the median difference is 10 points. Perhaps there’s new information there sometimes after all. Maybe whatever credit bureau is reporting the utilization rate uses a different formula than just a simple ratio? Again, that’s something I could ask if I were performing this analysis for a client, but that’s not the case. I’ll assume that this variable is still valuable, and where il_util is null I’ll impute the value to make it equal to the ratio of total_bal_il to total_il_high_credit_limit (or 0 if the limit is 0). And I’ll add one more boolean field to mark the imputed entries.
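A sketch of that imputation, under the same assumption that `il_util` is a whole-number percentage. Rows where the balance itself is missing are left null, since there's nothing to impute from. The toy values are made up.

```python
import pandas as pd

# Toy rows (hypothetical values): one present, two imputable, one with no balance data.
toy = pd.DataFrame({
    "il_util": [50.0, None, None, None],
    "total_bal_il": [5000.0, 3000.0, 0.0, None],
    "total_il_high_credit_limit": [10000.0, 4000.0, 0.0, 2000.0],
})

limit = toy["total_il_high_credit_limit"]

# Ratio as a rounded percentage; a zero limit is turned into NaN to avoid dividing by zero...
ratio = (toy["total_bal_il"] / limit.where(limit != 0) * 100).round()
# ...and then set to 0 explicitly, per the rule "0 if the limit is 0".
ratio = ratio.mask(limit == 0, 0.0)

# Flag only the entries that will actually be filled in.
toy["il_util_imputed"] = toy["il_util"].isna() & ratio.notna()
toy["il_util"] = toy["il_util"].fillna(ratio)
print(toy)
```

The boolean flag lets a model learn whether imputed values behave differently from reported ones.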
Also, that 1,108 is a doozy of an outlier, but I think I’ll just leave it be, as it appears that outliers aren’t too big a deal if the neural network architecture is sufficiently deep.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1110171 entries, 0 to 2260697
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1110171 non-null float64
1 term 1110171 non-null object
2 emp_length 1110171 non-null object
3 home_ownership 1110171 non-null object
4 annual_inc 1110171 non-null float64
5 purpose 1110171 non-null object
6 dti 1110171 non-null float64
7 delinq_2yrs 1110171 non-null float64
8 cr_hist_age_mths 1110171 non-null int64
9 fico_range_low 1110171 non-null float64
10 fico_range_high 1110171 non-null float64
11 inq_last_6mths 1110171 non-null float64
12 inv_mths_since_last_delinq 1110171 non-null float64
13 inv_mths_since_last_record 1110171 non-null float64
14 open_acc 1110171 non-null float64
15 pub_rec 1110171 non-null float64
16 revol_bal 1110171 non-null float64
17 revol_util 1110171 non-null float64
18 total_acc 1110171 non-null float64
19 collections_12_mths_ex_med 1110171 non-null float64
20 inv_mths_since_last_major_derog 1110171 non-null float64
21 application_type 1110171 non-null object
22 annual_inc_joint 1110171 non-null float64
23 dti_joint 1110171 non-null float64
24 acc_now_delinq 1110171 non-null float64
25 tot_coll_amt 1110171 non-null float64
26 tot_cur_bal 1110171 non-null float64
27 open_acc_6m 459541 non-null float64
28 open_act_il 459541 non-null float64
29 open_il_12m 459541 non-null float64
30 open_il_24m 459541 non-null float64
31 inv_mths_since_rcnt_il 1110171 non-null float64
32 total_bal_il 459541 non-null float64
33 il_util 459541 non-null float64
34 open_rv_12m 459541 non-null float64
35 open_rv_24m 459541 non-null float64
36 max_bal_bc 459541 non-null float64
37 all_util 459541 non-null float64
38 total_rev_hi_lim 1110171 non-null float64
39 inq_fi 459541 non-null float64
40 total_cu_tl 459541 non-null float64
41 inq_last_12m 459541 non-null float64
42 acc_open_past_24mths 1110171 non-null float64
43 avg_cur_bal 1110171 non-null float64
44 bc_open_to_buy 1110171 non-null float64
45 bc_util 1110171 non-null float64
46 chargeoff_within_12_mths 1110171 non-null float64
47 delinq_amnt 1110171 non-null float64
48 mo_sin_old_il_acct 1110171 non-null float64
49 mo_sin_old_rev_tl_op 1110171 non-null float64
50 inv_mo_sin_rcnt_rev_tl_op 1110171 non-null float64
51 inv_mo_sin_rcnt_tl 1110171 non-null float64
52 mort_acc 1110171 non-null float64
53 inv_mths_since_recent_bc 1110171 non-null float64
54 inv_mths_since_recent_bc_dlq 1110171 non-null float64
55 inv_mths_since_recent_inq 1110171 non-null float64
56 inv_mths_since_recent_revol_delinq 1110171 non-null float64
57 num_accts_ever_120_pd 1110171 non-null float64
58 num_actv_bc_tl 1110171 non-null float64
59 num_actv_rev_tl 1110171 non-null float64
60 num_bc_sats 1110171 non-null float64
61 num_bc_tl 1110171 non-null float64
62 num_il_tl 1110171 non-null float64
63 num_op_rev_tl 1110171 non-null float64
64 num_rev_accts 1110171 non-null float64
65 num_rev_tl_bal_gt_0 1110171 non-null float64
66 num_sats 1110171 non-null float64
67 num_tl_120dpd_2m 1110171 non-null float64
68 num_tl_30dpd 1110171 non-null float64
69 num_tl_90g_dpd_24m 1110171 non-null float64
70 num_tl_op_past_12m 1110171 non-null float64
71 pct_tl_nvr_dlq 1110171 non-null float64
72 percent_bc_gt_75 1110171 non-null float64
73 pub_rec_bankruptcies 1110171 non-null float64
74 tax_liens 1110171 non-null float64
75 tot_hi_cred_lim 1110171 non-null float64
76 total_bal_ex_mort 1110171 non-null float64
77 total_bc_limit 1110171 non-null float64
78 total_il_high_credit_limit 1110171 non-null float64
79 fraction_recovered 1110171 non-null float64
80 il_util_imputed 1110171 non-null bool
dtypes: bool(1), float64(74), int64(1), object(5)
memory usage: 687.1+ MB
Good. Ready to drop rows with nulls in loans_2 and move on to the DataFrame for the model that adds the new metrics for joint applications.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 14453 entries, 421222 to 2157147
Data columns (total 94 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 14453 non-null float64
1 term 14453 non-null object
2 emp_length 14453 non-null object
3 home_ownership 14453 non-null object
4 annual_inc 14453 non-null float64
5 purpose 14453 non-null object
6 dti 14453 non-null float64
7 delinq_2yrs 14453 non-null float64
8 cr_hist_age_mths 14453 non-null int64
9 fico_range_low 14453 non-null float64
10 fico_range_high 14453 non-null float64
11 inq_last_6mths 14453 non-null float64
12 inv_mths_since_last_delinq 14453 non-null float64
13 inv_mths_since_last_record 14453 non-null float64
14 open_acc 14453 non-null float64
15 pub_rec 14453 non-null float64
16 revol_bal 14453 non-null float64
17 revol_util 14453 non-null float64
18 total_acc 14453 non-null float64
19 collections_12_mths_ex_med 14453 non-null float64
20 inv_mths_since_last_major_derog 14453 non-null float64
21 application_type 14453 non-null object
22 annual_inc_joint 14453 non-null float64
23 dti_joint 14453 non-null float64
24 acc_now_delinq 14453 non-null float64
25 tot_coll_amt 14453 non-null float64
26 tot_cur_bal 14453 non-null float64
27 open_acc_6m 14453 non-null float64
28 open_act_il 14453 non-null float64
29 open_il_12m 14453 non-null float64
30 open_il_24m 14453 non-null float64
31 inv_mths_since_rcnt_il 14453 non-null float64
32 total_bal_il 14453 non-null float64
33 il_util 14453 non-null float64
34 open_rv_12m 14453 non-null float64
35 open_rv_24m 14453 non-null float64
36 max_bal_bc 14453 non-null float64
37 all_util 14453 non-null float64
38 total_rev_hi_lim 14453 non-null float64
39 inq_fi 14453 non-null float64
40 total_cu_tl 14453 non-null float64
41 inq_last_12m 14453 non-null float64
42 acc_open_past_24mths 14453 non-null float64
43 avg_cur_bal 14453 non-null float64
44 bc_open_to_buy 14453 non-null float64
45 bc_util 14453 non-null float64
46 chargeoff_within_12_mths 14453 non-null float64
47 delinq_amnt 14453 non-null float64
48 mo_sin_old_il_acct 14453 non-null float64
49 mo_sin_old_rev_tl_op 14453 non-null float64
50 inv_mo_sin_rcnt_rev_tl_op 14453 non-null float64
51 inv_mo_sin_rcnt_tl 14453 non-null float64
52 mort_acc 14453 non-null float64
53 inv_mths_since_recent_bc 14453 non-null float64
54 inv_mths_since_recent_bc_dlq 14453 non-null float64
55 inv_mths_since_recent_inq 14453 non-null float64
56 inv_mths_since_recent_revol_delinq 14453 non-null float64
57 num_accts_ever_120_pd 14453 non-null float64
58 num_actv_bc_tl 14453 non-null float64
59 num_actv_rev_tl 14453 non-null float64
60 num_bc_sats 14453 non-null float64
61 num_bc_tl 14453 non-null float64
62 num_il_tl 14453 non-null float64
63 num_op_rev_tl 14453 non-null float64
64 num_rev_accts 14453 non-null float64
65 num_rev_tl_bal_gt_0 14453 non-null float64
66 num_sats 14453 non-null float64
67 num_tl_120dpd_2m 14453 non-null float64
68 num_tl_30dpd 14453 non-null float64
69 num_tl_90g_dpd_24m 14453 non-null float64
70 num_tl_op_past_12m 14453 non-null float64
71 pct_tl_nvr_dlq 14453 non-null float64
72 percent_bc_gt_75 14453 non-null float64
73 pub_rec_bankruptcies 14453 non-null float64
74 tax_liens 14453 non-null float64
75 tot_hi_cred_lim 14453 non-null float64
76 total_bal_ex_mort 14453 non-null float64
77 total_bc_limit 14453 non-null float64
78 total_il_high_credit_limit 14453 non-null float64
79 revol_bal_joint 14453 non-null float64
80 sec_app_fico_range_low 14453 non-null float64
81 sec_app_fico_range_high 14453 non-null float64
82 sec_app_cr_hist_age_mths 14453 non-null Int64
83 sec_app_inq_last_6mths 14453 non-null float64
84 sec_app_mort_acc 14453 non-null float64
85 sec_app_open_acc 14453 non-null float64
86 sec_app_revol_util 14453 non-null float64
87 sec_app_open_act_il 14453 non-null float64
88 sec_app_num_rev_accts 14453 non-null float64
89 sec_app_chargeoff_within_12_mths 14453 non-null float64
90 sec_app_collections_12_mths_ex_med 14453 non-null float64
91 sec_app_inv_mths_since_last_major_derog 14453 non-null float64
92 fraction_recovered 14453 non-null float64
93 il_util_imputed 14453 non-null bool
dtypes: Int64(1), bool(1), float64(86), int64(1), object(5)
memory usage: 10.4+ MB
Phew, the data’s all clean now! Time for the fun part.
Building the neural networks
After a good deal of trial and error, I settled on a network architecture with three hidden layers, each followed by a dropout layer of rate 0.3; it was as good as I could find. I used ReLU activation in those hidden layers, with Adam optimization and mean squared error as the loss function for the model as a whole. I tried using mean absolute error at first, but then I found that the resulting model would essentially always guess either 1 or 0 for the output, and the majority of the dataset’s output is 1. Therefore, larger errors needed to be penalized to a greater degree, which is what mean squared error is good at.
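The difference between the two losses can be illustrated with a deliberately simplified case: a model that can only output one constant. Mean absolute error is minimized by the median of the targets, while mean squared error is minimized by their mean. The toy targets below are made up, with most values at 1 to mimic the skew described above; a real network is not a constant predictor, so this is only an intuition aid.

```python
from statistics import mean

# Toy targets (hypothetical): mostly fully repaid, a few partial recoveries.
targets = [1.0] * 7 + [0.6, 0.3, 0.0]

def mae(pred, ys):
    """Mean absolute error of a constant prediction."""
    return mean(abs(y - pred) for y in ys)

def mse(pred, ys):
    """Mean squared error of a constant prediction."""
    return mean((y - pred) ** 2 for y in ys)

# Search constant predictions on a 0.01 grid between 0 and 1.
candidates = [i / 100 for i in range(101)]
best_mae = min(candidates, key=lambda p: mae(p, targets))
best_mse = min(candidates, key=lambda p: mse(p, targets))

print("best constant under MAE:", best_mae)
print("best constant under MSE:", best_mse)
```

Under MAE the best constant lands on the majority value, ignoring the partial recoveries entirely, whereas under MSE it is pulled toward them.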
The dataset being so large, I had great results increasing the batch size for the first couple models.
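Keras reports training progress in steps (batches) per epoch, so the batch size directly sets the step counter in the training logs. Here's a small sketch of that arithmetic. The 80/20 split and the batch sizes are assumptions of mine for illustration, not settings taken from the notebook.

```python
import math

def steps_per_epoch(n_train_rows: int, batch_size: int) -> int:
    """Number of batches needed to pass over the training rows once."""
    return math.ceil(n_train_rows / batch_size)

n_rows = 1_110_171           # rows left after cleaning
n_train = int(n_rows * 0.8)  # assumed 80/20 train/validation split

for batch_size in (32, 128, 512):
    print(batch_size, steps_per_epoch(n_train, batch_size))
```

With these assumed numbers, a batch size of 128 works out to 6939 steps, which is consistent with the Model 1 log, though other combinations of split and batch size could give the same count.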
Model 1:
Epoch 1/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0738 - val_loss: 0.0601
Epoch 2/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0600 - val_loss: 0.0597
Epoch 3/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0595 - val_loss: 0.0592
Epoch 4/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0594 - val_loss: 0.0589
Epoch 5/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0593 - val_loss: 0.0597
Epoch 6/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0593 - val_loss: 0.0591
Epoch 7/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0592 - val_loss: 0.0591
Epoch 8/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0591 - val_loss: 0.0597
Epoch 9/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0591 - val_loss: 0.0588
Epoch 10/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0591 - val_loss: 0.0589
Epoch 11/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0591 - val_loss: 0.0585
Epoch 12/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0590 - val_loss: 0.0586
Epoch 13/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0587
Epoch 14/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0591
Epoch 15/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0588
Epoch 16/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0589
Epoch 17/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0584
Epoch 18/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0591
Epoch 19/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0586
Epoch 20/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 21/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0585
Epoch 22/100
6939/6939 [==============================] - 16s 2ms/step - loss: 0.0589 - val_loss: 0.0583
Epoch 23/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 24/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 25/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 26/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 27/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 28/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0582
Epoch 29/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0590
Epoch 30/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 31/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 32/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0584
Epoch 33/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 34/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 35/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 36/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 37/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 38/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 39/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0586
Epoch 40/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 41/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 42/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 43/100
6939/6939 [==============================] - 16s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 44/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 45/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0593
Epoch 46/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 47/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0585
Epoch 48/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 49/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0586
Epoch 50/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 51/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 52/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 53/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0586
Epoch 54/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0593
Epoch 55/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 56/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 57/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 58/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 59/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 60/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0584
Epoch 61/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 62/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 63/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 64/100
6939/6939 [==============================] - 16s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 65/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 66/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 67/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0596
Epoch 68/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 69/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 70/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 71/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 72/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 73/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 74/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 75/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 76/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 77/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 78/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 79/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 80/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 81/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 82/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 83/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0592
Epoch 84/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 85/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0592
Epoch 86/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 87/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 88/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0594
Epoch 89/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 90/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 91/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 92/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 93/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0585
Epoch 94/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0594
Epoch 95/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 96/100
6939/6939 [==============================] - 19s 3ms/step - loss: 0.0588 - val_loss: 0.0593
Epoch 97/100
6939/6939 [==============================] - 21s 3ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 98/100
6939/6939 [==============================] - 20s 3ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 99/100
6939/6939 [==============================] - 19s 3ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 100/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0590
Model 2:
Epoch 1/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.1028 - val_loss: 0.0762
Epoch 2/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0757 - val_loss: 0.0740
Epoch 3/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0748 - val_loss: 0.0730
Epoch 4/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0743 - val_loss: 0.0734
Epoch 5/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0741 - val_loss: 0.0733
Epoch 6/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0740 - val_loss: 0.0730
Epoch 7/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0739 - val_loss: 0.0729
Epoch 8/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0738 - val_loss: 0.0732
Epoch 9/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0737 - val_loss: 0.0727
Epoch 10/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0736 - val_loss: 0.0733
Epoch 11/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0736 - val_loss: 0.0725
Epoch 12/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0736 - val_loss: 0.0726
Epoch 13/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0725
Epoch 14/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0735 - val_loss: 0.0726
Epoch 15/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0732
Epoch 16/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0726
Epoch 17/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0726
Epoch 18/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.0734 - val_loss: 0.0726
Epoch 19/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0733 - val_loss: 0.0732
Epoch 20/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0733 - val_loss: 0.0730
Epoch 21/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0733 - val_loss: 0.0725
Epoch 22/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.0732 - val_loss: 0.0726
Epoch 23/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0731 - val_loss: 0.0726
Epoch 24/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0732 - val_loss: 0.0725
Epoch 25/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.0731 - val_loss: 0.0727
Epoch 26/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0732 - val_loss: 0.0730
Epoch 27/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0732 - val_loss: 0.0725
Epoch 28/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0731 - val_loss: 0.0724
Epoch 29/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0731 - val_loss: 0.0731
Epoch 30/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0731 - val_loss: 0.0725
Epoch 31/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0731 - val_loss: 0.0727
Epoch 32/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 33/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 34/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0727
Epoch 35/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0729
Epoch 36/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0723
Epoch 37/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 38/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0729
Epoch 39/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 40/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0723
Epoch 41/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0722
Epoch 42/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0729 - val_loss: 0.0723
Epoch 43/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 44/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0731 - val_loss: 0.0725
Epoch 45/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 46/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0730
Epoch 47/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0727
Epoch 48/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 49/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0727
Epoch 50/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 51/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 52/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 53/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0729 - val_loss: 0.0730
Epoch 54/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 55/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 56/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 57/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 58/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 59/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 60/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 61/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0728
Epoch 62/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0726
Epoch 63/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0725
Epoch 64/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0724
Epoch 65/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0724
Epoch 66/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0730
Epoch 67/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 68/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0724
Epoch 69/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0727
Epoch 70/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0734
Epoch 71/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0729
Epoch 72/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0727
Epoch 73/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 74/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 75/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 76/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0726
Epoch 77/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 78/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0726
Epoch 79/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 80/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 81/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0728
Epoch 82/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0726
Epoch 83/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0727
Epoch 84/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0729
Epoch 85/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0728
Epoch 86/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0727
Epoch 87/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0730
Epoch 88/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0726 - val_loss: 0.0727
Epoch 89/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 90/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0726
Epoch 91/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 92/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0728
Epoch 93/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0729
Epoch 94/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0729
Epoch 95/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0727
Epoch 96/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0728
Epoch 97/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 98/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0732
Epoch 99/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 100/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0729
Model 3:
Epoch 1/100
362/362 [==============================] - 1s 2ms/step - loss: 0.3603 - val_loss: 0.2006
Epoch 2/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1843 - val_loss: 0.1489
Epoch 3/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1386 - val_loss: 0.1311
Epoch 4/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1239 - val_loss: 0.1226
Epoch 5/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1173 - val_loss: 0.1181
Epoch 6/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1144 - val_loss: 0.1170
Epoch 7/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1132 - val_loss: 0.1163
Epoch 8/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1112 - val_loss: 0.1164
Epoch 9/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1105 - val_loss: 0.1139
Epoch 10/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1088 - val_loss: 0.1120
Epoch 11/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1087 - val_loss: 0.1118
Epoch 12/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1070 - val_loss: 0.1114
Epoch 13/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1059 - val_loss: 0.1116
Epoch 14/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1043 - val_loss: 0.1111
Epoch 15/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1036 - val_loss: 0.1103
Epoch 16/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1030 - val_loss: 0.1102
Epoch 17/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1024 - val_loss: 0.1098
Epoch 18/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1018 - val_loss: 0.1095
Epoch 19/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1014 - val_loss: 0.1086
Epoch 20/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1005 - val_loss: 0.1086
Epoch 21/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0997 - val_loss: 0.1095
Epoch 22/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0993 - val_loss: 0.1092
Epoch 23/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0986 - val_loss: 0.1090
Epoch 24/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0983 - val_loss: 0.1096
Epoch 25/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0975 - val_loss: 0.1099
Epoch 26/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0964 - val_loss: 0.1092
Epoch 27/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0968 - val_loss: 0.1092
Epoch 28/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0960 - val_loss: 0.1093
Epoch 29/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0954 - val_loss: 0.1100
Epoch 30/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0952 - val_loss: 0.1096
Epoch 31/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0946 - val_loss: 0.1105
Epoch 32/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0942 - val_loss: 0.1109
Epoch 33/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0930 - val_loss: 0.1103
Epoch 34/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0917 - val_loss: 0.1103
Epoch 35/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0908 - val_loss: 0.1112
Epoch 36/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0922 - val_loss: 0.1107
Epoch 37/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0918 - val_loss: 0.1117
Epoch 38/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0910 - val_loss: 0.1111
Epoch 39/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0910 - val_loss: 0.1118
Epoch 40/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0897 - val_loss: 0.1126
Epoch 41/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0884 - val_loss: 0.1128
Epoch 42/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0890 - val_loss: 0.1121
Epoch 43/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0893 - val_loss: 0.1118
Epoch 44/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0877 - val_loss: 0.1122
Epoch 45/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0874 - val_loss: 0.1121
Epoch 46/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0864 - val_loss: 0.1119
Epoch 47/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0873 - val_loss: 0.1128
Epoch 48/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0858 - val_loss: 0.1126
Epoch 49/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0872 - val_loss: 0.1128
Epoch 50/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0852 - val_loss: 0.1133
Epoch 51/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0857 - val_loss: 0.1137
Epoch 52/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0848 - val_loss: 0.1142
Epoch 53/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0842 - val_loss: 0.1134
Epoch 54/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0839 - val_loss: 0.1120
Epoch 55/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0820 - val_loss: 0.1153
Epoch 56/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0831 - val_loss: 0.1139
Epoch 57/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0821 - val_loss: 0.1151
Epoch 58/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0829 - val_loss: 0.1147
Epoch 59/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0821 - val_loss: 0.1133
Epoch 60/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0820 - val_loss: 0.1148
Epoch 61/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0809 - val_loss: 0.1162
Epoch 62/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0808 - val_loss: 0.1151
Epoch 63/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0795 - val_loss: 0.1149
Epoch 64/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0802 - val_loss: 0.1159
Epoch 65/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0797 - val_loss: 0.1153
Epoch 66/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0791 - val_loss: 0.1158
Epoch 67/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0789 - val_loss: 0.1172
Epoch 68/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0804 - val_loss: 0.1152
Epoch 69/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0790 - val_loss: 0.1165
Epoch 70/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0788 - val_loss: 0.1167
Epoch 71/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0781 - val_loss: 0.1174
Epoch 72/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0772 - val_loss: 0.1186
Epoch 73/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0785 - val_loss: 0.1163
Epoch 74/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0778 - val_loss: 0.1163
Epoch 75/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0767 - val_loss: 0.1189
Epoch 76/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0774 - val_loss: 0.1189
Epoch 77/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0769 - val_loss: 0.1177
Epoch 78/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0759 - val_loss: 0.1187
Epoch 79/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0755 - val_loss: 0.1203
Epoch 80/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0761 - val_loss: 0.1188
Epoch 81/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0743 - val_loss: 0.1203
Epoch 82/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0753 - val_loss: 0.1177
Epoch 83/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0760 - val_loss: 0.1199
Epoch 84/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0746 - val_loss: 0.1191
Epoch 85/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0756 - val_loss: 0.1193
Epoch 86/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0743 - val_loss: 0.1206
Epoch 87/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0732 - val_loss: 0.1209
Epoch 88/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0746 - val_loss: 0.1213
Epoch 89/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0725 - val_loss: 0.1223
Epoch 90/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0738 - val_loss: 0.1196
Epoch 91/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0725 - val_loss: 0.1241
Epoch 92/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0744 - val_loss: 0.1226
Epoch 93/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0727 - val_loss: 0.1213
Epoch 94/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0718 - val_loss: 0.1218
Epoch 95/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0746 - val_loss: 0.1217
Epoch 96/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0733 - val_loss: 0.1227
Epoch 97/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0698 - val_loss: 0.1250
Epoch 98/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0731 - val_loss: 0.1225
Epoch 99/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0728 - val_loss: 0.1226
Epoch 100/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0718 - val_loss: 0.1231
The first model performed best, settling around a mean squared error of 0.0588. (Even after setting random_state inside train_test_split and seed inside the dropout layers, some randomness seems to remain in training, so if you run this notebook yourself, the course of your training may look a little different.) Apparently the additional records in the first dataset did more to aid training than the additional metrics in the subsequent sets. And the dropout layers didn’t stop the third model from overfitting anyway: its training loss kept falling to about 0.072 while its validation loss climbed from a low of 0.1086 to about 0.123.
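One way I could keep a model like the third one from training past its best point is early stopping. Here is a minimal sketch in plain Python, not the code from my notebook: the helper `early_stopping_epoch` and the patience of 5 are assumptions for illustration, and the validation losses are the first 25 epochs of Model 3’s log above. In Keras itself, the `tf.keras.callbacks.EarlyStopping` callback (with `monitor="val_loss"` and `restore_best_weights=True`) serves this purpose.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` epochs in a row pass without a new
    minimum in validation loss. If that never happens, stop_epoch is
    simply the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember it and reset the wait.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Validation losses for epochs 1-25 of Model 3, copied from the log above.
model_3_val_losses = [
    0.2006, 0.1489, 0.1311, 0.1226, 0.1181,
    0.1170, 0.1163, 0.1164, 0.1139, 0.1120,
    0.1118, 0.1114, 0.1116, 0.1111, 0.1103,
    0.1102, 0.1098, 0.1095, 0.1086, 0.1086,
    0.1095, 0.1092, 0.1090, 0.1096, 0.1099,
]

stop_epoch, best_epoch = early_stopping_epoch(model_3_val_losses, patience=5)
print(f"stop at epoch {stop_epoch}, best epoch {best_epoch}")
```

With these numbers the sketch stops at epoch 24 and points back to epoch 19, where the validation loss first reached 0.1086. A larger patience would tolerate longer plateaus at the cost of more wasted epochs.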
Saving the final model
First I need to create the final model, training model_1’s architecture on the full dataset. Then I’ll save the model to disk with its save function and save the data transformer using joblib so I can use it in the API.
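The joblib half of that step can be sketched as follows. This is a hedged illustration rather than my notebook’s code: the `StandardScaler` and the tiny `X_train` array are stand-ins I’m assuming for the real fitted data transformer, and the file is written to a temporary directory. The Keras model is saved separately through its own `save` method and can be reloaded with `tf.keras.models.load_model`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for the fitted data transformer (an assumption for this sketch);
# the real one is whatever preprocessing object was fitted on the features.
X_train = np.array([[1000.0, 0.10], [5000.0, 0.15], [12000.0, 0.22]])
transformer = StandardScaler().fit(X_train)

out_dir = tempfile.mkdtemp()
transformer_path = os.path.join(out_dir, "data_transformer.joblib")

# joblib.dump returns the list of file names it wrote, which is why a
# notebook cell ending in this call echoes a one-element list.
saved_files = joblib.dump(transformer, transformer_path)

# Later, for example inside the API process, load it back and reuse it.
restored = joblib.load(transformer_path)
round_trip_ok = np.allclose(
    restored.transform(X_train), transformer.transform(X_train)
)
print(saved_files, round_trip_ok)
```

Saving the transformer alongside the model matters because the API has to apply exactly the same preprocessing to incoming requests that was applied to the training data.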
Epoch 1/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0750
Epoch 2/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0597
Epoch 3/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0593
Epoch 4/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0592
Epoch 5/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0591
Epoch 6/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0590
Epoch 7/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0590
Epoch 8/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0590
Epoch 9/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 10/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 11/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0589
Epoch 12/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 13/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 14/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 15/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 16/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 17/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 18/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 19/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0588
Epoch 20/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 21/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 22/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 23/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 24/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 25/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 26/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 27/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 28/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 29/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 30/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 31/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 32/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 33/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 34/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 35/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 36/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 37/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 38/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 39/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 40/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 41/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 42/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 43/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 44/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 45/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 46/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 47/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 48/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 49/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 50/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 51/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 52/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 53/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 54/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 55/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 56/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 57/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 58/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 59/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 60/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 61/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0586
Epoch 62/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 63/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0586
Epoch 64/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 65/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 66/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 67/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 68/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 69/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 70/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 71/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 72/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 73/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 74/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 75/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 76/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 77/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 78/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 79/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 80/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 81/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 82/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 83/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 84/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 85/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 86/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 87/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 88/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 89/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 90/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 91/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 92/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 93/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 94/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 95/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 96/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 97/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 98/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 99/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0586
Epoch 100/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
['data_transformer.joblib']
Building the API
I first tried building this API and its demo front end on Glitch, which officially supports only Node.js back ends, though unofficially you can get a Python server running there (which I’ve done before using Flask). When I was almost finished, though, I tried importing TensorFlow to load my model, and only then discovered that, unlike Node.js dependencies, Python dependencies get installed to your project’s disk space on Glitch, and not even their pro plan provides enough space to hold the entire TensorFlow library. That makes sense, since I certainly wasn’t using the platform as intended.
Then I discovered PythonAnywhere! They have plenty of common Python libraries already installed out-of-the-box, including TensorFlow, so I got everything working perfectly there.
So head on over if you’d like to check it out; the front end includes a form where you can fill in all the parameters for the API request, and there are a couple of buttons that let you fill the form with typical examples from the dataset (since there are a lot of fields to fill in). Or you can send a GET request to https://tywmick.pythonanywhere.com/api/predict if you really want to include every parameter in your query string. In either case, you’re also more than welcome to take a look at its source on GitHub.
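If you do go the query-string route, building the URL might look roughly like this. It is only a sketch: the endpoint is the one given above, but the three parameter names and values are assumptions chosen for illustration, and the real API expects every model input to be present.

```python
from urllib.parse import urlencode

API_URL = "https://tywmick.pythonanywhere.com/api/predict"

# Illustrative subset only: these names and values are assumptions, and a
# real request needs every parameter the model was trained on.
params = {
    "loan_amnt": 10000,
    "term": "36 months",
    "annual_inc": 65000,
}

# urlencode percent-encodes the values and joins the pairs with "&".
request_url = f"{API_URL}?{urlencode(params)}"
print(request_url)
```

The resulting URL can then be passed to any HTTP client, for example `urllib.request.urlopen(request_url)`. I’m not showing a response here because its exact format isn’t described in this post.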
One of the best/worst things about machine learning is that your models always have room for improvement. I mentioned a couple of ideas along the way above for how I could improve the model in the future, but what’s the first thing you would tweak? Leave a response—I’d love to hear!
Translated from: https://towardsdatascience.com/loan-risk-neural-network-30c8f65f052e