WSDM-iQIYI: User Retention Prediction Challenge, online score 0.865
Competition overview
http://challenge.ai.iqiyi.com/detail?raceId=61600f6cef1b65639cd5eaa6
https://www.datafountain.cn/competitions/551
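Running instructions [very important]
Problem description
iQIYI is a leading high-quality video entertainment streaming platform in China and worldwide; more than 500 million users enjoy its entertainment services every month. Under its brand slogan "悅享品質", iQIYI has built a library of professional licensed video content covering films, TV series, variety shows, and animation, together with a large volume of user-generated content such as "隨刻", to give users a rich professional video experience.
The iQIYI mobile app uses deep learning and other recent AI techniques to personalize the product experience so that users get entertainment tailored to them. We use the "N-day retention score" as the key metric of user satisfaction. For example, if a user's "7-day retention score" on October 1 equals 3, the user visits the iQIYI app on 3 of the following 7 days (October 2 to 8). Predicting the retention score is challenging: users differ widely in preferences and activity levels, and other factors, such as the leisure time a user has available and the popularity trends of hot content, are strongly periodic.
This competition uses desensitized and sampled data from the iQIYI app to predict each user's 7-day retention score. Participating teams design their own algorithms for data analysis and prediction.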
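To make the definition of the 7-day retention score concrete, here is a minimal sketch in plain Python. The launch log rows, the user ids, and the helper name `retention_score` are hypothetical; dates are integer day indices, as in the desensitized logs.

```python
from collections import defaultdict

def retention_score(launch_days, end_date, horizon=7):
    """Count distinct days in [end_date + 1, end_date + horizon] with at least one launch."""
    active = set(launch_days)
    window = range(end_date + 1, end_date + horizon + 1)
    return sum(1 for day in window if day in active)

# Hypothetical launch log: (user_id, date) pairs
launch_log = [
    (1, 100), (1, 101), (1, 101), (1, 104), (1, 107), (1, 108),
    (2, 99), (2, 100),
]

days_by_user = defaultdict(list)
for user_id, date in launch_log:
    days_by_user[user_id].append(date)

# Score every user for end_date = 100, i.e. the window of days 101 to 107
scores = {u: retention_score(days, end_date=100) for u, days in days_by_user.items()}
print(scores)
```

Repeated launches on the same day count once, and launches on `end_date` itself or after the window do not count, so user 1 scores 3 and user 2 scores 0.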
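Data description
The competition provides a rich dataset: video data, user portrait data, app launch logs, and logs of user viewing and interaction behavior. For each test-set user, the "7-day retention score" on a given day must be predicted. The score ranges from 0 to 7, and predictions are kept to 2 decimal places.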
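Evaluation metric
This competition is a numeric prediction problem. The evaluation function is:

$$100-\left(1-\frac{1}{n}\sum_{i=1}^{n}\left|\frac{F_i-A_i}{7}\right|\right)$$

where $n$ is the number of test-set users, $F_i$ is the participant's predicted 7-day retention score for user $i$, and $A_i$ is that user's true 7-day retention score.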
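The core of the metric is the mean absolute error normalized by 7. The sketch below computes only that averaged term and the quantity inside the parentheses; it leaves the outer constant aside. The function name `normalized_error` and the sample values are hypothetical.

```python
def normalized_error(pred, actual, horizon=7):
    """Mean of |F_i - A_i| / horizon over all users."""
    if len(pred) != len(actual) or not pred:
        raise ValueError("pred and actual must be non-empty and of equal length")
    return sum(abs(f - a) / horizon for f, a in zip(pred, actual)) / len(pred)

# Hypothetical predictions F and true scores A for four users
pred = [3.50, 0.00, 7.00, 2.00]
actual = [3, 0, 5, 4]

err = normalized_error(pred, actual)
inner = 1 - err  # the term inside the parentheses of the evaluation function
print(round(err, 4), round(inner, 4))
```

Because each error is divided by 7, the averaged term always lies between 0 (perfect predictions) and 1 (every prediction off by the full range).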
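Review rules
Submissions must be UTF-8 encoded csv files. The file format and row order must match the test set; see "sample-a" in the competition dataset download section. All predictions are kept to 2 decimal places. Files that do not follow the submission format are treated as invalid and use up one submission attempt.
The competition has two stages, A and B. Both stages share the same training set, but the test sets to be predicted differ.
- Stage A closes on 2022.01.17. The stage A test set contains 15001 users to predict and is used for the stage A competition and leaderboard. Each user comes with a user id and an end_date. Participants predict that user's 7-day retention score over the following 7 days, [end_date+1 ~ end_date+7].
- Stage B runs from 2022.01.17 to 2022.01.20. The system then provides a new stage B test set, which is larger and contains 35000 users to predict. Stage B has its own leaderboard; all other details match stage A.
Final results are based on stage B scores. Participants must also submit supporting materials showing that their results are legitimate and valid.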
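Special notes
- The iQIYI AI competition platform is the official site and the main venue of the challenge. To take part in the main competition, participants must register on the official site and must submit their predictions there.
- Each team may have at most 5 members.
- The DataFountain competition platform serves as the practice ground for the 2022 WSDM User Retention Prediction Challenge. During the stage A leaderboard period it gives participants 2 extra scoring submissions per day, to help them do well on the official main venue.
- During stage A, predictions may be submitted on both DataFountain and the official main venue; during stage B, submit on the official main venue only. The final ranking is the one published on the official main venue.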
Dataset explanation

1. User portrait data

| Field | Description |
| --- | --- |
| user_id | |
| device_type | iOS, Android |
| device_rom | rom of the device |
| device_ram | ram of the device |
| sex | |
| age | |
| education | |
| occupation_status | |
| territory_code | |

2. App launch logs

| Field | Description |
| --- | --- |
| user_id | |
| date | Desensitized, starting from 0 |
| launch_type | spontaneous or launched by other apps & deep-links |

3. Video related data

| Field | Description |
| --- | --- |
| item_id | id of the video |
| father_id | album id, if the video is an episode of an album collection |
| cast | a list of actors/actresses |
| duration | video length |
| tag_list | a list of tags |

4. User playback data

| Field | Description |
| --- | --- |
| user_id | |
| item_id | |
| playtime | video playback time |
| date | timestamp of the behavior |

5. User interaction data

| Field | Description |
| --- | --- |
| user_id | |
| item_id | |
| interact_type | interaction types such as posting comments, etc. |
| date | timestamp of the behavior |
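The tables share the keys `user_id` and `item_id`, so playback rows can be aggregated per user and linked to video metadata. The following is a rough sketch in plain Python; all rows and ids are hypothetical, and the two aggregates are only examples of how the tables could be combined, not features taken from the baseline.

```python
from collections import defaultdict

# Hypothetical playback rows: (user_id, item_id, playtime, date)
playback = [
    (1, 501, 120.0, 145),
    (1, 502, 30.5, 145),
    (1, 501, 60.0, 146),
    (2, 503, 10.0, 145),
]
# Hypothetical video metadata: item_id -> father_id (album id)
video_father = {501: 9001, 502: 9001, 503: 9002}

daily_playtime = defaultdict(float)  # (user_id, date) -> total playtime
album_playtime = defaultdict(float)  # (user_id, father_id) -> total playtime

for user_id, item_id, playtime, date in playback:
    daily_playtime[(user_id, date)] += playtime
    father_id = video_father.get(item_id)  # None when the video has no album entry
    if father_id is not None:
        album_playtime[(user_id, father_id)] += playtime

print(dict(daily_playtime))
print(dict(album_playtime))
```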
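Timeline
- 2021.10.15: Competition launch; the task is officially released, the dataset is opened, and team registration opens.
- 2021.11.15: Public leaderboard opens; participants can submit predictions.
- 2021.12.20: Registration closes.
- 2022.01.17: Stage A submissions close; the stage B test set and leaderboard open.
- 2022.01.20: Stage B submissions close.
- 2022.01.21: Deadline for the stage B top 5 teams' explanatory documents (submission method to be announced).
- 2022.01.25: Final results announced.
- 2022.02.17: Top 3 team presentations and award ceremony.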
Prizes
- Champion: one team ($2000)
- Runner-up: one team ($800)
- Third place: one team ($500)
Basic field analysis
user_portrait

| user_id | device_type | device_rom | device_ram | sex | age | education | occupation_status | territory_code |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10209854 | 2.0 | 5731 | 109581 | 1.0 | 2.0 | 0.0 | 1.0 | 865101.0 |
| 10230057 | 2.0 | 1877 | 20888 | 1.0 | 4.0 | 0.0 | 1.0 | 864102.0 |
| 10268855 | 2.0 | NaN | NaN | 1.0 | 3.0 | NaN | NaN | NaN |
| 10268855 | 2.0 | NaN | NaN | 1.0 | 3.0 | NaN | NaN | NaN |
One user record is duplicated; consider dropping it.

    user_portrait = user_portrait.drop_duplicates()

device_type

device_type is categorical. Judging by the market share of phone operating systems, a reasonable guess is that 2 is Android, 1 is iOS, 3 is WP, and 4 is unknown or other.

    user_portrait['device_type'].value_counts()

Output:

    2.0    480055
    1.0     85322
    3.0     28909
    4.0      2280
    Name: device_type, dtype: int64

RAM and ROM

On a phone, ROM stores data such as system programs, applications, audio, video, and documents. Because video and similar files need a lot of space, ROM is much larger than RAM (8 GB of storage is the figure given for mainstream phones).
RAM, also called running memory, holds programs temporarily while they run and is much faster than ROM (1 GB is the figure given for mainstream phones). The larger the RAM, the faster the phone runs and the smoother large games play.

    import pandas as pd
    import seaborn as sns

    # Extract device info: keep the first ';'-separated value and convert it to a number
    user_portrait = user_portrait.copy()  # explicit copy avoids SettingWithCopyWarning
    user_portrait['device_ram'] = pd.to_numeric(
        user_portrait['device_ram'].apply(lambda x: str(x).split(';')[0]), errors='coerce')
    user_portrait['device_rom'] = pd.to_numeric(
        user_portrait['device_rom'].apply(lambda x: str(x).split(';')[0]), errors='coerce')

    # sns.distplot is deprecated in newer seaborn releases; sns.histplot is the replacement
    sns.distplot(user_portrait['device_ram'])

[Figure: distribution of device_ram (output_16_1.png)]

    sns.distplot(user_portrait['device_rom'])

sex

    user_portrait['sex'].value_counts()

Output:

    1.0    308846
    2.0    281612
    Name: sex, dtype: int64

age

    sns.distplot(user_portrait['age'])

education

    sns.distplot(user_portrait['education'])

occupation_status

    sns.distplot(user_portrait['occupation_status'])

territory_code

Code of the region where the user usually resides.

    sns.distplot(user_portrait['territory_code'])

app_launch
| user_id | launch_type | date |
| --- | --- | --- |
| 10157996 | 0 | 129 |
| 10139583 | 0 | 129 |
| 10000000 | 0 | 131 |
| 10000000 | 0 | 132 |
| 10000000 | 0 | 141 |
| 10000000 | 0 | 164 |
| 10000000 | 0 | 179 |
video_related
| Field | Description |
| --- | --- |
| item_id | id of the video |
| father_id | album id, if the video is an episode of an album collection |
| cast | a list of actors/actresses |
| duration | video length |
| tag_list | a list of tags |

Sample rows:

| 24403453.0 | 6.0 | NaN | 50365080;50338575;50313222;50165986 | NaN |
| 22838795.0 | 7.0 | NaN | 50001708;50323515;50125414 | NaN |
user_playback

    user_playback.head()

| user_id | item_id | playtime | date |
| --- | --- | --- | --- |
| 10057286 | 20628283.0 | 2208.612 | 145 |
| 10522615 | 23930557.0 | 31.054 | 145 |
| 10494028 | 20173699.0 | 115.952 | 145 |
| 10181987 | 21350426.0 | 1.585 | 145 |
| 10439175 | 22946929.0 | 51.726 | 145 |
user_interaction

| user_id | item_id | interact_type | date |
| --- | --- | --- | --- |
| 10243056 | 22635954 | 1 | 213 |
| 10203565 | 24723827 | 3 | 213 |
Exploratory data analysis
- app_launch
- Behavior over the past day, three days, week, month, and three months
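The look-back windows listed above can be turned into simple activity counts. Below is a minimal sketch in plain Python, assuming a month is 30 days and three months are 90 days; the helper name `window_counts` and the sample launch days are hypothetical.

```python
def window_counts(launch_days, end_date, windows=(1, 3, 7, 30, 90)):
    """For each window w, count distinct active days in (end_date - w, end_date]."""
    active = set(launch_days)
    return {
        w: sum(1 for day in active if end_date - w < day <= end_date)
        for w in windows
    }

# Hypothetical launch days of one user (day 100 appears twice and is counted once)
launch_days = [10, 95, 98, 99, 100, 100]
features = window_counts(launch_days, end_date=100)
print(features)
```

Each window ends at `end_date`, so no information from the prediction window [end_date+1, end_date+7] leaks into the features.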
| user_id | launch_type | date |
| --- | --- | --- |
| 10052988 | 0 | 147 |
| 10052988 | 0 | 149 |

| user_id | end_date |
| --- | --- |
| 10007813 | 205 |
| 10052988 | 210 |
| 10279068 | 200 |
| 10546696 | 216 |
| 10406659 | 183 |
| ... | ... |
| 10355586 | 205 |
| 10589773 | 210 |
| 10181954 | 218 |
| 10544736 | 164 |
| 10354569 | 187 |
15001 rows × 2 columns
Feature engineering

    # del user_interaction, user_portrait, user_playback, app_launch, video_related
    # -p keeps mkdir from failing when the directory already exists
    !mkdir -p wsdm_model_data
    !python3 baseline_feature_engineering.py

Model construction + training
    # -o overwrites existing files without prompting
    !unzip -o data.zip

    import pandas as pd
    import numpy as np
    import json
    import math

    data_dir = "./wsdm_model_data/"

    # Load the training set; list-valued columns are stored as JSON strings
    data = pd.read_csv(data_dir + "train_data.txt", sep="\t")
    data["launch_seq"] = data.launch_seq.apply(lambda x: json.loads(x))
    data["playtime_seq"] = data.playtime_seq.apply(lambda x: json.loads(x))
    data["duration_prefer"] = data.duration_prefer.apply(lambda x: json.loads(x))
    data["interact_prefer"] = data.interact_prefer.apply(lambda x: json.loads(x))

    # shuffle data
    data = data.sample(frac=1).reset_index(drop=True)
    data.columns

Output:

    Index(['user_id', 'end_date', 'label', 'launch_seq', 'playtime_seq',
           'duration_prefer', 'father_id_score', 'cast_id_score', 'tag_score',
           'device_type', 'device_ram', 'device_rom', 'sex', 'age', 'education',
           'occupation_status', 'territory_score', 'interact_prefer'],
          dtype='object')

    data

| user_id | end_date | label | launch_seq | playtime_seq | duration_prefer | father_id_score | cast_id_score | tag_score | device_type | device_ram | device_rom | sex | age | education | occupation_status | territory_score | interact_prefer |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10309777 | 165 | 6 | [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, ... | [0, 0, 0, 0, 0, 0, 0.9414, 0, 0, 0.9998, 0.943... | [0.0, 0.0, 0.0, 0.0, 0.08, 0.0, 0.04, 0.0, 0.0... | 1.209317 | 1.353447 | 0.178947 | 0.194954 | -0.740852 | 1.043355 | -0.955892 | -0.319111 | -0.544818 | 0.746096 | 0.167180 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... |
| 10117035 | 123 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0.000000 | 0.000000 | 0.000000 | 0.194954 | -1.195884 | -1.173106 | -0.955892 | -0.319111 | -0.544818 | -1.340308 | 0.000000 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 10413843 | 149 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0.000000 | 0.000000 | 0.000000 | -2.041925 | -0.637283 | -0.701308 | -0.955892 | -0.319111 | 0.755516 | 0.746096 | -1.106625 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 10209341 | 165 | 0 | [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0.0475, 0, 0, 0, 0, 0, 0... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0.000000 | 0.000000 | 0.000000 | 0.194954 | 0.150032 | -0.117076 | -0.955892 | -0.319111 | -0.544818 | 0.746096 | 0.940850 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 10430657 | 162 | 0 | [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0492... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0.000000 | 0.000000 | 0.000000 | 0.194954 | 1.012626 | -0.145958 | 1.046141 | 0.000000 | -0.544818 | 0.000000 | -0.743187 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10070331 | 122 | 1 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0.000000 | 0.000000 | 0.000000 | 0.194954 | 0.191747 | 1.228884 | -0.955892 | -0.319111 | -0.544818 | 0.746096 | -0.480041 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 10056030 | 115 | 2 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, ... | -0.299726 | 0.000000 | 0.388082 | 0.194954 | -1.195884 | -0.834187 | 1.046141 | 0.828011 | -0.544818 | -1.340308 | -1.524485 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 10235314 | 137 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0, ... | -0.866054 | 0.000000 | -0.084836 | 0.194954 | 1.020778 | 1.262729 | -0.955892 | -0.319111 | -0.544818 | 0.746096 | 0.838748 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 10014483 | 195 | 1 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 0.288450 | 0.760564 | 0.511767 | 0.194954 | -0.796952 | -0.111235 | -0.955892 | 1.975134 | -0.544818 | 0.746096 | -1.638692 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 10446094 | 157 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0.000000 | 0.000000 | 0.000000 | 0.194954 | 0.000000 | -0.857147 | -0.955892 | -0.319111 | -0.544818 | -1.340308 | -0.891480 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
600001 rows × 18 columns
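Before looking at the model, it helps to tally how the 18 columns map to model inputs. The sketch below is plain Python bookkeeping, not part of the baseline: the lengths 16 and 11 are read off the sample rows above, and the sequence length 32 is taken from the reshape in the model code that follows.

```python
columns = ['user_id', 'end_date', 'label', 'launch_seq', 'playtime_seq',
           'duration_prefer', 'father_id_score', 'cast_id_score', 'tag_score',
           'device_type', 'device_ram', 'device_rom', 'sex', 'age', 'education',
           'occupation_status', 'territory_score', 'interact_prefer']

# Lengths of the list-valued columns (see the lead-in for where each number comes from)
seq_lengths = {'launch_seq': 32, 'playtime_seq': 32,
               'duration_prefer': 16, 'interact_prefer': 11}
# Identifier and target columns are not fed to the model
non_features = {'user_id', 'end_date', 'label'}

scalar_cols = [c for c in columns if c not in non_features and c not in seq_lengths]
input_width = len(scalar_cols) + sum(seq_lengths.values())
print(len(scalar_cols), input_width)  # prints "11 102"
```

The total of 102 matches the input size of the first fully connected layer, because the model keeps one value per time step from each GRU.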
    import paddle
    from paddle.io import DataLoader, Dataset

    # Ids, the label, and list-valued columns are handled separately from scalar features
    NON_SCALAR_COLS = ['user_id', 'end_date', 'label', 'launch_seq',
                       'playtime_seq', 'duration_prefer', 'interact_prefer']

    # Dataset for the model
    class CoggleDataset(Dataset):
        def __init__(self, df):
            super(CoggleDataset, self).__init__()
            self.df = df
            # Keep the DataFrame column order; a set difference has no stable order
            self.feat_col = [c for c in self.df.columns if c not in NON_SCALAR_COLS]
            self.df_feat = self.df[self.feat_col]

        # Return the fields used in training for one sample
        def __getitem__(self, index):
            launch_seq = self.df['launch_seq'].iloc[index]
            playtime_seq = self.df['playtime_seq'].iloc[index]
            duration_prefer = self.df['duration_prefer'].iloc[index]
            interact_prefer = self.df['interact_prefer'].iloc[index]
            feat = self.df_feat.iloc[index].values.astype(np.float32)

            launch_seq = paddle.to_tensor(launch_seq).astype(paddle.float32)
            playtime_seq = paddle.to_tensor(playtime_seq).astype(paddle.float32)
            duration_prefer = paddle.to_tensor(duration_prefer).astype(paddle.float32)
            interact_prefer = paddle.to_tensor(interact_prefer).astype(paddle.float32)
            feat = paddle.to_tensor(feat).astype(paddle.float32)
            label = paddle.to_tensor(self.df['label'].iloc[index]).astype(paddle.float32)
            return launch_seq, playtime_seq, duration_prefer, interact_prefer, feat, label

        def __len__(self):
            return len(self.df)

    # Model: GRU + FC
    class CoggleModel(paddle.nn.Layer):
        def __init__(self):
            super(CoggleModel, self).__init__()
            # Sequence modeling
            self.launch_seq_gru = paddle.nn.GRU(1, 32)
            self.playtime_seq_gru = paddle.nn.GRU(1, 32)
            # Fully connected layers; 102 = 32 + 32 + 16 + 11 + 11 concatenated inputs
            self.fc1 = paddle.nn.Linear(102, 64)
            self.fc2 = paddle.nn.Linear(64, 1)

        def forward(self, launch_seq, playtime_seq, duration_prefer, interact_prefer, feat):
            launch_seq = launch_seq.reshape((-1, 32, 1))
            playtime_seq = playtime_seq.reshape((-1, 32, 1))
            # Take hidden unit 0 at each of the 32 time steps, giving shape (batch, 32)
            launch_seq_feat = self.launch_seq_gru(launch_seq)[0][:, :, 0]
            playtime_seq_feat = self.playtime_seq_gru(playtime_seq)[0][:, :, 0]
            all_feat = paddle.concat([launch_seq_feat, playtime_seq_feat, duration_prefer, interact_prefer, feat], 1)
            all_feat_fc1 = self.fc1(all_feat)
            all_feat_fc2 = self.fc2(all_feat_fc1)
            return all_feat_fc2

Model training
    from tqdm import tqdm
    import warnings
    warnings.filterwarnings("ignore")

    # Train for one epoch
    def train(model, train_loader, optimizer, criterion):
        model.train()
        train_loss = []
        for launch_seq, playtime_seq, duration_prefer, interact_prefer, feat, label in tqdm(train_loader):
            pred = model(launch_seq, playtime_seq, duration_prefer, interact_prefer, feat)
            # Match the label shape to pred so the loss does not broadcast
            loss = criterion(pred, label.reshape(pred.shape))
            loss.backward()
            optimizer.step()
            optimizer.clear_grad()
            train_loss.append(loss.item())
        return np.mean(train_loss)

    # Evaluate on the validation fold; no gradient steps, so the model never trains on it
    def validate(model, val_loader, criterion):
        model.eval()
        val_loss = []
        with paddle.no_grad():
            for launch_seq, playtime_seq, duration_prefer, interact_prefer, feat, label in tqdm(val_loader):
                pred = model(launch_seq, playtime_seq, duration_prefer, interact_prefer, feat)
                loss = criterion(pred, label.reshape(pred.shape))
                val_loss.append(loss.item())
        return np.mean(val_loss)

    # Predict on a loader and return the list of per-batch outputs
    def predict(model, test_loader):
        model.eval()
        test_pred = []
        with paddle.no_grad():
            for launch_seq, playtime_seq, duration_prefer, interact_prefer, feat, label in tqdm(test_loader):
                pred = model(launch_seq, playtime_seq, duration_prefer, interact_prefer, feat)
                test_pred.append(pred.numpy())
        return test_pred

    from sklearn.model_selection import StratifiedKFold

    # Multi-fold training, stratified on the label
    skf = StratifiedKFold(n_splits=7)
    fold = 0
    for tr_idx, val_idx in skf.split(data, data['label']):
        train_dataset = CoggleDataset(data.iloc[tr_idx])
        val_dataset = CoggleDataset(data.iloc[val_idx])

        # Model, loss function, and optimizer
        model = CoggleModel()
        optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=0.001)
        criterion = paddle.nn.MSELoss()

        train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
        val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)

        # Train for 3 epochs, overwriting the fold checkpoint after each one
        for epoch in range(3):
            train_loss = train(model, train_loader, optimizer, criterion)
            val_loss = validate(model, val_loader, criterion)
            print(fold, epoch, train_loss, val_loss)
            paddle.save(model.state_dict(), f"model_{fold}.pdparams")
        fold += 1

Output:

    W1128 20:18:14.128268 128 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W1128 20:18:14.132313 128 device_context.cc:465] device: 0, cuDNN Version: 7.6.
      1%|          | 131/16072 [00:05<09:05, 29.24it/s]

Model prediction
    test_data = pd.read_csv(data_dir + "test_data.txt", sep="\t")
    test_data["launch_seq"] = test_data.launch_seq.apply(lambda x: json.loads(x))
    test_data["playtime_seq"] = test_data.playtime_seq.apply(lambda x: json.loads(x))
    test_data["duration_prefer"] = test_data.duration_prefer.apply(lambda x: json.loads(x))
    test_data["interact_prefer"] = test_data.interact_prefer.apply(lambda x: json.loads(x))
    test_data['label'] = 0

    test_dataset = CoggleDataset(test_data)
    test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4)
    test_pred_fold = np.zeros(test_data.shape[0])

    # Predict with each fold's model and average the results
    for idx in range(7):
        model = CoggleModel()
        layer_state_dict = paddle.load(f"model_{idx}.pdparams")
        model.set_state_dict(layer_state_dict)
        model.eval()
        test_pred = predict(model, test_loader)
        test_pred = np.vstack(test_pred)
        test_pred_fold += test_pred[:, 0]
    test_pred_fold /= 7

Output:

    100%|██████████| 235/235 [00:02<00:00, 98.58it/s]
    100%|██████████| 235/235 [00:02<00:00, 79.41it/s]
    100%|██████████| 235/235 [00:02<00:00, 78.44it/s]
    100%|██████████| 235/235 [00:02<00:00, 78.63it/s]
    100%|██████████| 235/235 [00:03<00:00, 77.96it/s]
    100%|██████████| 235/235 [00:02<00:00, 78.47it/s]
    100%|██████████| 235/235 [00:03<00:00, 77.44it/s]

The submission uses the fold-averaged prediction:

    # Use the average over the 7 folds, not a single fold's output
    test_data["prediction"] = test_pred_fold
    test_data = test_data[["user_id", "prediction"]]
    # can clip outputs to [0, 7] or use other tricks
    test_data.to_csv("./baseline_submission.csv", index=False, header=False, float_format="%.2f")

Summary
Improvement ideas
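One low-risk idea, hinted at by the comment in the prediction code ("can clip outputs to [0, 7]"), is to post-process the predictions: average the folds, clip to the valid score range, and round to 2 decimal places as the submission format requires. The sketch below uses plain Python; the helper name `postprocess`, the user ids, and the prediction values are hypothetical.

```python
def postprocess(fold_preds, low=0.0, high=7.0):
    """Average per-fold predictions, clip to [low, high], and round to 2 decimals."""
    n_folds = len(fold_preds)
    averaged = [sum(vals) / n_folds for vals in zip(*fold_preds)]
    return [round(min(max(p, low), high), 2) for p in averaged]

# Hypothetical raw outputs of two fold models for three users
fold_preds = [
    [-0.30, 3.10, 7.40],
    [0.10, 2.90, 7.20],
]
final = postprocess(fold_preds)

# Submission rows: user id and prediction, no header, 2 decimal places
user_ids = [1, 2, 3]  # hypothetical ids
lines = [f"{user_id},{p:.2f}" for user_id, p in zip(user_ids, final)]
print("\n".join(lines))
```

Since the true score always lies in [0, 7], clipping can only move a prediction closer to the target, so it never increases the absolute error.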