
Using train_test_split with stratify in the Understanding Clouds from Satellite Images semantic segmentation competition

Published: 2023/12/20 · Programming Q&A · 34 豆豆

This article, collected and compiled by 生活随笔, explains how train_test_split is used together with stratify in the Understanding Clouds from Satellite Images semantic segmentation competition.

The usage comes from this kernel:
https://www.kaggle.com/mobassir/keras-efficientnetb2-for-classifying-cloud

Each image in the dataset contains between one and four cloud types.
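With four cloud types (Fish, Flower, Sugar and Gravel, the label columns used by the analysis code later in this post) and at least one present per image, there are at most 2^4 - 1 = 15 possible combinations. The small sketch below, which is an illustration and not code from the original kernel, enumerates them as 4-character flag strings in that column order.

```python
from itertools import product

# Column order assumed to match the analysis code later in this post.
CLOUD_TYPES = ["Fish", "Flower", "Sugar", "Gravel"]

def nonempty_combinations():
    """Return every non-empty presence pattern as a string such as '1010'."""
    combos = []
    for flags in product("01", repeat=len(CLOUD_TYPES)):
        key = "".join(flags)
        if "1" in key:  # skip '0000': every image contains at least one cloud type
            combos.append(key)
    return combos

combos = nonempty_combinations()
print(len(combos))  # 15
```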

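The competition requires predictions to be submitted as a submission.csv file, with each mask written in RLE (run-length encoding) format.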
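For reference, here is a minimal run-length encoding sketch. It assumes the common Kaggle segmentation convention (pixels numbered top to bottom, then left to right, 1-indexed, written as space-separated start/length pairs); check the competition's evaluation page before relying on it. The function name mask_to_rle is hypothetical, and it takes a plain list-of-rows mask so it needs no third-party packages.

```python
def mask_to_rle(mask):
    """Encode a 2D binary mask (list of rows) as an RLE string.

    Assumption: column-major pixel order (top to bottom, then left to right)
    and 1-indexed start positions, as in many Kaggle segmentation contests.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Flatten column by column so pixel 2 is directly below pixel 1.
    pixels = [mask[r][c] for c in range(width) for r in range(height)]

    runs = []
    start = None
    for i, p in enumerate(pixels, start=1):
        if p and start is None:  # a run of 1s begins
            start = i
        elif not p and start is not None:  # the current run ends
            runs.extend([start, i - start])
            start = None
    if start is not None:  # close a run that reaches the last pixel
        runs.extend([start, len(pixels) + 1 - start])
    return " ".join(map(str, runs))

print(mask_to_rle([[0, 1], [1, 1], [0, 0]]))  # 2 1 4 2
```

An empty mask encodes to an empty string, which is how an image with no pixels of a given class would typically be represented.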

The dataset is split with the following code:

from sklearn.model_selection import train_test_split

train_imgs, val_imgs = train_test_split(
    train_df['Image'].values,
    test_size=0.2,
    stratify=train_df['Class'].map(lambda x: str(sorted(list(x)))),  # sorting present classes in lexicographical order, just to be sure
    random_state=2019)
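The stratify argument needs one hashable label per image. Assuming, as in the kernel, that train_df['Class'] holds the collection of cloud types present in each image, the lambda turns that collection into a sorted string, so the same combination always yields the same label regardless of element order. A stdlib-only sketch of that key (the helper name stratify_key is hypothetical):

```python
def stratify_key(classes):
    """Mirror the kernel's lambda: str(sorted(list(x)))."""
    # Sorting makes {'Sugar', 'Fish'} and ['Fish', 'Sugar'] map to one label;
    # the iteration order of a set is not something to rely on.
    return str(sorted(list(classes)))

print(stratify_key({"Sugar", "Fish"}))  # ['Fish', 'Sugar']
print(stratify_key(["Fish", "Sugar"]) == stratify_key(("Sugar", "Fish")))  # True
```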

So what does adding stratify to train_test_split actually do?

Let's explore:

import pandas as pd

train_now = pd.DataFrame({'Image': train_imgs})  # convert the ndarray to a DataFrame
result = pd.merge(left=train_df, right=train_now, how='inner', left_on='Image', right_on='Image')  # inner join: keep only train_df rows whose Image is in the training split
result["analysis"] = (result["Fish"].apply(str) + result["Flower"].apply(str)
                      + result["Sugar"].apply(str) + result["Gravel"].apply(str))  # concatenate the last four columns into one combination string
print(result['analysis'].value_counts(normalize=False, dropna=False))  # count how many images fall into each combination

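Here a "combination" means the set of cloud types present in one satellite image. For example, [1,0,1,0] means the image contains two cloud types (Fish and Sugar, in the column order used above).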
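The same per-combination count can be sketched without pandas. The rows below are made-up toy records, not competition data; each holds the four 0/1 flags in the Fish, Flower, Sugar, Gravel order used above, and collections.Counter plays the role of value_counts. The helper name combination_key is hypothetical.

```python
from collections import Counter

# Hypothetical toy rows (not real competition data): one dict per image.
rows = [
    {"Image": "a.jpg", "Fish": 1, "Flower": 0, "Sugar": 1, "Gravel": 0},
    {"Image": "b.jpg", "Fish": 1, "Flower": 0, "Sugar": 1, "Gravel": 0},
    {"Image": "c.jpg", "Fish": 0, "Flower": 0, "Sugar": 1, "Gravel": 1},
]

def combination_key(row):
    """Concatenate the four flags, like result['analysis'] in the pandas code."""
    return "".join(str(row[name]) for name in ("Fish", "Flower", "Sugar", "Gravel"))

counts = Counter(combination_key(r) for r in rows)
print(counts.most_common())  # [('1010', 2), ('0011', 1)]
```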

The last line prints:

1011    581
0011    581
0110    369
1010    369
0010    346
0100    284
0111    279
1110    262
1100    235
0001    230
1001    219
1000    219
1111    213
1101    126
0101    123

For comparison, here are the combination counts for the full train_df:

1011    726
0011    726
1010    462
0110    462
0010    432
0100    355
0111    349
1110    328
1100    294
0001    287
1001    274
1000    274
1111    266
1101    157
0101    154
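Dividing each training-split count by the corresponding full count makes the pattern explicit. The numbers below are copied from the two listings above; the script only checks that every ratio sits close to 0.8.

```python
# Counts copied from the two listings above (combination -> number of images).
train_counts = {
    "1011": 581, "0011": 581, "0110": 369, "1010": 369, "0010": 346,
    "0100": 284, "0111": 279, "1110": 262, "1100": 235, "0001": 230,
    "1001": 219, "1000": 219, "1111": 213, "1101": 126, "0101": 123,
}
full_counts = {
    "1011": 726, "0011": 726, "1010": 462, "0110": 462, "0010": 432,
    "0100": 355, "0111": 349, "1110": 328, "1100": 294, "0001": 287,
    "1001": 274, "1000": 274, "1111": 266, "1101": 157, "0101": 154,
}

# Fraction of each combination that ended up in the training split.
ratios = {k: train_counts[k] / full_counts[k] for k in full_counts}
for key, ratio in sorted(ratios.items()):
    print(f"{key}: {train_counts[key]}/{full_counts[key]} = {ratio:.3f}")

total_train = sum(train_counts.values())
total_full = sum(full_counts.values())
print(total_train, total_full)  # 4436 5546
```

Every ratio falls between about 0.799 and 0.803. The totals also agree with test_size=0.2: scikit-learn rounds a fractional test size up, so ceil(0.2 × 5546) = 1110 images go to validation and 5546 - 1110 = 4436 remain for training.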

This shows what train_test_split does when combined with stratify:

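Each image contains one to four cloud types, so there are 2^4 - 1 = 15 possible non-empty combinations. When splitting, the images are grouped into these 15 classes according to their cloud-type combination, and about 80% of the images in each class go to train, with the rest going to validation.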
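To make the idea concrete, here is a simplified, stdlib-only stratified split: group items by label, shuffle each group, and send about 80% of every group to train. The helper stratified_split is hypothetical and only illustrates the principle; scikit-learn's train_test_split delegates to StratifiedShuffleSplit, which computes the overall split sizes first and then distributes them across classes, so its per-class numbers can differ slightly from this sketch.

```python
import random
from collections import defaultdict

def stratified_split(items, labels, train_frac=0.8, seed=2019):
    """Hypothetical helper: split items so each label keeps ~train_frac in train."""
    groups = defaultdict(list)
    for item, label in zip(items, labels):
        groups[label].append(item)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)  # random order within this combination only
        n_train = round(len(members) * train_frac)  # per-group 80/20 cut
        train.extend(members[:n_train])
        val.extend(members[n_train:])
    return train, val

# Toy data: 60 images spread over three combinations (10, 20 and 30 images).
labels = ["1010"] * 10 + ["0011"] * 20 + ["1111"] * 30
items = [f"img_{i}.jpg" for i in range(len(labels))]
train, val = stratified_split(items, labels)

label_of = dict(zip(items, labels))
for key in ("1010", "0011", "1111"):
    n_tr = sum(label_of[x] == key for x in train)
    n_va = sum(label_of[x] == key for x in val)
    print(key, n_tr, n_va)  # prints: 1010 8 2, then 0011 16 4, then 1111 24 6
```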

