20 Practical Pandas Data Examples, Packed with Useful Tips

Published: 2024/9/15


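Today we'll look at data filtering in pandas. I wrote a similar article before, but that one was about filtering text data; feel free to look it up if you're interested.

Below I'll go through roughly 20 examples that show the filtering methods in detail. First, let's build the dataset we'll be using: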

import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "profession": ["Electrical engineer", "Mechanical engineer", "Data scientist", "Accountant", "Athlete"],
    "date_of_birth": ["1998-11-01", "2002-08-14", "1996-01-12", "2002-10-24", "2004-04-05"],
    "group": ["A", "B", "B", "A", "C"],
})

output

    name  note           profession date_of_birth group
0   John    92  Electrical engineer    1998-11-01     A
1   Jane    94  Mechanical engineer    2002-08-14     B
2  Emily    87       Data scientist    1996-01-12     B
3   Lisa    82           Accountant    2002-10-24     A
4   Matt    90              Athlete    2004-04-05     C

Selecting several columns

Pass a list of column names:

df[["name", "note"]]

output

    name  note
0   John    92
1   Jane    94
2  Emily    87
3   Lisa    82
4   Matt    90
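One detail worth checking for yourself: a list of column names gives back a DataFrame, while a bare column name gives back a Series. The snippet below is a minimal sketch on a cut-down version of the dataset; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily"],
    "note": [92, 94, 87],
    "group": ["A", "B", "B"],
})

two_cols = df[["name", "note"]]    # list of labels -> DataFrame
one_col_frame = df[["name"]]       # one-element list -> still a DataFrame
one_col_series = df["name"]        # bare label -> Series

print(type(two_cols).__name__, two_cols.shape)              # DataFrame (3, 2)
print(type(one_col_frame).__name__, one_col_frame.shape)    # DataFrame (3, 1)
print(type(one_col_series).__name__, one_col_series.shape)  # Series (3,)
```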

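Then selecting several rows

Building on the column selection above, we can also pick out some of the rows. Note that loc slices by label and includes the end label, so :3 returns the rows labelled 0 through 3:

df.loc[:3, ["name", "note"]]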

output

    name  note
0   John    92
1   Jane    94
2  Emily    87
3   Lisa    82

Filtering by index position

Here we use the iloc method, which selects by integer position (the end of the slice is excluded):

df.iloc[:3, 2]

output

0    Electrical engineer
1    Mechanical engineer
2         Data scientist
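The same slice :3 behaves differently in loc and iloc, which is easy to trip over. Here is a small sketch, using only three columns of the dataset, that puts the two side by side:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "profession": ["Electrical engineer", "Mechanical engineer",
                   "Data scientist", "Accountant", "Athlete"],
})

# loc works on labels, and the end label is included: labels 0, 1, 2, 3.
by_label = df.loc[:3, ["name", "note"]]

# iloc works on positions, and the end position is excluded: positions 0, 1, 2.
# Column position 2 is "profession" in this frame.
by_position = df.iloc[:3, 2]

print(len(by_label))       # 4
print(len(by_position))    # 3
print(by_position.name)    # profession
```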

Filtering with comparison operators

df[df.note > 90]

output

   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A
1  Jane    94  Mechanical engineer    2002-08-14     B
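If it helps to see what is going on under the hood, the comparison itself produces a boolean Series, and indexing with it keeps the rows marked True. A minimal sketch follows; it uses bracket access (df["note"]) rather than attribute access, which is my own preference here and not something the example above requires.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# The comparison yields a boolean Series aligned with df's index.
mask = df["note"] > 90
print(mask.tolist())                           # [True, True, False, False, False]

# Indexing with the mask keeps only the rows where it is True.
print(df[mask]["name"].tolist())               # ['John', 'Jane']

# >= includes the boundary value, so Matt (90) is kept as well.
print(df[df["note"] >= 90]["name"].tolist())   # ['John', 'Jane', 'Matt']
```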

The dt accessor

The dt accessor is for working with datetime data. Before using it, we first need to convert the strings (or whatever other type the data is stored as) into a datetime type:

df.date_of_birth = df.date_of_birth.astype("datetime64[ns]")
df[df.date_of_birth.dt.month == 11]

output

   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A

Or we can filter on the year instead:

df[df.date_of_birth.dt.year > 2000]

output

   name  note           profession date_of_birth group
1  Jane    94  Mechanical engineer    2002-08-14     B
3  Lisa    82           Accountant    2002-10-24     A
4  Matt    90              Athlete    2004-04-05     C
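As an alternative to astype, the conversion can also be done with pd.to_datetime, which lets you state the expected format explicitly. This is a sketch of that variant, not the approach used above; the format string assumes the dates are always written as year-month-day.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "date_of_birth": ["1998-11-01", "2002-08-14", "1996-01-12",
                      "2002-10-24", "2004-04-05"],
})

# Parse the strings into datetimes; the format is an assumption about the data.
df["date_of_birth"] = pd.to_datetime(df["date_of_birth"], format="%Y-%m-%d")

born_in_november = df[df["date_of_birth"].dt.month == 11]
born_after_2000 = df[df["date_of_birth"].dt.year > 2000]

print(born_in_november["name"].tolist())   # ['John']
print(born_after_2000["name"].tolist())    # ['Jane', 'Lisa', 'Matt']
```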

Filtering on several conditions (intersection)

When several conditions must all hold at once, wrap each one in parentheses and combine them with &:

df[(df.date_of_birth.dt.year > 2000) & (df.profession.str.contains("engineer"))]

output

   name  note           profession date_of_birth group
1  Jane    94  Mechanical engineer    2002-08-14     B
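Naming each condition first can make a combined filter easier to read. The sketch below also passes na=False to str.contains so that a missing profession counts as "no match"; the missing value for Lisa is a hypothetical change to the dataset, added only to show that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "profession": ["Electrical engineer", "Mechanical engineer",
                   "Data scientist", np.nan, "Athlete"],  # hypothetical missing value
    "date_of_birth": pd.to_datetime(
        ["1998-11-01", "2002-08-14", "1996-01-12", "2002-10-24", "2004-04-05"]
    ),
})

born_after_2000 = df["date_of_birth"].dt.year > 2000
# na=False treats a missing profession as not containing "engineer".
is_engineer = df["profession"].str.contains("engineer", na=False)

both = df[born_after_2000 & is_engineer]
print(both["name"].tolist())   # ['Jane']
```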

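Filtering on several conditions (union)

When it is enough for any one of the conditions to hold, combine them with | instead:

df[(df.note > 90) | (df.profession == "Data scientist")]

output

    name  note           profession date_of_birth group
0   John    92  Electrical engineer    1998-11-01     A
1   Jane    94  Mechanical engineer    2002-08-14     B
2  Emily    87       Data scientist    1996-01-12     B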
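The union can be built from named masks in the same way, and ~ flips a mask to give every row that matches neither condition. A small sketch, with variable names that are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "profession": ["Electrical engineer", "Mechanical engineer",
                   "Data scientist", "Accountant", "Athlete"],
})

high_note = df["note"] > 90
is_data_scientist = df["profession"] == "Data scientist"

either = df[high_note | is_data_scientist]        # rows matching at least one condition
neither = df[~(high_note | is_data_scientist)]    # ~ negates the combined mask

print(either["name"].tolist())    # ['John', 'Jane', 'Emily']
print(neither["name"].tolist())   # ['Lisa', 'Matt']
```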

Filtering with the query method

pandas' query method can also filter data; we pass the condition in as a string:

df.query("note > 90")

output

   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A
1  Jane    94  Mechanical engineer    2002-08-14     B

Or with more than one condition:

df.query("group == 'A' and note > 89")

output

   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A
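When the thresholds live in Python variables, query can refer to them with an @ prefix instead of having them pasted into the string. A minimal sketch; min_note and target_group are names I made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "group": ["A", "B", "B", "A", "C"],
})

min_note = 89          # hypothetical threshold
target_group = "A"     # hypothetical group of interest

# @name looks up a Python variable from the surrounding scope.
via_query = df.query("group == @target_group and note > @min_note")

# The same filter written as a boolean mask, for comparison.
via_mask = df[(df["group"] == target_group) & (df["note"] > min_note)]

print(via_query["name"].tolist())   # ['John']
print(via_mask["name"].tolist())    # ['John']
```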

Filtering with nsmallest

pandas' nsmallest and nlargest methods return the rows with the n smallest or n largest values in a given column:

df.nsmallest(2, "note")

output

    name  note      profession date_of_birth group
3   Lisa    82      Accountant    2002-10-24     A
2  Emily    87  Data scientist    1996-01-12     B

df.nlargest(2, "note")

output

   name  note           profession date_of_birth group
1  Jane    94  Mechanical engineer    2002-08-14     B
0  John    92  Electrical engineer    1998-11-01     A
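For this dataset, where no two notes are equal, the same rows can be obtained by sorting and taking the head. The sketch below compares the two routes; with tied values the results may differ, so treat the equivalence as specific to this data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

smallest = df.nsmallest(2, "note")
smallest_via_sort = df.sort_values("note").head(2)

largest = df.nlargest(2, "note")
largest_via_sort = df.sort_values("note", ascending=False).head(2)

print(smallest["name"].tolist())            # ['Lisa', 'Emily']
print(smallest_via_sort["name"].tolist())   # ['Lisa', 'Emily']
print(largest["name"].tolist())             # ['Jane', 'John']
print(largest_via_sort["name"].tolist())    # ['Jane', 'John']
```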

The isna() method

isna() picks out the rows whose values are missing. First we set one value in the table to null (this needs NumPy):

import numpy as np

df.loc[0, "profession"] = np.nan
df[df.profession.isna()]

output

   name  note profession date_of_birth group
0  John    92        NaN    1998-11-01     A

The notna() method

notna() does the exact opposite of isna() above: it picks out the rows whose values are not missing:

df[df.profession.notna()]

output

    name  note           profession date_of_birth group
1   Jane    94  Mechanical engineer    2002-08-14     B
2  Emily    87       Data scientist    1996-01-12     B
3   Lisa    82           Accountant    2002-10-24     A
4   Matt    90              Athlete    2004-04-05     C
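The two masks are complements of each other, so every row lands in exactly one of them. The sketch below checks that on a two-column version of the dataset and shows dropna(subset=...) as another way to get the non-missing rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "profession": ["Electrical engineer", "Mechanical engineer",
                   "Data scientist", "Accountant", "Athlete"],
})
df.loc[0, "profession"] = np.nan   # same change as in the isna() example

missing = df["profession"].isna()
present = df["profession"].notna()

print(int(missing.sum()))   # 1
print(int(present.sum()))   # 4

# dropna restricted to one column keeps the same rows as the notna() mask.
kept = df.dropna(subset=["profession"])
print(kept["name"].tolist())          # ['Jane', 'Emily', 'Lisa', 'Matt']
print(df[present]["name"].tolist())   # ['Jane', 'Emily', 'Lisa', 'Matt']
```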

The assign method

pandas' assign method adds a new column to the dataset. It returns a new DataFrame and leaves the original one unchanged:

df_1 = df.assign(score=np.random.randint(0, 100, size=5))
df_1

output

    name  note           profession date_of_birth group  score
0   John    92  Electrical engineer    1998-11-01     A     19
1   Jane    94  Mechanical engineer    2002-08-14     B     84
2  Emily    87       Data scientist    1996-01-12     B     68
3   Lisa    82           Accountant    2002-10-24     A     70
4   Matt    90              Athlete    2004-04-05     C     39
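Since the scores are random, they will differ on every run. The sketch below uses a seeded NumPy generator to make them repeatable (the seed value is arbitrary), confirms that the original DataFrame is left untouched, and shows that assign also accepts a function of the DataFrame; the passed column and its cutoff of 90 are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

rng = np.random.default_rng(seed=0)   # arbitrary seed, only for repeatability
df_1 = df.assign(score=rng.integers(0, 100, size=5))

print(list(df.columns))     # ['name', 'note']  (original is unchanged)
print(list(df_1.columns))   # ['name', 'note', 'score']

# A callable receives the DataFrame and returns the new column's values.
df_2 = df.assign(passed=lambda d: d["note"] >= 90)   # hypothetical cutoff
print(df_2["passed"].tolist())   # [True, True, False, False, True]
```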

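The explode method

explode() means just what it says. We often run into datasets like this one (here df refers to this new table, not the dataset used so far):

        Name                           Hobby
0      Lü Bu  [basketball, gaming, milk tea]
1  Diao Chan                [coding, movies]
2   Zhao Yun                [music, fitness]

Each row of the Hobby column holds its values bundled together in a list; explode() splits them out so that every value gets a row of its own:

df.explode('Hobby')

output

        Name       Hobby
0      Lü Bu  basketball
0      Lü Bu      gaming
0      Lü Bu    milk tea
1  Diao Chan      coding
1  Diao Chan      movies
2   Zhao Yun       music
2   Zhao Yun     fitness

After exploding, the index labels repeat and the data may contain duplicate rows, so we drop the duplicates and reset the index:

df.explode('Hobby').drop_duplicates().reset_index(drop=True)

output

        Name       Hobby
0      Lü Bu  basketball
1      Lü Bu      gaming
2      Lü Bu    milk tea
3  Diao Chan      coding
4  Diao Chan      movies
5   Zhao Yun       music
6   Zhao Yun     fitness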
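The article never shows how the hobby table is built, so here is one way it might be constructed, as a self-contained sketch. The names and hobbies are illustrative stand-ins for the table shown earlier, and the variable name hobbies is my own.

```python
import pandas as pd

# Illustrative stand-in for the hobby table; each Hobby cell holds a list.
hobbies = pd.DataFrame({
    "Name": ["Lü Bu", "Diao Chan", "Zhao Yun"],
    "Hobby": [
        ["basketball", "gaming", "milk tea"],
        ["coding", "movies"],
        ["music", "fitness"],
    ],
})

exploded = hobbies.explode("Hobby")
print(exploded.index.tolist())   # [0, 0, 0, 1, 1, 2, 2]  (labels repeat)

tidy = exploded.drop_duplicates().reset_index(drop=True)
print(tidy.index.tolist())       # [0, 1, 2, 3, 4, 5, 6]
print(tidy["Hobby"].tolist())
# ['basketball', 'gaming', 'milk tea', 'coding', 'movies', 'music', 'fitness']
```

If you only need the fresh index and have no duplicates to drop, explode also takes ignore_index=True, which renumbers the rows in the same call.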
