03_pandas布尔索引、isin()筛选、设置值at和iat,loc,reindex、dropna、fillna,isna、求平均值mean、Apply函数、value_counts
布爾索引
案例1
import numpy as np import pandas as pd# 通過設(shè)置開始時(shí)間,并設(shè)置間隔了多少月 dates = pd.date_range('20130101',periods=6)# 隨機(jī)生成一個(gè)6行4列的值 # print(np.random.randn(6,4))# 設(shè)置dates為行,ABCD為列的標(biāo)題值,np.random.randn(6, 4)為行和列中的值 df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD')) print(df) print("------------------df['A']--------------------") print(df['A']) print("-------------使用單列值選擇數(shù)據(jù)---------------") print(df[df['A'] > 0])輸出結(jié)果為:
A B C D 2013-01-01 -0.108862 -0.669454 0.755285 -0.257934 2013-01-02 -1.704769 1.207508 -0.379307 0.360502 2013-01-03 0.790387 0.644891 0.673837 -1.351891 2013-01-04 0.058280 0.759115 0.224668 -0.977974 2013-01-05 -1.616106 2.253661 0.426269 -0.266609 2013-01-06 -1.673112 0.425053 -0.395498 -1.099882 ------------------df['A']-------------------- 2013-01-01 -0.108862 2013-01-02 -1.704769 2013-01-03 0.790387 2013-01-04 0.058280 2013-01-05 -1.616106 2013-01-06 -1.673112 Freq: D, Name: A, dtype: float64 -------------使用單列值選擇數(shù)據(jù)---------------A B C D 2013-01-03 0.790387 0.644891 0.673837 -1.351891 2013-01-04 0.058280 0.759115 0.224668 -0.977974案例2
從一個(gè)DataFrame中選擇符合布爾條件的的值
輸出結(jié)果為:
A B C D 2013-01-01 -0.295841 -1.592536 -0.089252 -1.723649 2013-01-02 0.948812 -1.883173 -0.060775 0.448133 2013-01-03 0.291922 -0.182995 -0.831074 1.303921 2013-01-04 0.342257 0.667672 0.574162 -1.953030 2013-01-05 -0.742187 0.098960 -0.646819 0.960099 2013-01-06 -0.204221 -0.223569 -0.220845 -1.146068 -----------Using a single column’s values to select data:-----------A B C D 2013-01-01 NaN NaN NaN NaN 2013-01-02 0.948812 NaN NaN 0.448133 2013-01-03 0.291922 NaN NaN 1.303921 2013-01-04 0.342257 0.667672 0.574162 NaN 2013-01-05 NaN 0.098960 NaN 0.960099 2013-01-06 NaN NaN NaN NaN使用isin()方法過濾
import numpy as np import pandas as pd# 通過設(shè)置開始時(shí)間,并設(shè)置間隔了多少月 dates = pd.date_range('20130101',periods=6)# 隨機(jī)生成一個(gè)6行4列的值 # print(np.random.randn(6,4))# 設(shè)置dates為行,ABCD為列的標(biāo)題值,np.random.randn(6, 4)為行和列中的值 df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD')) print(df) df2 = df.copy() print("---------df.copy()的結(jié)果為:--------------") print(df2)df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three'] print("---------增加一列之后的值:----------------") print(df2)print("---------使用isin()方法之后的值:----------") print(df2[df2['E'].isin(['two','four'])])輸出額結(jié)果為:
A B C D 2013-01-01 1.248590 -0.197050 1.147376 0.812086 2013-01-02 -0.283422 0.400051 1.307791 0.493767 2013-01-03 0.134394 0.250902 -0.004353 -1.563162 2013-01-04 -0.259653 -1.065737 -0.468399 1.258488 2013-01-05 1.124381 -1.116601 -0.302692 0.631385 2013-01-06 0.361942 0.227282 1.656400 -0.526585 ---------df.copy()的結(jié)果為:--------------A B C D 2013-01-01 1.248590 -0.197050 1.147376 0.812086 2013-01-02 -0.283422 0.400051 1.307791 0.493767 2013-01-03 0.134394 0.250902 -0.004353 -1.563162 2013-01-04 -0.259653 -1.065737 -0.468399 1.258488 2013-01-05 1.124381 -1.116601 -0.302692 0.631385 2013-01-06 0.361942 0.227282 1.656400 -0.526585 ---------增加一列之后的值:----------------A B C D E 2013-01-01 1.248590 -0.197050 1.147376 0.812086 one 2013-01-02 -0.283422 0.400051 1.307791 0.493767 one 2013-01-03 0.134394 0.250902 -0.004353 -1.563162 two 2013-01-04 -0.259653 -1.065737 -0.468399 1.258488 three 2013-01-05 1.124381 -1.116601 -0.302692 0.631385 four 2013-01-06 0.361942 0.227282 1.656400 -0.526585 three ---------使用isin()方法之后的值:----------A B C D E 2013-01-03 0.134394 0.250902 -0.004353 -1.563162 two 2013-01-05 1.124381 -1.116601 -0.302692 0.631385 four設(shè)置
通過時(shí)間的indexes來自動(dòng)創(chuàng)建新列
import numpy as np import pandas as pd# 通過設(shè)置開始時(shí)間,并設(shè)置間隔了多少月 dates = pd.date_range('20130101',periods=6)# 隨機(jī)生成一個(gè)6行4列的值 # print(np.random.randn(6,4))# 設(shè)置dates為行,ABCD為列的標(biāo)題值,np.random.randn(6, 4)為行和列中的值 df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD')) print(df)print("--------------通過pd.Series來進(jìn)行創(chuàng)建:--------------") s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20130101',periods=6)) print(s1)print("--------------設(shè)置一列新值:-------------------------") df['F'] = s1 print(df)print("--------------- df.at[dates[0], 'A'] = 0后----------------------") df.at[dates[0],'A'] = 0 print(df)print("-----------------df.iat[0,1] = 0-------------------") df.iat[0,1] = 0 print(df)輸出結(jié)果為:
A B C D 2013-01-01 -0.418346 -0.184602 -0.570495 -0.590981 2013-01-02 -0.582694 1.605383 -1.430993 -1.058385 2013-01-03 -0.324931 -0.720275 1.075161 0.417305 2013-01-04 0.152029 0.108510 0.011692 -1.424354 2013-01-05 -2.814298 0.041146 0.177329 1.358470 2013-01-06 -0.652250 0.068495 1.016900 0.864277 --------------通過pd.Series來進(jìn)行創(chuàng)建:-------------- 2013-01-01 1 2013-01-02 2 2013-01-03 3 2013-01-04 4 2013-01-05 5 2013-01-06 6 Freq: D, dtype: int64 --------------設(shè)置一列新值:-------------------------A B C D F 2013-01-01 -0.418346 -0.184602 -0.570495 -0.590981 1 2013-01-02 -0.582694 1.605383 -1.430993 -1.058385 2 2013-01-03 -0.324931 -0.720275 1.075161 0.417305 3 2013-01-04 0.152029 0.108510 0.011692 -1.424354 4 2013-01-05 -2.814298 0.041146 0.177329 1.358470 5 2013-01-06 -0.652250 0.068495 1.016900 0.864277 6 --------------- df.at[dates[0], 'A'] = 0后----------------------A B C D F 2013-01-01 0.000000 -0.184602 -0.570495 -0.590981 1 2013-01-02 -0.582694 1.605383 -1.430993 -1.058385 2 2013-01-03 -0.324931 -0.720275 1.075161 0.417305 3 2013-01-04 0.152029 0.108510 0.011692 -1.424354 4 2013-01-05 -2.814298 0.041146 0.177329 1.358470 5 2013-01-06 -0.652250 0.068495 1.016900 0.864277 6 -----------------df.iat[0,1] = 0-------------------A B C D F 2013-01-01 0.000000 0.000000 -0.570495 -0.590981 1 2013-01-02 -0.582694 1.605383 -1.430993 -1.058385 2 2013-01-03 -0.324931 -0.720275 1.075161 0.417305 3 2013-01-04 0.152029 0.108510 0.011692 -1.424354 4 2013-01-05 -2.814298 0.041146 0.177329 1.358470 5 2013-01-06 -0.652250 0.068495 1.016900 0.864277 6案例
import numpy as np import pandas as pd# 通過設(shè)置開始時(shí)間,并設(shè)置間隔了多少月 dates = pd.date_range('20130101',periods=6)# 隨機(jī)生成一個(gè)6行4列的值 # print(np.random.randn(6,4))# 設(shè)置dates為行,ABCD為列的標(biāo)題值,np.random.randn(6, 4)為行和列中的值 df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD')) # print(df)s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20130101',periods=6)) df['F'] = s1 df.at[dates[0],'A'] = 0 df.iat[0,1] = 0print("--------插入一列值-------------------------------------------------------") df.loc[:,'D'] = np.array([5] * len(df)) print(df)print("---------帶有where條件的設(shè)置值(將會(huì)將所有的正值變成它的相反數(shù))--------------") df2 = df.copy() df2[df2 > 0] = -df2 print(df2)輸出結(jié)果為:
---------設(shè)置前的值:---------- --------插入一列值-------------------------------------------------A B C D F 2013-01-01 0.000000 0.000000 -0.539118 5 1 2013-01-02 -0.093590 -0.631473 0.986206 5 2 2013-01-03 -1.409640 -0.043280 -1.178707 5 3 2013-01-04 1.308376 1.979588 0.927735 5 4 2013-01-05 0.368429 1.722930 -0.075515 5 5 2013-01-06 1.176985 0.867951 0.198001 5 6 ---------帶有where條件的設(shè)置值(將會(huì)將所有的正值變成它的相反數(shù))--------------A B C D F 2013-01-01 0.000000 0.000000 -0.539118 -5 -1 2013-01-02 -0.093590 -0.631473 -0.986206 -5 -2 2013-01-03 -1.409640 -0.043280 -1.178707 -5 -3 2013-01-04 -1.308376 -1.979588 -0.927735 -5 -4 2013-01-05 -0.368429 -1.722930 -0.075515 -5 -5 2013-01-06 -1.176985 -0.867951 -0.198001 -5 -6缺失值處理(Missing data)
pandas首先使用np.nan代表缺失值,在默認(rèn)情況下不包括在計(jì)算中。
Reindexing允許你去"改變"/“添加”/"刪除"指定軸上的index.并返回一份數(shù)據(jù)的備份。
reindex
輸出結(jié)果為:
A B C D 2013-01-01 -2.291562 -0.519261 -0.454256 0.645428 2013-01-02 0.279289 -0.282660 1.309209 -0.040633 2013-01-03 1.966930 -2.506430 1.069297 -0.267049 2013-01-04 0.411605 -1.179189 0.241829 -0.955059 2013-01-05 -1.079194 0.294480 0.225865 0.144425 2013-01-06 1.269804 -0.666968 -0.181081 -0.162801 Index(['A', 'B', 'C', 'D'], dtype='object') ---------行數(shù)據(jù)使用[0:4]之間的數(shù)據(jù),列用df.columns,然后加個(gè)E的列:-------------------A B C D E 2013-01-01 -2.291562 -0.519261 -0.454256 0.645428 NaN 2013-01-02 0.279289 -0.282660 1.309209 -0.040633 NaN 2013-01-03 1.966930 -2.506430 1.069297 -0.267049 NaN 2013-01-04 0.411605 -1.179189 0.241829 -0.955059 NaN ---------修改0、1列,將1賦值給0和1列,其它值為原來值:-------------------A B C D E 2013-01-01 -2.291562 -0.519261 -0.454256 0.645428 1.0 2013-01-02 0.279289 -0.282660 1.309209 -0.040633 1.0 2013-01-03 1.966930 -2.506430 1.069297 -0.267049 NaN 2013-01-04 0.411605 -1.179189 0.241829 -0.955059 NaNdropna
去除所有含有NaN的行
輸出結(jié)果為:
A B C D 2013-01-01 0.866323 -1.852962 -0.283425 -0.914182 2013-01-02 1.104424 -1.174742 -0.185782 -0.771816 2013-01-03 -0.391907 -0.612250 0.512120 -0.470910 2013-01-04 0.435786 -0.556628 -0.962486 1.964813 2013-01-05 -0.455669 -0.428983 0.671128 -1.078615 2013-01-06 0.313603 0.495796 0.497048 0.176380 Index(['A', 'B', 'C', 'D'], dtype='object') ---------行數(shù)據(jù)使用[0:4]之間的數(shù)據(jù),列用df.columns,然后加個(gè)E的列:-------------------A B C D E 2013-01-01 0.866323 -1.852962 -0.283425 -0.914182 NaN 2013-01-02 1.104424 -1.174742 -0.185782 -0.771816 NaN 2013-01-03 -0.391907 -0.612250 0.512120 -0.470910 NaN 2013-01-04 0.435786 -0.556628 -0.962486 1.964813 NaN ---------修改0、1列,將1賦值給0和1列,其它值為原來值:-------------------A B C D E 2013-01-01 0.866323 -1.852962 -0.283425 -0.914182 1.0 2013-01-02 1.104424 -1.174742 -0.185782 -0.771816 1.0 2013-01-03 -0.391907 -0.612250 0.512120 -0.470910 NaN 2013-01-04 0.435786 -0.556628 -0.962486 1.964813 NaN ---------刪除所有的含有NaN的行:-------------------A B C D E 2013-01-01 0.866323 -1.852962 -0.283425 -0.914182 1.0 2013-01-02 1.104424 -1.174742 -0.185782 -0.771816 1.0fillna
使用指定值進(jìn)行填充NaN的值。
輸出結(jié)果為:
A B C D 2013-01-01 0.205656 -1.020854 -0.969127 -1.625959 2013-01-02 0.075029 1.416725 0.275302 3.273706 2013-01-03 -0.543870 0.757292 0.063623 0.589556 2013-01-04 -0.403323 -0.486899 1.031774 -0.095886 2013-01-05 2.442459 -0.044233 -0.613714 -0.389008 2013-01-06 -0.068917 -1.598378 -0.262326 -0.455274 Index(['A', 'B', 'C', 'D'], dtype='object') ---------行數(shù)據(jù)使用[0:4]之間的數(shù)據(jù),列用df.columns,然后加個(gè)E的列:-------------------A B C D E 2013-01-01 0.205656 -1.020854 -0.969127 -1.625959 NaN 2013-01-02 0.075029 1.416725 0.275302 3.273706 NaN 2013-01-03 -0.543870 0.757292 0.063623 0.589556 NaN 2013-01-04 -0.403323 -0.486899 1.031774 -0.095886 NaN ---------修改0、1列,將1賦值給0和1列,其它值為原來值:--------------------------------A B C D E 2013-01-01 0.205656 -1.020854 -0.969127 -1.625959 1.0 2013-01-02 0.075029 1.416725 0.275302 3.273706 1.0 2013-01-03 -0.543870 0.757292 0.063623 0.589556 NaN 2013-01-04 -0.403323 -0.486899 1.031774 -0.095886 NaN ---------使用指定值填充NaN的值:----------------------------------------------------A B C D E 2013-01-01 0.205656 -1.020854 -0.969127 -1.625959 1.0 2013-01-02 0.075029 1.416725 0.275302 3.273706 1.0 2013-01-03 -0.543870 0.757292 0.063623 0.589556 5.0 2013-01-04 -0.403323 -0.486899 1.031774 -0.095886 5.0isna
得到是NaN的值的布爾值。如下:
輸出結(jié)果為:
---------行數(shù)據(jù)使用[0:4]之間的數(shù)據(jù),列用df.columns,然后加個(gè)E的列:------------------- ---------修改0、1列,將1賦值給0和1列,其它值為原來值:--------------------------------A B C D E 2013-01-01 -0.242267 1.005525 -2.121381 -0.980096 1.0 2013-01-02 1.173845 3.088588 -0.102826 1.653409 1.0 2013-01-03 -0.451330 -1.287177 0.496383 -1.114614 NaN 2013-01-04 -1.378555 -1.119537 1.091021 -0.626668 NaN ---------得到所有是布爾值的值:------------------------------A B C D E 2013-01-01 False False False False False 2013-01-02 False False False False False 2013-01-03 False False False False True 2013-01-04 False False False False True數(shù)學(xué)操作
求平均值mean
import numpy as np import pandas as pd# 通過設(shè)置開始時(shí)間,并設(shè)置間隔了多少月 dates = pd.date_range('20130101',periods=6)# 隨機(jī)生成一個(gè)6行4列的值 # print(np.random.randn(6,4))# 設(shè)置dates為行,ABCD為列的標(biāo)題值,np.random.randn(6, 4)為行和列中的值 df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))print(df) print("------------df.mean()按列求平均值:--------------") print(df.mean()) print("------------df.mean(1)按行求平均值:-------------") print(df.mean(1))輸出結(jié)果:
A B C D 2013-01-01 1.000250 0.584595 1.028464 -0.103284 2013-01-02 0.539096 1.928019 1.975352 0.421709 2013-01-03 0.009377 1.112230 -0.855621 0.942441 2013-01-04 1.610289 0.030575 0.707928 0.281609 2013-01-05 2.675944 0.286336 1.750777 -0.463347 2013-01-06 0.216954 -0.345792 0.296372 1.740097 ------------df.mean求平均值:-------------- A 1.008652 B 0.599327 C 0.817212 D 0.469871 dtype: float64 2013-01-01 0.627506 2013-01-02 1.216044 2013-01-03 0.302107 2013-01-04 0.657600 2013-01-05 1.062427 2013-01-06 0.476908 Freq: D, dtype: float64Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the speci?ed dimension
import numpy as np import pandas as pd# 通過設(shè)置開始時(shí)間,并設(shè)置間隔了多少月 dates = pd.date_range('20130101',periods=6)# 隨機(jī)生成一個(gè)6行4列的值 # print(np.random.randn(6,4))# 設(shè)置dates為行,ABCD為列的標(biāo)題值,np.random.randn(6, 4)為行和列中的值 df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))print(df) s = pd.Series([1,3,5,np.nan,6,8],index=dates).shift(2) print("----------------------------------------------------") print(s) print("--------------df.sub(s,axis='index')----------------") print(df.sub(s, axis='index'))Apply
將行數(shù)應(yīng)用于數(shù)據(jù):
import numpy as np import pandas as pd# 通過設(shè)置開始時(shí)間,并設(shè)置間隔了多少月 dates = pd.date_range('20130101',periods=6)# 隨機(jī)生成一個(gè)6行4列的值 # print(np.random.randn(6,4))# 設(shè)置dates為行,ABCD為列的標(biāo)題值,np.random.randn(6, 4)為行和列中的值 df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))print(df) print("----------列方向上,下一個(gè)元素的值是上面所有值的和-------------") print(df.apply(np.cumsum)) print("----------行方向上,最大值 - 最小值---------------------------") print(df.apply(lambda x: x.max() - x.min()))輸出結(jié)果:
A B C D 2013-01-01 0.638475 -0.213429 0.738899 -0.439204 2013-01-02 -0.100059 -0.911364 -0.761939 0.815867 2013-01-03 1.241903 -0.086499 -2.127611 0.695337 2013-01-04 -0.334616 -0.333722 1.458706 -0.865675 2013-01-05 1.135460 -0.913456 -0.253340 -0.784230 2013-01-06 -0.447679 0.162478 1.423114 0.908347 ----------列方向上,下一個(gè)元素的值是上面所有值的和-------------A B C D 2013-01-01 0.638475 -0.213429 0.738899 -0.439204 2013-01-02 0.538416 -1.124794 -0.023041 0.376664 2013-01-03 1.780319 -1.211293 -2.150652 1.072001 2013-01-04 1.445703 -1.545015 -0.691945 0.206326 2013-01-05 2.581163 -2.458471 -0.945286 -0.577904 2013-01-06 2.133485 -2.295993 0.477828 0.330443 ----------行方向上,最大值 - 最小值--------------------------- A 1.689582 B 1.075934 C 3.586318 D 1.774022 dtype: float64Histogramming ()
統(tǒng)計(jì)值出現(xiàn)的次數(shù)
import numpy as np import pandas as pds = pd.Series(np.random.randint(0,7,size=10)) print(s)print("--------------統(tǒng)計(jì)值出現(xiàn)的次數(shù)---------------") print(s.value_counts())輸出結(jié)果:
0 5 1 5 2 6 3 4 4 3 5 0 6 4 7 6 8 6 9 5 dtype: int32 --------------統(tǒng)計(jì)值出現(xiàn)的次數(shù)--------------- 6 3 5 3 4 2 3 1 0 1 dtype: int64總結(jié)
以上是生活随笔為你收集整理的03_pandas布尔索引、isin()筛选、设置值at和iat,loc,reindex、dropna、fillna,isna、求平均值mean、Apply函数、value_counts的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 为什么要吃年夜饭 年夜饭的文化意义和历史
- 下一篇: 很饿没胃口吃什么好?