04_pandas string functions; combining data with concat and merge; grouping with groupby; Reshaping; Pivot tables; time handling (date_range, tz_localize, etc.)
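Collected and organized by 生活随笔. This article introduces pandas string functions, combining data with concat and merge, grouping with groupby, reshaping, pivot tables, and time handling (date_range, tz_localize, etc.), and is shared here for reference.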
String functions: Series.str.lower()
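A Series provides a set of string-processing methods in its str attribute, which make it easy to operate on every element of the array, as the code snippet below shows. Note that pattern matching in str generally uses regular expressions by default (and in some cases always uses them).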
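Because str pattern matching is regex-based by default, a metacharacter such as "." behaves differently from a literal search. A minimal sketch, assuming a recent pandas version; it reuses the sample values from the snippet below, the pattern 'a.' is made up for illustration, and na=False simply treats the missing value as "no match":

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# Default regex=True: 'a.' means "a lowercase 'a' followed by any character"
as_regex = s.str.contains('a.', na=False)
# regex=False: the pattern is matched literally, so only the text "a." counts
as_literal = s.str.contains('a.', regex=False, na=False)

print(s[as_regex].tolist())    # ['Aaba', 'Baca', 'cat']
print(s[as_literal].tolist())  # []
```

Passing regex=False is the simplest way to search for text that contains regex metacharacters.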
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
print(s.str.lower())

Output:
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

Combining data
concat
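pd.concat joins a list of pandas objects along an axis. Besides stacking rows (the default axis=0, as in the example below), it can place objects side by side with axis=1, and the keys parameter labels each piece with an extra index level. A small sketch; the frames top and bottom are made up for illustration:

```python
import pandas as pd

top = pd.DataFrame({'x': [1, 2]})
bottom = pd.DataFrame({'x': [3, 4]})

# axis=0 (default): rows are stacked; keys adds an outer index level per piece
stacked = pd.concat([top, bottom], keys=['top', 'bottom'])
print(stacked)
print(stacked.loc['bottom'])  # select one piece back out by its key

# axis=1: frames are aligned on the index and placed side by side
wide = pd.concat([top, bottom], axis=1, keys=['top', 'bottom'])
print(wide)
```

With axis=1 the keys end up in the column labels instead, giving the columns ('top', 'x') and ('bottom', 'x').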
import numpy as np
import pandas as pd

# A 10x4 frame of random numbers
df = pd.DataFrame(np.random.randn(10, 4))
print(df)
print("------------------------------------")
print(df[:3])
print("------------------------------------")
print(df[3:7])
print("------------------------------------")
print(df[7:])

# Split the frame into three row slices, then glue them back together
pieces = [df[:3], df[3:7], df[7:]]
print("------------------------------------")
print(pieces)
print(pd.concat(pieces))

Output:
          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094
------------------------------------
          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919
------------------------------------
          0         1         2         3
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324
------------------------------------
          0         1         2         3
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094
------------------------------------
[          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919,           0         1         2         3
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324,           0         1         2         3
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094]
          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094

merge
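pd.merge performs SQL-style joins on key columns. In the example below every key is 'foo' on both sides, so each left row pairs with each right row (2 x 2 = 4 rows). When keys differ, the how parameter decides which keys survive. A small sketch; the frames and their values are made up for illustration:

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'baz'], 'rval': [4, 5]})

# Inner join (the default): only keys present on both sides survive
inner = pd.merge(left, right, on='key')
# Outer join: keep every key from both sides; missing values become NaN
outer = pd.merge(left, right, on='key', how='outer')
print(inner)
print(outer)
```

Here the inner join keeps only the 'foo' row, while the outer join keeps 'foo', 'bar' and 'baz', with NaN where one side has no match.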
import numpy as np
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
print(left)
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
print(right)
print(pd.merge(left, right, on='key'))

Output:
   key  lval
0  foo     1
1  foo     2
   key  rval
0  foo     4
1  foo     5
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

Grouping
groupby
groupby groups rows by the specified column(s), much like GROUP BY in SQL.
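Besides the sum() calls in the example below, a grouped column can compute several aggregations at once with agg. A minimal sketch of this split-apply-combine pattern; the column names team and score are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['x', 'y', 'x', 'y', 'x'],
    'score': [1, 2, 3, 4, 5],
})

# Split by team, apply several aggregations to 'score', combine into one frame
summary = df.groupby('team')['score'].agg(['sum', 'mean', 'count'])
print(summary)
```

Team x has scores 1, 3 and 5 (sum 9, mean 3.0, count 3); team y has 2 and 4 (sum 6, mean 3.0, count 2).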
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
print(df)
print("-----------------------------------")
# Sum only the numeric columns C and D; pandas 2.x would otherwise also
# "sum" (concatenate) the strings in column B
print(df.groupby('A')[['C', 'D']].sum())
print("-----------------------------------")
print(df.groupby(['A', 'B']).sum())

Output:
     A      B         C         D
0  foo    one -0.411487 -0.908131
1  bar    one  0.803172 -0.093416
2  foo    two  0.079114 -0.594352
3  bar  three -1.423867 -0.025747
4  foo    two  0.832108  0.818305
5  bar    two  0.551068 -0.859953
6  foo    one -1.052481 -0.220297
7  foo  three -2.639817  0.402972
-----------------------------------
            C         D
A
bar -0.069626 -0.979116
foo -3.192563 -0.501502
-----------------------------------
                  C         D
A   B
bar one    0.803172 -0.093416
    three -1.423867 -0.025747
    two    0.551068 -0.859953
foo one   -1.463968 -1.128427
    three -2.639817  0.402972
    two    0.911223  0.223953

Reshaping
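stack() moves the column labels into a new innermost index level (turning the frame into a Series), and unstack() moves that level back into the columns. A compact sketch of the round trip on a small made-up frame, assuming a recent pandas version (pandas 2.1+ may emit a FutureWarning about the new stack implementation, which does not change this result):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two')], names=['first', 'second'])
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=index)

# stack(): columns A/B become the innermost index level -> a Series
stacked = df.stack()
print(stacked)

# unstack(): the innermost level moves back into the columns
restored = stacked.unstack()
print(restored.equals(df))  # True
```

Passing a level number or name, as in unstack(0) or unstack(1) in the example below, chooses which index level becomes the columns.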
import numpy as np
import pandas as pd

tuples = list(zip(*[['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
                    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
print(index)
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
df2 = df[:4]
print(df2)
print("---------df2.stack()------------------")
stacked = df2.stack()
print(stacked)
print("---------stacked.unstack()------------")
print(stacked.unstack())
print("---------stacked.unstack(1)-----------")
print(stacked.unstack(1))
print("---------stacked.unstack(0)-----------")
print(stacked.unstack(0))

Output:
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])
                     A         B
first second
bar   one     0.189887  0.637367
      two    -0.341858 -0.895612
baz   one     0.517839 -0.798281
      two    -0.712129 -1.355618
---------df2.stack()------------------
first  second
bar    one     A    0.189887
               B    0.637367
       two     A   -0.341858
               B   -0.895612
baz    one     A    0.517839
               B   -0.798281
       two     A   -0.712129
               B   -1.355618
dtype: float64
---------stacked.unstack()------------
                     A         B
first second
bar   one     0.189887  0.637367
      two    -0.341858 -0.895612
baz   one     0.517839 -0.798281
      two    -0.712129 -1.355618
---------stacked.unstack(1)-----------
second        one       two
first
bar   A  0.189887 -0.341858
      B  0.637367 -0.895612
baz   A  0.517839 -0.712129
      B -0.798281 -1.355618
---------stacked.unstack(0)-----------
first          bar       baz
second
one    A  0.189887  0.517839
       B  0.637367 -0.798281
two    A -0.341858 -0.712129
       B -0.895612 -1.355618

Pivot tables
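pivot_table aggregates all values that share the same index/column combination; the default aggfunc is the mean, and fill_value replaces combinations that have no data at all (the NaN cells in the example below). A small sketch; the region/product/sales data is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south', 'north'],
    'product': ['a', 'b', 'a', 'a', 'a'],
    'sales': [10, 20, 30, 40, 50],
})

# Rows: region, columns: product; duplicate (region, product) pairs are summed,
# and the empty (south, b) cell is filled with 0 instead of NaN
table = pd.pivot_table(df, values='sales', index='region', columns='product',
                       aggfunc='sum', fill_value=0)
print(table)
```

North/a collects two rows (10 + 50 = 60), while south/b has no rows and therefore shows the fill value 0.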
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': ['A', 'B', 'C'] * 4,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D': np.random.randn(12),
                   'E': np.random.randn(12)})
print(df)
print("-------------------------------")
print(pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C']))

Output:
        A  B    C         D         E
0     one  A  foo -0.282109  1.696844
1     one  B  foo  0.715732  0.283795
2     two  C  foo  0.889333  0.621878
3   three  A  bar -1.065137  1.184847
4     one  B  bar  0.420288 -0.299934
5     one  C  bar -1.269725 -1.261542
6     two  A  foo  1.142230  1.887502
7   three  B  foo -0.456574  0.650669
8     one  C  foo -0.146470 -0.307011
9     one  A  bar  0.944573  0.967164
10    two  B  bar  0.432492 -0.554618
11  three  C  bar -1.928619 -1.158268
-------------------------------
C             bar       foo
A     B
one   A  0.944573 -0.282109
      B  0.420288  0.715732
      C -1.269725 -0.146470
three A -1.065137       NaN
      B       NaN -0.456574
      C -1.928619       NaN
two   A       NaN  1.142230
      B  0.432492       NaN
      C       NaN  0.889333

Time handling
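pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (for example, converting secondly data into 5-minutely data). This is extremely common in, but not limited to, financial applications.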
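Because the example below uses random values, here is a deterministic sketch whose result can be checked by hand: 600 one-second readings, each equal to 1, are summed into 5-minute buckets, and the buckets are then localized to UTC and converted to US/Eastern as in the tz_localize/tz_convert steps below. The data is made up; the '5min' and '1min' aliases are assumed equivalent to the '5Min' spelling used below:

```python
import numpy as np
import pandas as pd

# 600 one-second timestamps (10 minutes), each carrying the value 1
rng = pd.date_range('2012-01-01', periods=600, freq=pd.Timedelta(seconds=1))
ts = pd.Series(np.ones(len(rng), dtype=int), index=rng)

# Downsample: sum each 5-minute window -> two buckets of 300
per_5min = ts.resample('5min').sum()
print(per_5min)

# Other aggregations work the same way, e.g. the mean per minute
per_min_mean = ts.resample('1min').mean()
print(per_min_mean.head())

# Attach a time zone, then view the same instants in another zone
eastern = per_5min.tz_localize('UTC').tz_convert('US/Eastern')
print(eastern.index[0])  # 2011-12-31 19:00:00-05:00
```

tz_convert changes only how the timestamps are displayed; the underlying instants and the summed values stay the same.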
import numpy as np
import pandas as pd

# Note: pandas 2.2 deprecates the 'S', 'M' and 'H' aliases used below
# in favor of 's', 'ME' and 'h'

# Resampling: 100 seconds of random integers, summed into 5-minute bins
rng = pd.date_range('1/1/2012', periods=100, freq='S')
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
print(ts.resample('5Min').sum())
print("---------------------date_range-------------------------")
rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
ts = pd.Series(np.random.randn(len(rng)), rng)
print(ts)
print("----------------------tz_localize-----------------------")
ts_utc = ts.tz_localize('UTC')
print(ts_utc)
print("------------convert to another time zone----------------")
print(ts_utc.tz_convert('US/Eastern'))
print("------------convert between time span representations---")
rng = pd.date_range('1/1/2012', periods=5, freq='M')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
print(ts)
print("----------------to_period------------------------------")
ps = ts.to_period()
print(ps)
print("----------------to_timestamp---------------------------")
print(ps.to_timestamp())
print("------------------------------------------------------")
# Quarterly periods with the fiscal year ending in November, converted to
# 9 a.m. on the first day of the month following each quarter end
prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
ts = pd.Series(np.random.randn(len(prng)), prng)
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
print(ts.head())

Output:

2012-01-01    24102
Freq: 5T, dtype: int32
---------------------date_range-------------------------
2012-03-06    0.059085
2012-03-07    0.216838
2012-03-08   -1.465363
2012-03-09   -0.349098
2012-03-10   -0.818129
Freq: D, dtype: float64
----------------------tz_localize-----------------------
2012-03-06 00:00:00+00:00    0.059085
2012-03-07 00:00:00+00:00    0.216838
2012-03-08 00:00:00+00:00   -1.465363
2012-03-09 00:00:00+00:00   -0.349098
2012-03-10 00:00:00+00:00   -0.818129
Freq: D, dtype: float64
------------convert to another time zone----------------
2012-03-05 19:00:00-05:00    0.059085
2012-03-06 19:00:00-05:00    0.216838
2012-03-07 19:00:00-05:00   -1.465363
2012-03-08 19:00:00-05:00   -0.349098
2012-03-09 19:00:00-05:00   -0.818129
Freq: D, dtype: float64
------------convert between time span representations---
2012-01-31   -0.682776
2012-02-29    0.895222
2012-03-31   -0.162116
2012-04-30   -1.175630
2012-05-31   -0.936218
Freq: M, dtype: float64
----------------to_period------------------------------
2012-01   -0.682776
2012-02    0.895222
2012-03   -0.162116
2012-04   -1.175630
2012-05   -0.936218
Freq: M, dtype: float64
----------------to_timestamp---------------------------
2012-01-01   -0.682776
2012-02-01    0.895222
2012-03-01   -0.162116
2012-04-01   -1.175630
2012-05-01   -0.936218
Freq: MS, dtype: float64
------------------------------------------------------
1990-03-01 09:00    1.847485
1990-06-01 09:00   -0.909369
1990-09-01 09:00    1.381791
1990-12-01 09:00    0.997901
1991-03-01 09:00    1.470387
Freq: H, dtype: float64

Categoricals
A pandas DataFrame can also hold categorical data.
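Sorting a categorical follows the order of its categories rather than the alphabetical order of the labels, which is why "very bad" comes first in the example below. Declaring the categories as ordered additionally enables comparisons and min/max. A small sketch using pd.CategoricalDtype; the size labels are made up for illustration:

```python
import pandas as pd

size_type = pd.CategoricalDtype(['small', 'medium', 'large'], ordered=True)
sizes = pd.Series(['large', 'small', 'medium', 'small'], dtype=size_type)

# Sorting follows the declared category order, not alphabetical order
print(sizes.sort_values().tolist())  # ['small', 'small', 'medium', 'large']
# Because the dtype is ordered, comparisons against a category label work
print((sizes >= 'medium').tolist())  # [True, False, True, False]
print(sizes.max())                   # large
```

Without ordered=True, sorting still follows the category order, but comparisons such as >= raise a TypeError.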
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
print("-----------Convert the raw grades to a categorical dtype-------------")
df["grade"] = df["raw_grade"].astype("category")
print(df["grade"])

print("-------Rename the categories to more meaningful names----------")
# rename_categories replaces the old assignment to .cat.categories,
# which pandas 2.0 and later no longer allow
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])
# Reorder the categories and add the missing ones
df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
print(df["grade"])
print("-----Sort by the category order [very bad, bad, medium, good, very good]----")
print(df.sort_values(by="grade"))
print("-----Group by grade; empty categories are shown as well----")
# observed=False keeps categories that have no rows (count 0)
print(df.groupby('grade', observed=False).size())

Output:

-----------Convert the raw grades to a categorical dtype-------------
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
-------Rename the categories to more meaningful names----------
0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
-----Sort by the category order [very bad, bad, medium, good, very good]----
   id raw_grade      grade
5   6         e   very bad
1   2         b       good
2   3         b       good
0   1         a  very good
3   4         a  very good
4   5         a  very good
-----Group by grade; empty categories are shown as well----
grade
very bad     1
bad          0
medium       0
good         2
very good    3
dtype: int64
