Python Pandas – Operations
Pandas supports a number of very useful operations, which are illustrated below.
Consider the following DataFrame:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'col1': [1, 2, 3, 4],
        'col2': [444, 555, 666, 444],
        'col3': ['abc', 'def', 'ghi', 'xyz']
    })

    print(df.head())

    '''
    Output:
       col1  col2 col3
    0     1   444  abc
    1     2   555  def
    2     3   666  ghi
    3     4   444  xyz
    '''

Finding unique values in a data frame
To find the unique values in a column:
    # returns a NumPy array of all unique values
    print(df['col2'].unique())
    # Output: [444 555 666]

    # returns the number of unique values
    print(df['col2'].nunique())
    # Output: 3

    # if we want a table of the unique values
    # and how many times each one shows up
    print(df['col2'].value_counts())

    '''
    Output:
    444    2
    555    1
    666    1
    Name: col2, dtype: int64
    (pandas 2.0 and later print "Name: count" and label the index "col2")
    '''

Selecting data from a data frame
Consider the same DataFrame as before.
Using conditional selection, we can select data as follows:
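    # a comparison on a column returns a boolean Series
    print(df['col1'] > 2)

    '''
    Output:
    0    False
    1    False
    2     True
    3     True
    Name: col1, dtype: bool
    '''

    # passing that boolean Series back to the DataFrame keeps only the True rows
    print(df[df['col1'] > 2])

    '''
    Output:
       col1  col2 col3
    2     3   666  ghi
    3     4   444  xyz
    '''

    # combine conditions with & and wrap each one in parentheses,
    # because & binds more tightly than > and ==
    print(df[(df['col1'] > 2) & (df['col2'] == 444)])

    '''
    Output:
       col1  col2 col3
    3     4   444  xyz
    '''

Applied Methods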
Consider a simple function:
    def times2(x):
        return x * 2

We already know that we can grab a column and call a built-in method on it, for example:
    print(df['col1'].sum())
    # Output: 10

To apply a custom function, such as times2 defined above, pandas provides the apply() method:
    print(df['col2'].apply(times2))

    '''
    Output:
    0     888
    1    1110
    2    1332
    3     888
    Name: col2, dtype: int64
    '''

apply() works with built-in functions as well:
    print(df['col3'].apply(len))

    '''
    Output:
    0    3
    1    3
    2    3
    3    3
    Name: col3, dtype: int64
    '''

apply() becomes more powerful when combined with lambda expressions. For instance:
    print(df['col2'].apply(lambda x: x * 2))

    '''
    Output:
    0     888
    1    1110
    2    1332
    3     888
    Name: col2, dtype: int64
    '''

Some more operations
    # returns the column names
    print(df.columns)
    # Output: Index(['col1', 'col2', 'col3'], dtype='object')

    # since this is a RangeIndex, it reports
    # the start, stop and step values too
    print(df.index)
    # Output: RangeIndex(start=0, stop=4, step=1)

    # sort by column
    print(df.sort_values('col2'))

    '''
    Output:
       col1  col2 col3
    0     1   444  abc
    3     4   444  xyz
    1     2   555  def
    2     3   666  ghi
    '''

In the result above, note that the index values do not change. Sorting reorders the rows, but each row keeps its original index label, so it can still be identified by that label.
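The point about retained index labels can be checked directly. The following is a minimal sketch, assuming the same data as the DataFrame above; the names sorted_df and renumbered are introduced here only for illustration.

```python
import pandas as pd

# Same data as the article's DataFrame.
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [444, 555, 666, 444],
    'col3': ['abc', 'def', 'ghi', 'xyz'],
})

# Sorting reorders the rows but keeps each row's original index label.
sorted_df = df.sort_values('col2')
print(sorted_df.index.tolist())

# .loc looks rows up by label, so label 3 still refers to the 'xyz' row.
print(sorted_df.loc[3, 'col3'])

# If a fresh 0..n-1 index is preferred, renumber explicitly.
renumbered = sorted_df.reset_index(drop=True)
print(renumbered.index.tolist())
```

Passing drop=True discards the old labels; without it, reset_index() would keep them as an extra column named index.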
isnull
    # isnull
    print(df.isnull())

    '''
    Output:
        col1   col2   col3
    0  False  False  False
    1  False  False  False
    2  False  False  False
    3  False  False  False
    '''

isnull() returns a DataFrame of booleans indicating whether each value is null. In the example above every entry is False, because the DataFrame contains no null values.
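Since the DataFrame above has no missing values, the mask there is uniformly False. Here is a small sketch of what isnull() reports when nulls are present; the frame df_missing and the other variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with two missing values.
df_missing = pd.DataFrame({
    'col1': [1, 2, 3, np.nan],
    'col2': [np.nan, 555, 666, 444],
    'col3': ['abc', 'def', 'ghi', 'xyz'],
})

# Boolean mask: True wherever a value is missing.
mask = df_missing.isnull()
print(mask)

# True counts as 1 when summed, giving the number of missing values per column.
missing_per_column = mask.sum()
print(missing_per_column)

# any(axis=1) flags the rows that contain at least one missing value.
rows_with_missing = mask.any(axis=1)
print(df_missing[rows_with_missing])
```

Counting nulls per column this way is a common first step before deciding whether to drop or fill them.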
Drop NaN values
    # df has no NaN values, so no rows are dropped
    print(df.dropna())

    '''
    Output:
       col1  col2 col3
    0     1   444  abc
    1     2   555  def
    2     3   666  ghi
    3     4   444  xyz
    '''

Fill NaN values with custom values
    df = pd.DataFrame({
        'col1': [1, 2, 3, np.nan],
        'col2': [np.nan, 555, 666, 444],
        'col3': ['abc', 'def', 'ghi', 'xyz']
    })

    print(df)

    '''
    Output:
       col1   col2 col3
    0   1.0    NaN  abc
    1   2.0  555.0  def
    2   3.0  666.0  ghi
    3   NaN  444.0  xyz
    '''

    # the numeric columns are already float because of NaN,
    # so the remaining numbers still print with a decimal point
    print(df.fillna('FILL'))

    '''
    Output:
       col1   col2 col3
    0   1.0   FILL  abc
    1   2.0  555.0  def
    2   3.0  666.0  ghi
    3  FILL  444.0  xyz
    '''

Usage of pivot table
This technique will be familiar to advanced Excel users. Consider a new DataFrame:
    data = {
        'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]
    }

    df = pd.DataFrame(data)

    print(df)

    '''
    Output:
         A    B  C  D
    0  foo  one  x  1
    1  foo  one  y  3
    2  foo  two  x  2
    3  bar  two  y  5
    4  bar  one  x  4
    5  bar  one  y  1
    '''

A pivot table built on more than one index column produces a DataFrame with a MultiIndex. pivot_table() takes three main arguments: values, index and columns.
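To see how the three arguments interact, here is a minimal sketch on a tiny hypothetical frame (toy, not the article's data) in which one combination of keys appears twice. It assumes pivot_table()'s default aggregation, the mean, and shows the optional aggfunc and fill_value parameters as one possible variation.

```python
import pandas as pd

# Hypothetical data; the combination ('foo', 'one', 'x') occurs twice on purpose.
toy = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'bar'],
    'B': ['one', 'one', 'two', 'one'],
    'C': ['x', 'x', 'y', 'x'],
    'D': [1, 3, 2, 5],
})

# values: the column to aggregate; index: row keys; columns: column keys.
# Duplicate key combinations are aggregated with the mean by default.
mean_table = toy.pivot_table(values='D', index=['A', 'B'], columns=['C'])
print(mean_table)

# Optional: sum instead of mean, and 0 instead of NaN for absent combinations.
sum_table = toy.pivot_table(values='D', index=['A', 'B'], columns=['C'],
                            aggfunc='sum', fill_value=0)
print(sum_table)
```

Combinations that never occur in the data show up as NaN unless fill_value is given, which is why NaN also appears in the article's own result.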
    print(df.pivot_table(values='D', index=['A', 'B'], columns=['C']))

    '''
    Output:
    C          x    y
    A   B
    bar one  4.0  1.0
        two  NaN  5.0
    foo one  1.0  3.0
        two  2.0  NaN
    '''

Translated from: https://www.includehelp.com/python/python-pandas-operations.aspx