
The Three Musketeers of Python Data Analysis: Pandas (Part 1): Getting to Know Pandas and Its Series and DataFrame Objects

Published: 2023/12/10 | python | 35 | 豆豆


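Articles in the Pandas series:

- The Three Musketeers of Python Data Analysis: Pandas (Part 1): Getting to Know Pandas and Its Series and DataFrame Objects
- The Three Musketeers of Python Data Analysis: Pandas (Part 2): The Index Object and Indexing Operations
- The Three Musketeers of Python Data Analysis: Pandas (Part 3): Arithmetic Operations and Handling Missing Values
- The Three Musketeers of Python Data Analysis: Pandas (Part 4): Function Application, Mapping, Sorting, and Hierarchical Indexing
- The Three Musketeers of Python Data Analysis: Pandas (Part 5): Statistical Computation and Descriptive Statistics
- The Three Musketeers of Python Data Analysis: Pandas (Part 6): GroupBy: Splitting, Applying, and Combining Data
- The Three Musketeers of Python Data Analysis: Pandas (Part 7): Merging Datasets
- The Three Musketeers of Python Data Analysis: Pandas (Part 8): Reshaping Data, Handling Duplicates, and Replacing Data
- The Three Musketeers of Python Data Analysis: Pandas (Part 9): Time Series
- The Three Musketeers of Python Data Analysis: Pandas (Part 10): Reading and Writing Data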

The NumPy and Matplotlib series are also complete:

- NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
- Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

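Recommended learning resources and websites (the author contributed to translating parts of the documentation):

- NumPy official Chinese site: https://www.numpy.org.cn/
- Pandas official Chinese site: https://www.pypandas.cn/
- Matplotlib official Chinese site: https://www.matplotlib.org.cn/
- NumPy, Matplotlib, and Pandas cheat sheets: https://github.com/TRHX/Python-quick-reference-table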

Contents

- 【01x00】Getting to Know Pandas
- 【02x00】Pandas Data Structures
- 【03x00】The Series Object
  - 【03x01】Building a Series from a list
  - 【03x02】Building a Series from a dict
  - 【03x03】Getting the data and the index
  - 【03x04】Selecting data by index
  - 【03x05】Operations and functions
  - 【03x06】The name attribute
- 【04x00】The DataFrame Object
  - 【04x01】Building a DataFrame from an ndarray
  - 【04x02】Building a DataFrame from a dict
  - 【04x03】Getting the data and the index
  - 【04x04】Selecting data by index
  - 【04x05】Modifying column values
  - 【04x06】Adding and deleting columns
  - 【04x07】The name attribute

This article was originally published on CSDN by TRHX. Blog homepage: https://itrhx.blog.csdn.net/ Article link: https://itrhx.blog.csdn.net/article/details/106676693 Reproduction without permission is prohibited.

【01x00】Getting to Know Pandas

Pandas is a data analysis package for Python built on top of NumPy. It was originally developed at AQR Capital Management starting in April 2008 and was open-sourced at the end of 2009. It is now developed and maintained by the PyData development team, which focuses on Python data packages, and is part of the PyData project.

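Pandas was originally developed as a tool for financial data analysis, so it provides strong support for time series analysis. The name Pandas is derived from "panel data" and "Python data analysis". Panel data is an econometrics term for multidimensional data sets. Older versions of Pandas provided a Panel data type, but it was deprecated in version 0.20 and removed in 0.25.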

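Pandas is often used alongside other tools: the numerical computing libraries NumPy and SciPy, the analysis libraries statsmodels and scikit-learn, and the visualization library Matplotlib. Pandas borrows much of NumPy's coding style, but the two differ in one key respect. Pandas is designed for tabular or heterogeneous data, while NumPy is best suited to homogeneous numerical arrays.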

[The following description of Pandas is translated from the official documentation: https://pandas.pydata.org/docs/getting_started/overview.html#package-overview]

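Pandas is a Python package that provides fast, flexible, and expressive data structures. These are designed to make working with "relational" or "labeled" data both easy and intuitive. Pandas aims to be the fundamental high-level building block for practical, real-world data analysis in Python. Its broader goal is to become the most powerful and flexible open-source data analysis and manipulation tool available in any language. After years of sustained effort, it is already well on its way toward that goal.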

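Pandas is well suited to the following kinds of data:

- Tabular data with heterogeneously typed columns, as in an SQL table or an Excel spreadsheet;
- Ordered and unordered (not necessarily fixed-frequency) time series data;
- Arbitrary matrix data with row and column labels, whether homogeneously or heterogeneously typed;
- Any other form of observational or statistical data set; the data need not be labeled at all to be placed into a Pandas data structure.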

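The two primary data structures of Pandas are Series (one-dimensional) and DataFrame (two-dimensional). Together they handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R's data.frame provides and much more. Pandas is built on top of NumPy and integrates well with many other third-party scientific computing libraries.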

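Pandas is like a Swiss Army knife. Here are just a few of the things it does well:

- Handling missing data (represented as NaN) in both floating-point and non-floating-point data;
- Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects;
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels. Alternatively, the labels can be ignored, and Series, DataFrame, and similar objects will align the data automatically in computations;
- Powerful, flexible group by functionality for split-apply-combine operations on data sets, for both aggregating and transforming data;
- Easy conversion of ragged, differently indexed data in other Python and NumPy data structures into DataFrame objects;
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
- Intuitive merging and joining of data sets;
- Flexible reshaping and pivoting of data sets;
- Hierarchical labeling of axes (a single tick can have multiple labels);
- Robust IO tools for loading data from flat files (CSV and other delimited files), Excel files, and databases, and for saving and loading data in the ultrafast HDF5 format;
- Time-series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting, and lagging.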

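Many of these features exist to address pain points commonly found in other programming languages and research environments. Working with data typically proceeds in stages: munging and cleaning the data, analyzing and modeling it, and then organizing the results for plotting or tabular display. Pandas is an ideal tool for every one of these stages.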

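Other notes:

- Pandas is fast. Many of its low-level algorithms have been extensively optimized in Cython. However, generality usually costs some performance. A tool that focuses on one specific task can certainly be made faster than Pandas.
- Pandas is a dependency of statsmodels, which makes it an important part of Python's statistical computing ecosystem.
- Pandas is widely used in the financial industry.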

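【02x00】Pandas Data Structures

The primary Pandas data structures are:

- Series: a labeled, one-dimensional, homogeneously typed array;
- DataFrame: a labeled, size-mutable, two-dimensional table whose columns may have different types.

Pandas data structures are best thought of as containers for lower-dimensional data. A DataFrame is a container for Series, and a Series is a container for scalars. Objects can therefore be inserted into and removed from these containers in a dictionary-like fashion.

In addition, the default behavior of common API functions should take into account the typical orientation of time series and cross-sectional data sets. When an ndarray stores 2- or 3-dimensional data, the user has to keep track of the data set's orientation when writing functions, which is a burden. Apart from the performance effects of C or Fortran memory contiguity, the different axes are essentially interchangeable in a program. In Pandas, the axes exist mainly to give the data more semantic meaning, that is, a more natural way to describe the orientation of a data set. This reduces the mental effort needed to write data transformation functions.

With tabular data such as a DataFrame, thinking in terms of the index (rows) and columns is more intuitive than NumPy's axis 0 and axis 1. Iterating over the columns of a DataFrame this way makes the code easier to read: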

```python
for col in df.columns:
    series = df[col]
    # do something with series
```
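Here is a small runnable sketch of the idea above. It assumes a made-up DataFrame `df` with two numeric columns. Looping over `df.columns` yields one Series per column, and the per-column totals match what `df.sum(axis=0)` returns.

```python
import pandas as pd

# Hypothetical example data: two numeric columns
df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

col_sums = {}
for col in df.columns:
    series = df[col]                   # each column comes back as a Series
    col_sums[col] = int(series.sum())  # convert the NumPy scalar to a Python int

# axis=0 ("down the rows") produces the same per-column totals
axis_sums = df.sum(axis=0)
print(col_sums)  # {'x': 6, 'y': 60}
print(axis_sums)
```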

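【03x00】The Series Object

A Series is a one-dimensional labeled array. It can hold integers, floats, strings, Python objects, and other types of data. The axis labels are collectively referred to as the index. You create a Series by calling the pandas.Series constructor. Its basic syntax (as of Pandas 1.x; some defaults, such as copy, differ in later versions) is:

```python
pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
```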

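Parameters:

| Parameter | Description |
| --- | --- |
| data | Array-like, iterable, dict, or scalar value; the data stored in the Series |
| index | Index (data labels). Values must be hashable and have the same length as data; non-unique index values are allowed. Defaults to RangeIndex (0, 1, 2, …, n-1) if not provided |
| dtype | Optional. Data type of the resulting Series; inferred from the data if not specified. See the dtypes section of the official documentation for details |
| name | str, optional. The name to give the Series |
| copy | bool, optional, default False. Whether to copy the input data |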
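The following minimal sketch illustrates the dtype and name parameters from the table; the values are hypothetical. It also shows that a scalar passed as data is repeated for every label in index.

```python
import pandas as pd

# dtype forces the element type; name labels the Series itself
s = pd.Series([1, 2, 3], index=['x', 'y', 'z'], dtype='float64', name='demo')
print(s.dtype)  # float64
print(s.name)   # demo

# A scalar data value is broadcast to every label in index
t = pd.Series(5, index=['a', 'b', 'c'])
print(t.tolist())  # [5, 5, 5]
```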

【03x01】Building a Series from a list

In most cases you only need the data and index parameters. A Series can be built from a list, for example:

```python
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2])
>>> obj
0    1
1    5
2   -8
3    2
dtype: int64
```

No index was specified for the data, so Pandas automatically creates an integer index from 0 to N-1, where N is the length of the data. The left column is the automatically created index, and the right column is the data.
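You can confirm the automatically created index directly. This short sketch uses the same data as above: when index is omitted, the Series gets a RangeIndex covering 0 to N-1.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2])
# With no index argument, pandas creates a RangeIndex from 0 to N-1
print(obj.index)
print(len(obj), list(obj.index))  # 4 [0, 1, 2, 3]
```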

You can also specify a custom index:

```python
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
```

The index can also be modified in place by assignment:

```python
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
>>> obj
Bob      1
Steve    5
Jeff    -8
Ryan     2
dtype: int64
```
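The following short sketch makes one related point: the new index must have exactly as many labels as the Series has values. If it does not, Pandas raises a ValueError.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

raised = False
try:
    obj.index = ['Bob', 'Steve', 'Jeff']  # only 3 labels for 4 values
except ValueError as exc:
    raised = True
    print('rejected:', exc)

obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']  # lengths match, so this works
print(obj['Jeff'])  # -8
```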

【03x02】Building a Series from a dict

When you build a Series from a dict, the dict keys become the index and the dict values become the data, for example:

```python
>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> obj = pd.Series(data)
>>> obj
Beijing     21530000
Shanghai    24280000
Wuhan       11210000
Zhejiang    58500000
dtype: int64
```

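To get the result in a particular order, pass a list of labels as index. The Series then contains exactly those labels, in that order:

- Labels found in the dict take their values.
- Labels not found in the dict (such as 'Guangzhou' below) become NaN.
- Dict keys missing from the list (such as 'Beijing') are dropped.

Because NaN is a float, the dtype becomes float64:

```python
>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> cities = ['Guangzhou', 'Wuhan', 'Zhejiang', 'Shanghai']
>>> obj = pd.Series(data, index=cities)
>>> obj
Guangzhou           NaN
Wuhan        11210000.0
Zhejiang     58500000.0
Shanghai     24280000.0
dtype: float64
```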
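You can check the three effects described above directly. This sketch reuses the same data:

- labels are kept in the given order;
- missing labels are filled with NaN;
- unlisted keys are dropped.

```python
import pandas as pd

data = {'Beijing': 21530000, 'Shanghai': 24280000,
        'Wuhan': 11210000, 'Zhejiang': 58500000}
cities = ['Guangzhou', 'Wuhan', 'Zhejiang', 'Shanghai']
obj = pd.Series(data, index=cities)

print('Beijing' in obj)           # False: keys not listed in index are dropped
print(obj.isnull()['Guangzhou'])  # True: labels missing from the dict become NaN
print(obj.dtype)                  # float64: NaN forces a float dtype
```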

Note: when data is a dict and the index parameter is not set:

- With Python >= 3.6 and Pandas >= 0.23, the Series index follows the dict's insertion order.
- With Python < 3.6 or Pandas < 0.23, the Series index is sorted alphabetically.
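Here is a short sketch of the insertion-order behavior, assuming Python >= 3.6 and Pandas >= 0.23. If you want alphabetical order regardless of version, sort_index makes it explicit. The dict contents are illustrative.

```python
import pandas as pd

data = {'Wuhan': 11210000, 'Beijing': 21530000, 'Shanghai': 24280000}
obj = pd.Series(data)

# On Python >= 3.6 with pandas >= 0.23 the index follows insertion order
print(list(obj.index))               # ['Wuhan', 'Beijing', 'Shanghai']
# Sorting the index gives alphabetical order explicitly
print(list(obj.sort_index().index))  # ['Beijing', 'Shanghai', 'Wuhan']
```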

【03x03】Getting the data and the index

The values and index attributes of a Series return its data and its index object:

```python
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.values
array([ 1,  5, -8,  2], dtype=int64)
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
```
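The values attribute still works. However, Pandas documentation since version 0.24 recommends Series.to_numpy() when you need a NumPy array. A minimal sketch, assuming Pandas >= 0.24:

```python
import numpy as np
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
arr = obj.to_numpy()       # a plain NumPy array holding the values
print(type(arr))           # <class 'numpy.ndarray'>
print(arr.tolist())        # [1, 5, -8, 2]
print(obj.index.tolist())  # ['a', 'b', 'c', 'd']
```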

【03x04】Selecting data by index

Unlike a plain NumPy array, a Series lets you select a single value or a group of values by index label. To select a group, pass a list of index labels. You can also use an index label to modify the corresponding value:

```python
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj['a']
1
>>> obj['a'] = 3
>>> obj[['a', 'b', 'c']]
a    3
b    5
c   -8
dtype: int64
```
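Plain obj[...] selection can be ambiguous when the index itself contains integers. The explicit accessors are often clearer: loc selects by label and iloc by position. The integer labels below are hypothetical, chosen so that labels and positions differ:

```python
import pandas as pd

# Integer labels that do not match positions (hypothetical example)
obj = pd.Series([1, 5, -8, 2], index=[10, 20, 30, 40])

print(obj.loc[20])                 # 5  -> selected by label
print(obj.iloc[1])                 # 5  -> selected by position
print(obj.loc[[10, 30]].tolist())  # [1, -8]
print(obj.iloc[[0, 2]].tolist())   # [1, -8]
```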

【03x05】Operations and functions

A Series supports NumPy functions and NumPy-like operations, and the result keeps the index labels. Examples include filtering with a boolean array, multiplying by a scalar, and applying math functions:

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj[obj > 0]
a    1
b    5
d    2
dtype: int64
>>> obj * 2
a     2
b    10
c   -16
d     4
dtype: int64
>>> np.exp(obj)
a      2.718282
b    148.413159
c      0.000335
d      7.389056
dtype: float64
```
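You can also combine boolean filters. This sketch uses the same Series and joins two comparisons with &. Each comparison needs its own parentheses, because & binds more tightly than > and <. The sketch also shows that arithmetic keeps the index labels:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# & means "and", | means "or"; wrap each comparison in parentheses
mask = (obj > 0) & (obj < 5)
print(obj[mask].tolist())       # [1, 2]

# Arithmetic is applied element-wise and the labels are preserved
result = obj * 2 + 1
print(result.tolist())          # [3, 11, -15, 5]
print(list(result.index))       # ['a', 'b', 'c', 'd']
```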

Besides these operations, you can think of a Series as a fixed-length ordered dict, since it maps index values to data values. It can be used in many functions that would otherwise expect a dict:

```python
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> 'a' in obj
True
>>> 'e' in obj
False
```
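Here are a few more dict-like operations, sketched on the same Series. Note that the in operator tests the index labels, not the values. To check for a value, use obj.values:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

print(obj.get('e', 0))     # 0: default returned for a missing label
print(5 in obj)            # False: 'in' looks at the index labels
print(5 in obj.values)     # True: search the values explicitly
pairs = list(obj.items())  # (label, value) pairs, like dict.items()
d = obj.to_dict()          # back to a plain dict
print(d)
```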

As in NumPy, Pandas has NaN ("not a number"), which it uses to represent missing values. The Pandas functions isnull and notnull, and the equivalent Series methods, detect missing data. The example uses np.nan, because the np.NaN alias was removed in NumPy 2.0:

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series([np.nan, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    NaN
b    5.0
c   -8.0
d    2.0
dtype: float64
>>> pd.isnull(obj)
a     True
b    False
c    False
d    False
dtype: bool
>>> pd.notnull(obj)
a    False
b     True
c     True
d     True
dtype: bool
>>> obj.isnull()
a     True
b    False
c    False
d    False
dtype: bool
>>> obj.notnull()
a    False
b     True
c     True
d     True
dtype: bool
```
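In recent Pandas versions, isna and notna are aliases of isnull and notnull. Summing the boolean mask counts the missing values. A minimal sketch:

```python
import numpy as np
import pandas as pd

obj = pd.Series([np.nan, 5, -8, 2], index=['a', 'b', 'c', 'd'])

print(obj.isna().equals(obj.isnull()))  # True: same result as isnull
print(int(obj.isna().sum()))            # 1: True counts as 1 when summed
print(obj[obj.notna()].tolist())        # [5.0, -8.0, 2.0]
```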

【03x06】The name attribute

You can give a Series object a name in the pandas.Series constructor:

```python
>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> obj = pd.Series(data, name='population')
>>> obj
Beijing     21530000
Shanghai    24280000
Wuhan       11210000
Zhejiang    58500000
Name: population, dtype: int64
```

You can also name the Series object and its index through the name and index.name attributes:

```python
>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> obj = pd.Series(data)
>>> obj.name = 'population'
>>> obj.index.name = 'cities'
>>> obj
cities
Beijing     21530000
Shanghai    24280000
Wuhan       11210000
Zhejiang    58500000
Name: population, dtype: int64
```
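The Series name carries over into other operations. This sketch uses a shortened version of the data above:

- rename with a scalar argument returns a copy with a new name;
- to_frame uses the name as the column label.

```python
import pandas as pd

data = {'Beijing': 21530000, 'Shanghai': 24280000}
obj = pd.Series(data, name='population')
obj.index.name = 'cities'

renamed = obj.rename('pop')  # a scalar argument changes the Series name
frame = obj.to_frame()       # the Series name becomes the column label

print(renamed.name)          # pop
print(list(frame.columns))   # ['population']
print(frame.index.name)      # cities
```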

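【04x00】The DataFrame Object

A DataFrame is a tabular data structure containing an ordered collection of columns. Each column can hold a different value type, such as numbers, strings, or booleans. A DataFrame has both a row index and a column index. You can think of it as a dict of Series that all share the same index. Internally, the data is stored as one or more two-dimensional blocks rather than as a list, a dict, or some other one-dimensional structure.

- It is similar to a multidimensional array or table, such as an Excel sheet or R's data.frame;
- Each column can be of a different type;
- It has both a column index and a row index.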
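The following sketch makes the "dict of Series" view concrete. It uses two hypothetical Series whose indexes only partly overlap. The resulting row index is the union of both, and any missing entries become NaN.

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical data)
year = pd.Series([2017, 2018, 2019], index=['a', 'b', 'c'])
people = pd.Series([100, 200], index=['a', 'c'])

df = pd.DataFrame({'year': year, 'people': people})
print(df)
# Row labels are the union of both indexes; 'b' has no people value
print(df.index.tolist())             # ['a', 'b', 'c']
print(df['people'].isna().tolist())  # [False, True, False]
```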

The basic syntax is:

```python
pandas.DataFrame(data=None, index: Optional[Collection] = None, columns: Optional[Collection] = None, dtype: Union[str, numpy.dtype, ExtensionDtype, None] = None, copy: bool = False)
```

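Parameters:

| Parameter | Description |
| --- | --- |
| data | ndarray (structured or homogeneous), iterable, or dict; the data stored in the DataFrame |
| index | Array-like. Row labels (index); defaults to RangeIndex (0, 1, 2, …, n-1) if not provided |
| columns | Column labels; defaults to RangeIndex (0, 1, 2, …, n-1) if not provided |
| dtype | Optional. Data type to force; if not specified, it is inferred from the data. See the dtypes section of the official documentation for details |
| copy | bool, optional, default False. Whether to copy the input data; only affects DataFrame or 2-D ndarray input |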

【04x01】Building a DataFrame from an ndarray

A DataFrame can be built directly from a two-dimensional ndarray. np.random.randn produces random values, so your output will differ:

```python
>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.randn(5, 3)
>>> data
array([[-2.16231157,  0.44967198, -0.73131523],
       [ 1.18982913,  0.94670798,  0.82973421],
       [-1.57680831, -0.99732066,  0.96432   ],
       [-0.77483149, -1.23802881,  0.44061227],
       [ 1.77666419,  0.24931983, -1.12960153]])
>>> obj = pd.DataFrame(data)
>>> obj
          0         1         2
0 -2.162312  0.449672 -0.731315
1  1.189829  0.946708  0.829734
2 -1.576808 -0.997321  0.964320
3 -0.774831 -1.238029  0.440612
4  1.776664  0.249320 -1.129602
```
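Because randn output changes on every run, this sketch uses a deterministic np.arange array instead to check the default labels. Both the row index and the columns default to 0 through n-1.

```python
import numpy as np
import pandas as pd

data = np.arange(15).reshape(5, 3)  # deterministic stand-in for randn(5, 3)
obj = pd.DataFrame(data)

print(obj.shape)             # (5, 3)
print(list(obj.index))       # [0, 1, 2, 3, 4]
print(list(obj.columns))     # [0, 1, 2]
print(obj.iloc[1].tolist())  # [3, 4, 5]
```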

As with Series, you can specify the index and the column labels (columns) when building the DataFrame. You can also modify them in place afterwards by assignment:

```python
>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.randn(5, 3)
>>> index = ['a', 'b', 'c', 'd', 'e']
>>> columns = ['A', 'B', 'C']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
          A         B         C
a -1.042909 -0.238236 -1.050308
b  0.587079  0.739683 -0.233624
c -0.451254 -0.638496  1.708807
d -0.620158 -1.875929 -0.432382
e -1.093815  0.396965 -0.759479
>>> obj.index = ['A1', 'A2', 'A3', 'A4', 'A5']
>>> obj.columns = ['B1', 'B2', 'B3']
>>> obj
          B1        B2        B3
A1 -1.042909 -0.238236 -1.050308
A2  0.587079  0.739683 -0.233624
A3 -0.451254 -0.638496  1.708807
A4 -0.620158 -1.875929 -0.432382
A5 -1.093815  0.396965 -0.759479
```
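Assigning to index or columns replaces every label at once. To change only some labels, DataFrame.rename accepts dict mappings and returns a new DataFrame. A minimal sketch with placeholder data:

```python
import numpy as np
import pandas as pd

obj = pd.DataFrame(np.zeros((2, 3)), index=['a', 'b'], columns=['A', 'B', 'C'])

# rename returns a new DataFrame; only the listed labels change
renamed = obj.rename(index={'a': 'A1'}, columns={'B': 'B2'})
print(list(renamed.index))    # ['A1', 'b']
print(list(renamed.columns))  # ['A', 'B2', 'C']
print(list(obj.columns))      # ['A', 'B', 'C'] (original untouched)
```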

【04x02】Building a DataFrame from a dict

When you build a DataFrame from a dict, the dict keys become the column labels (columns) and the dict values become the column data. If you specify a sequence of columns, the DataFrame's columns follow that order. A column that you pass but that is not found in the data produces missing values (NaN) in the result:

```python
>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> pd.DataFrame(data)
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
>>> pd.DataFrame(data, columns=['year', 'city', 'people'])
   year     city    people
0  2017    Wuhan  10892900
1  2018    Wuhan  11081000
2  2019    Wuhan  11212000
3  2017  Beijing  21707000
4  2018  Beijing  21542000
5  2019  Beijing  21536000
>>> pd.DataFrame(data, columns=['year', 'city', 'people', 'money'])
   year     city    people money
0  2017    Wuhan  10892900   NaN
1  2018    Wuhan  11081000   NaN
2  2019    Wuhan  11212000   NaN
3  2017  Beijing  21707000   NaN
4  2018  Beijing  21542000   NaN
5  2019  Beijing  21536000   NaN
```

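Note: when data is a dict and the columns parameter is not set:

- With Python >= 3.6 and Pandas >= 0.23, the DataFrame columns follow the dict's insertion order.
- With Python < 3.6 or Pandas < 0.23, the DataFrame columns are sorted alphabetically by dict key.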
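Here is a compact sketch of column ordering, using a shortened version of the data above:

- Without columns, the dict's insertion order is used (assuming Python >= 3.6 and Pandas >= 0.23).
- With columns, the result follows that order and drops any keys that are not listed.
- An unknown label such as 'money' becomes an all-NaN column.

```python
import pandas as pd

data = {'city': ['Wuhan', 'Beijing'], 'year': [2019, 2019],
        'people': [11212000, 21536000]}

df_default = pd.DataFrame(data)
print(list(df_default.columns))        # ['city', 'year', 'people']

df = pd.DataFrame(data, columns=['people', 'city', 'money'])
print(list(df.columns))                # ['people', 'city', 'money']
print('year' in df.columns)            # False: unlisted keys are left out
print(bool(df['money'].isna().all()))  # True: unknown column filled with NaN
```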

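【04x03】Getting the data and the index

Like a Series, a DataFrame exposes its data and its index object through the values and index attributes. Its column labels are available through the columns attribute.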
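For example, take a shortened version of the city data (the values are illustrative). Here values returns a two-dimensional NumPy array, which falls back to dtype object when the columns have mixed types:

```python
import pandas as pd

data = {'city': ['Wuhan', 'Beijing'], 'year': [2019, 2019]}
obj = pd.DataFrame(data)

print(obj.index)         # the row index (a RangeIndex here)
print(obj.columns)       # the column labels
print(obj.values)        # 2-D ndarray holding the data
print(obj.values.dtype)  # object, because the columns mix str and int
```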

【04x04】Selecting data by index

- A column of a DataFrame can be retrieved as a Series with dict-like notation (obj['city']) or as an attribute (obj.year);
- Rows can be retrieved by label with the loc attribute, or by integer position with the iloc attribute;
- For very large DataFrames, the head method returns the first rows: five by default, or n rows if you pass n.

Example:

```python
>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> obj = pd.DataFrame(data)
>>> obj
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
>>> obj['city']
0      Wuhan
1      Wuhan
2      Wuhan
3    Beijing
4    Beijing
5    Beijing
Name: city, dtype: object
>>> obj.year
0    2017
1    2018
2    2019
3    2017
4    2018
5    2019
Name: year, dtype: int64
>>> type(obj.year)
<class 'pandas.core.series.Series'>
>>> obj.loc[2]
city         Wuhan
year          2019
people    11212000
Name: 2, dtype: object
>>> obj.head()
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
```
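The example above uses the default RangeIndex, so obj.loc[2] happens to coincide with the third row by position. With string row labels (hypothetical here), the difference between loc and iloc becomes visible. The sketch also passes an explicit row count to head:

```python
import pandas as pd

data = {'city': ['Wuhan', 'Wuhan', 'Beijing'], 'year': [2017, 2018, 2017]}
obj = pd.DataFrame(data, index=['x', 'y', 'z'])  # string row labels

row_by_label = obj.loc['y']      # row selected by label
row_by_position = obj.iloc[2]    # row selected by position
first_two = obj.head(2)          # first n rows (n defaults to 5)
cell = obj.loc['z', 'city']      # single cell by row and column label
print(row_by_label.tolist(), row_by_position.tolist(), cell)
```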

【04x05】Modifying column values

Columns can be modified by assignment. The following example assigns the "money" column first a scalar value and then an array of values. It uses np.nan, because the np.NaN alias was removed in NumPy 2.0:

```python
>>> import pandas as pd
>>> import numpy as np
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000],
...         'money': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
>>> obj = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E', 'F'])
>>> obj
      city  year    people  money
A    Wuhan  2017  10892900    NaN
B    Wuhan  2018  11081000    NaN
C    Wuhan  2019  11212000    NaN
D  Beijing  2017  21707000    NaN
E  Beijing  2018  21542000    NaN
F  Beijing  2019  21536000    NaN
>>> obj['money'] = 6666666666
>>> obj
      city  year    people       money
A    Wuhan  2017  10892900  6666666666
B    Wuhan  2018  11081000  6666666666
C    Wuhan  2019  11212000  6666666666
D  Beijing  2017  21707000  6666666666
E  Beijing  2018  21542000  6666666666
F  Beijing  2019  21536000  6666666666
>>> obj['money'] = np.arange(100000000, 700000000, 100000000)
>>> obj
      city  year    people      money
A    Wuhan  2017  10892900  100000000
B    Wuhan  2018  11081000  200000000
C    Wuhan  2019  11212000  300000000
D  Beijing  2017  21707000  400000000
E  Beijing  2018  21542000  500000000
F  Beijing  2019  21536000  600000000
```

When you assign a list or array to a column, its length must match the length of the DataFrame. When you assign a Series, it is aligned on the DataFrame's index labels, and any label missing from the Series gets NaN:

```python
>>> import pandas as pd
>>> import numpy as np
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000],
...         'money': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
>>> obj = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E', 'F'])
>>> obj
      city  year    people  money
A    Wuhan  2017  10892900    NaN
B    Wuhan  2018  11081000    NaN
C    Wuhan  2019  11212000    NaN
D  Beijing  2017  21707000    NaN
E  Beijing  2018  21542000    NaN
F  Beijing  2019  21536000    NaN
>>> new_data = pd.Series([5670000000, 6890000000, 7890000000], index=['A', 'C', 'E'])
>>> obj['money'] = new_data
>>> obj
      city  year    people         money
A    Wuhan  2017  10892900  5.670000e+09
B    Wuhan  2018  11081000           NaN
C    Wuhan  2019  11212000  6.890000e+09
D  Beijing  2017  21707000           NaN
E  Beijing  2018  21542000  7.890000e+09
F  Beijing  2019  21536000           NaN
```
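This minimal sketch checks both rules with placeholder data. A list of the wrong length raises ValueError. A shorter Series is aligned on the index labels and leaves NaN wherever a label is absent.

```python
import pandas as pd

obj = pd.DataFrame({'year': [2017, 2018, 2019]}, index=['A', 'B', 'C'])

raised = False
try:
    obj['money'] = [1, 2]  # 2 values for 3 rows
except ValueError as exc:
    raised = True
    print('rejected:', exc)

# A Series is aligned on the index: 'B' is absent, so it becomes NaN
obj['money'] = pd.Series([10, 30], index=['A', 'C'])
print(obj['money'].tolist())  # [10.0, nan, 30.0]
```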

【04x06】Adding and deleting columns

Assigning to a column that does not exist creates a new column. The del keyword deletes a column:

```python
>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> obj = pd.DataFrame(data)
>>> obj
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
>>> obj['northern'] = obj['city'] == 'Beijing'
>>> obj
      city  year    people  northern
0    Wuhan  2017  10892900     False
1    Wuhan  2018  11081000     False
2    Wuhan  2019  11212000     False
3  Beijing  2017  21707000      True
4  Beijing  2018  21542000      True
5  Beijing  2019  21536000      True
>>> del obj['northern']
>>> obj
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
```
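Besides del, pandas offers two other ways to remove a column:

- DataFrame.drop(columns=...) returns a new DataFrame without the given columns;
- DataFrame.pop removes a column in place and returns it as a Series.

A short sketch with placeholder data:

```python
import pandas as pd

obj = pd.DataFrame({'city': ['Wuhan', 'Beijing'], 'year': [2019, 2019]})
obj['northern'] = obj['city'] == 'Beijing'

dropped = obj.drop(columns=['northern'])  # returns a new DataFrame
print(list(dropped.columns))  # ['city', 'year']
print(list(obj.columns))      # ['city', 'year', 'northern'] (unchanged)

col = obj.pop('northern')     # removes the column in place and returns it
print(col.tolist())           # [False, True]
print(list(obj.columns))      # ['city', 'year']
```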

【04x07】The name attribute

You can set the names of the index and the column labels through the index.name and columns.name attributes. Note that, unlike a Series, a DataFrame has no built-in name attribute:

```python
>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> obj = pd.DataFrame(data)
>>> obj.index.name = 'index'
>>> obj.columns.name = 'columns'
>>> obj
columns     city  year    people
index
0          Wuhan  2017  10892900
1          Wuhan  2018  11081000
2          Wuhan  2019  11212000
3        Beijing  2017  21707000
4        Beijing  2018  21542000
5        Beijing  2019  21536000
```
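Alternatively, DataFrame.rename_axis can set both axis names in one call. It returns a new DataFrame and leaves the original unchanged. This sketch assumes Pandas >= 0.24, where the index= and columns= keywords are available:

```python
import pandas as pd

obj = pd.DataFrame({'city': ['Wuhan', 'Beijing'], 'year': [2019, 2019]})

named = obj.rename_axis(index='index', columns='columns')  # returns a copy
print(named.index.name, named.columns.name)  # index columns
print(obj.index.name)                        # None: original unchanged
```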
