當(dāng)前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

50个最有用的Matplotlib数据分析与可视化图

發(fā)布時(shí)間：2023/12/10 编程问答 21 豆豆

生活随笔收集整理的這篇文章主要介紹了 50个最有用的Matplotlib数据分析与可视化图小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

本文介紹了數(shù)據(jù)分析與可視化中最有用的50個(gè)數(shù)據(jù)分析圖，共分為7大類：Correlation、Deviation、RankIng、Distribution、Composition、Change、Groups

原文鏈接：https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/

設(shè)置

Correlation

1.Scatter plot（散點(diǎn)圖）

2.Bubble plot with Encircling（包圍的氣泡圖）

3. Scatter plot with linear regression line of best fit（散點(diǎn)圖與最佳線性擬合回歸線）

4.Jittering with stripplot（帶條紋的抖動(dòng)）

5. Counts Plot（計(jì)數(shù)圖）

6. Marginal Histogram（邊緣直方圖）

7. Marginal Boxplot（邊緣箱線圖）

8. Correllogram（相關(guān)圖）

9. Pairwise Plot（成對圖）

Deviation

10. Diverging Bars（發(fā)散條形圖）

11. Diverging Texts（發(fā)散文本）

12. Diverging Dot Plot（散點(diǎn)圖）

13. Diverging Lollipop Chart with Markers（帶標(biāo)記的發(fā)散型棒棒糖圖）

14. Area Chart（面積圖）

Ranking

15. Ordered Bar Chart（有序條形圖）

16. Lollipop Chart（棒棒糖圖）

17. Dot Plot（點(diǎn)圖）

18. Slope Chart（坡度圖）

19. Dumbbell Plot（啞鈴圖）

Distribution

20. Histogram for Continuous Variable（連續(xù)變量的直方圖）

21. Histogram for Categorical Variable（類型變量的直方圖）

22. Density Plot（密度圖）

23. Density Curves with Histogram（直方密度圖）

24. Joy Plot

25. Distributed Dot Plot（分布式點(diǎn)圖）

26. Box Plot（箱形圖）

27. Dot + Box Plot（點(diǎn)+箱型圖）

28. Violin Plot（小提琴圖）

29. Population Pyramid（人口金字塔）

30. Categorical Plots（分類圖）

Composition

31. Waffle Chart（華夫餅表）

32. Pie Chart（餅狀圖）

33. Treemap（樹狀圖）

34. Bar Chart（條形圖）

Change

35. Time Series Plot（時(shí)間序列圖）

36. Time Series with Peaks and Troughs Annotated（帶波峰波谷標(biāo)記的時(shí)序圖）

37. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot（自相關(guān)和部分自相關(guān)圖）

38. Cross Correlation plot（交叉相關(guān)圖）

39. Time Series Decomposition Plot（時(shí)間序列分解圖）

40. Multiple Time Series（多時(shí)間序列）

41. Plotting with different scales using secondary Y axis（使用輔助Y軸來繪制不同范圍的圖形）

42. Time Series with Error Bands（帶有誤差帶的時(shí)間序列）

43. Stacked Area Chart（堆積面積圖）

44. Area Chart UnStacked（未堆積的面積圖）

45. Calendar Heat Map（日歷熱力圖）

46. Seasonal Plot（季度圖）

Groups

47. Dendrogram（樹狀圖）

48. Cluster Plot（簇狀圖）

49. Andrews Curve（安德魯斯曲線）

50. Parallel Coordinates（平行坐標(biāo)）

設(shè)置

在運(yùn)行具體畫圖代碼前，先運(yùn)行以下代碼，導(dǎo)入畫圖所需的庫，以及進(jìn)行一些必要的參數(shù)設(shè)置。

import numpy as np import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt import seaborn as sns import warnings; warnings.filterwarnings(action='once')large = 22; med = 16; small = 12 params = {'axes.titlesize': large,'legend.fontsize': med,'figure.figsize': (16, 10),'axes.labelsize': med,'axes.titlesize': med,'xtick.labelsize': med,'ytick.labelsize': med,'figure.titlesize': large} plt.rcParams.update(params) plt.style.use('seaborn-whitegrid') sns.set_style("white") %matplotlib inline# Version print(mpl.__version__) #> 3.0.0 print(sns.__version__) #> 0.9.0

Correlation

用于可視化兩個(gè)或多個(gè)變量之間的相互關(guān)系，當(dāng)一個(gè)變量發(fā)生變化時(shí)，另一個(gè)變量與之如何變化。

1.Scatter plot（散點(diǎn)圖）

散點(diǎn)圖是研究兩個(gè)變量之間最基本和經(jīng)典的關(guān)系圖。如果數(shù)據(jù)中有多個(gè)不同的組，則可能需要以不同的顏色顯示每個(gè)組。在matplotlib中，可以使用plt.scatterplot()。

# Import dataset midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")# Prepare Data # Create as many colors as there are unique midwest['category'] categories = np.unique(midwest['category']) colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]# Draw Plot for Each Category plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')for i, category in enumerate(categories):plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], s=20, c=colors[i], label=str(category))# Decorations plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),xlabel='Area', ylabel='Population')plt.xticks(fontsize=12); plt.yticks(fontsize=12) plt.title("Scatterplot of Midwest Area vs Population", fontsize=22) plt.legend(fontsize=12) plt.show()

2.Bubble plot with Encircling（包圍的氣泡圖）

有時(shí)你想在一個(gè)邊界內(nèi)顯示一組點(diǎn)來強(qiáng)調(diào)它們的重要性。

from matplotlib import patches from scipy.spatial import ConvexHull import warnings; warnings.simplefilter('ignore') sns.set_style("white")# Step 1: Prepare Data midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")# As many colors as there are unique midwest['category'] categories = np.unique(midwest['category']) colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]# Step 2: Draw Scatterplot with unique color for each category fig = plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k') for i, category in enumerate(categories):plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], \ s='dot_size', c=colors[i], label=str(category), edgecolors='black', linewidths=.5)# Step 3: Encircling # https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot def encircle(x,y, ax=None, **kw):if not ax: ax=plt.gca()p = np.c_[x,y]hull = ConvexHull(p)poly = plt.Polygon(p[hull.vertices,:], **kw)ax.add_patch(poly)# Select data to be encircled midwest_encircle_data = midwest.loc[midwest.state=='IN', :] # Draw polygon surrounding vertices encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1) encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)# Step 4: Decorations plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),xlabel='Area', ylabel='Population')plt.xticks(fontsize=12); plt.yticks(fontsize=12) plt.title("Bubble Plot with Encircling", fontsize=22) plt.legend(fontsize=12) plt.show()

3. Scatter plot with linear regression line of best fit（散點(diǎn)圖與最佳線性擬合回歸線）

如果想了解兩個(gè)變量之間是如何變化的，那么最好的方法就是繪制一條擬合線。下圖顯示了數(shù)據(jù)中不同組之間的最佳擬合線的差異。

# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") df_select = df.loc[df.cyl.isin([4,8]), :]# Plot sns.set_style("white") gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select, height=7, aspect=1.6, robust=True, palette='tab10', scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))# Decorations gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50)) plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20) plt.show()

4.Jittering with stripplot（帶條紋的抖動(dòng)）

通常，多個(gè)數(shù)據(jù)點(diǎn)具有完全相同的X和Y值。因此繪制這些點(diǎn)時(shí)會(huì)相互覆蓋，為避免這種情況，可以對其稍微抖動(dòng)，以便可以直觀地看到它們。使用seaborn.stripplot()函數(shù)很方便。

# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")# Draw Stripplot fig, ax = plt.subplots(figsize=(16,10), dpi= 80) sns.stripplot(df.cty, df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)# Decorations plt.title('Use jittered plots to avoid overlapping of points', fontsize=22) plt.show()

5. Counts Plot（計(jì)數(shù)圖）

另一個(gè)避免數(shù)據(jù)點(diǎn)相互重疊的方法是改變數(shù)據(jù)點(diǎn)的大小，這取決于該圖中有多少個(gè)點(diǎn)。

# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')# Draw Stripplot fig, ax = plt.subplots(figsize=(16,10), dpi= 80) sns.stripplot(df_counts.cty, df_counts.hwy, size=df_counts.counts*2, ax=ax)# Decorations plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22) plt.show()

6. Marginal Histogram（邊緣直方圖）

邊緣直方圖是一個(gè)有著沿X和Y軸變量的直方圖。這用于可視化X和Y之間的關(guān)系，同時(shí)也顯示出X和Y各自的分布情況。

# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")# Create Fig and gridspec fig = plt.figure(figsize=(16, 10), dpi= 80) grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)# Define the axes ax_main = fig.add_subplot(grid[:-1, :-1]) ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[]) ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])# Scatterplot on main ax ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, \ cmap="tab10", edgecolors='gray', linewidths=.5)# histogram on the right ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink') ax_bottom.invert_yaxis()# histogram in the bottom ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')# Decorations ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy') ax_main.title.set_fontsize(20) for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):item.set_fontsize(14)xlabels = ax_main.get_xticks().tolist() ax_main.set_xticklabels(xlabels) plt.show()

7. Marginal Boxplot（邊緣箱線圖）

邊緣箱圖與邊緣直方圖具有相似的用途。然而，箱線圖有助于精確定位X和Y的中位數(shù)，25和75百分位數(shù)。

# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")# Create Fig and gridspec fig = plt.figure(figsize=(16, 10), dpi= 80) grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)# Define the axes ax_main = fig.add_subplot(grid[:-1, :-1]) ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[]) ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])# Scatterplot on main ax ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)# Add a graph in each part sns.boxplot(df.hwy, ax=ax_right, orient="v") sns.boxplot(df.displ, ax=ax_bottom, orient="h")# Decorations ------------------ # Remove x axis name for the boxplot ax_bottom.set(xlabel='') ax_right.set(ylabel='')# Main Title, Xlabel and YLabel ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy')# Set font size of different components ax_main.title.set_fontsize(20) for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):item.set_fontsize(14)plt.show()

8. Correllogram（相關(guān)圖）

Correlogram用于直觀地查看給定數(shù)據(jù)幀（或2D數(shù)組）中所有可能的數(shù)值變量對之間的相關(guān)度量。

# Import Dataset df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")# Plot plt.figure(figsize=(12,10), dpi= 80) sns.heatmap(df.corr(), xticklabels=df.corr().columns, yticklabels=df.corr().columns,cmap='RdYlGn', center=0, annot=True)# Decorations plt.title('Correlogram of mtcars', fontsize=22) plt.xticks(fontsize=12) plt.yticks(fontsize=12) plt.show()

9. Pairwise Plot（成對圖）

成對圖用以理解所有可能的數(shù)字變量對之間的關(guān)系，它是雙變量分析的必備工具。

# Load Dataset df = sns.load_dataset('iris')# Plot plt.figure(figsize=(10,8), dpi= 80) sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5)) plt.show()

# Load Dataset df = sns.load_dataset('iris')# Plot plt.figure(figsize=(10,8), dpi= 80) sns.pairplot(df, kind="reg", hue="species") plt.show()

Deviation

10. Diverging Bars（發(fā)散條形圖）

如果想根據(jù)單個(gè)指標(biāo)查看條目的變化情況，并可視化此差異的順序和數(shù)量，那么發(fā)散條形圖是一個(gè)很好的工具。

# Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']] df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True)# Draw plot plt.figure(figsize=(14,10), dpi= 80) plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)# Decorations plt.gca().set(ylabel='$Model$', xlabel='$Mileage$') plt.yticks(df.index, df.cars, fontsize=12) plt.title('Diverging Bars of Car Mileage', fontdict={'size':20}) plt.grid(linestyle='--', alpha=0.5) plt.show()

11. Diverging Texts（發(fā)散文本）

發(fā)散文本類似于發(fā)散條形圖，如果想以一種漂亮和可呈現(xiàn)的方式顯示圖表中每個(gè)條目的數(shù)值。

# Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']] df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True)# Draw plot plt.figure(figsize=(14,14), dpi= 80) plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z) for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left', verticalalignment='center', fontdict={'color':'red' if x < 0 else 'green', 'size':14})# Decorations plt.yticks(df.index, df.cars, fontsize=12) plt.title('Diverging Text Bars of Car Mileage', fontdict={'size':20}) plt.grid(linestyle='--', alpha=0.5) plt.xlim(-2.5, 2.5) plt.show()

12. Diverging Dot Plot（散點(diǎn)圖）

# Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = ['red' if x < 0 else 'darkgreen' for x in df['mpg_z']] df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True)# Draw plot plt.figure(figsize=(14,16), dpi= 80) plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors) for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):t = plt.text(x, y, round(tex, 1), horizontalalignment='center', verticalalignment='center', fontdict={'color':'white'})# Decorations # Lighten borders plt.gca().spines["top"].set_alpha(.3) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(.3) plt.gca().spines["left"].set_alpha(.3)plt.yticks(df.index, df.cars) plt.title('Diverging Dotplot of Car Mileage', fontdict={'size':20}) plt.xlabel('$Mileage$') plt.grid(linestyle='--', alpha=0.5) plt.xlim(-2.5, 2.5) plt.show()

13. Diverging Lollipop Chart with Markers（帶標(biāo)記的發(fā)散型棒棒糖圖）

帶標(biāo)記的棒棒糖可以幫助強(qiáng)調(diào)想要引起注意的任何重要數(shù)據(jù)點(diǎn)。

# Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = 'black'# color fiat differently df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange' df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True)# Draw plot import matplotlib.patches as patchesplt.figure(figsize=(14,16), dpi= 80) plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1) plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6) plt.yticks(df.index, df.cars) plt.xticks(fontsize=12)# Annotate plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data', fontsize=15, ha='center', va='center',bbox=dict(boxstyle='square', fc='firebrick'),arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white')# Add Patches p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red') p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green') plt.gca().add_patch(p1) plt.gca().add_patch(p2)# Decorate plt.title('Diverging Bars of Car Mileage', fontdict={'size':20}) plt.grid(linestyle='--', alpha=0.5) plt.show()

14. Area Chart（面積圖）

通過對坐標(biāo)軸和曲線之間的區(qū)域進(jìn)行著色，區(qū)域圖不僅強(qiáng)調(diào)波峰值和低波，而且還強(qiáng)調(diào)波峰和波谷的持續(xù)時(shí)間。波峰持續(xù)時(shí)間越長，面積越大。

import numpy as np import pandas as pd# Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100) x = np.arange(df.shape[0]) y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100# Plot plt.figure(figsize=(16,10), dpi= 80) plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7) plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7)# Annotate plt.annotate('Peak \n1975', xy=(94.0, 21.0), xytext=(88.0, 28),bbox=dict(boxstyle='square', fc='firebrick'),arrowprops=dict(facecolor='steelblue', shrink=0.05), fontsize=15, color='white')# Decorations xtickvals = [str(m)[:3].upper()+"-"+str(y) for y,m in zip(df.date.dt.year, df.date.dt.month_name())] plt.gca().set_xticks(x[::6]) plt.gca().set_xticklabels(xtickvals[::6], rotation=90, fontdict={'horizontalalignment': 'center', 'verticalalignment': 'center_baseline'}) plt.ylim(-35,35) plt.xlim(1,100) plt.title("Month Economics Return %", fontsize=22) plt.ylabel('Monthly returns %') plt.grid(alpha=0.5) plt.show()

Ranking

15. Ordered Bar Chart（有序條形圖）

有序條形圖有效地傳達(dá)了條目的排名順序。但是，在圖表上方添加度量標(biāo)準(zhǔn)的值，用戶可以從圖表本身獲取精確信息。

# Prepare Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean()) df.sort_values('cty', inplace=True) df.reset_index(inplace=True)# Draw plot import matplotlib.patches as patchesfig, ax = plt.subplots(figsize=(16,10), facecolor='white', dpi= 80) ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20)# Annotate Text for i, cty in enumerate(df.cty):ax.text(i, cty+0.5, round(cty, 1), horizontalalignment='center')# Title, Label, Ticks and Ylim ax.set_title('Bar Chart for Highway Mileage', fontdict={'size':22}) ax.set(ylabel='Miles Per Gallon', ylim=(0, 30)) plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12)# Add patches to color the X axis labels p1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure) p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure) fig.add_artist(p1) fig.add_artist(p2) plt.show()

16. Lollipop Chart（棒棒糖圖）

棒棒糖圖表以一種視覺上令人愉悅的方式提供與有序條形圖類似的目的。

# Prepare Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean()) df.sort_values('cty', inplace=True) df.reset_index(inplace=True)# Draw plot fig, ax = plt.subplots(figsize=(16,10), dpi= 80) ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2) ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7)# Title, Label, Ticks and Ylim ax.set_title('Lollipop Chart for Highway Mileage', fontdict={'size':22}) ax.set_ylabel('Miles Per Gallon') ax.set_xticks(df.index) ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size':12}) ax.set_ylim(0, 30)# Annotate for row in df.itertuples():ax.text(row.Index, row.cty+.5, s=round(row.cty, 2), horizontalalignment= 'center', verticalalignment='bottom', fontsize=14)plt.show()

17. Dot Plot（點(diǎn)圖）

點(diǎn)圖傳達(dá)了條目的排名順序。由于它沿水平軸對齊，因此可以更容易地看到點(diǎn)彼此之間的距離。

18. Slope Chart（坡度圖）

坡度圖最適合比較給定項(xiàng)目的“在此之前”和“在此之后”的位置。

import matplotlib.lines as mlines # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")left_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1952'])] right_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1957'])] klass = ['red' if (y1-y2) < 0 else 'green' for y1, y2 in zip(df['1952'], df['1957'])]# draw line # https://stackoverflow.com/questions/36470343/how-to-draw-a-line-with-matplotlib/36479941 def newline(p1, p2, color='black'):ax = plt.gca()l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='red' if p1[1]-p2[1] > 0 else 'green', marker='o', markersize=6)ax.add_line(l)return lfig, ax = plt.subplots(1,1,figsize=(14,14), dpi= 80)# Vertical Lines ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted') ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')# Points ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7) ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)# Line Segmentsand Annotation for p1, p2, c in zip(df['1952'], df['1957'], df['continent']):newline([1,p1], [3,p2])ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14})ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14})# 'Before' and 'After' Annotations ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700}) ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700})# Decoration ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22}) ax.set(xlim=(0,4), ylim=(0,14000), ylabel='Mean GDP Per Capita') ax.set_xticks([1,3]) ax.set_xticklabels(["1952", "1957"]) plt.yticks(np.arange(500, 13000, 2000), fontsize=12)# Lighten borders plt.gca().spines["top"].set_alpha(.0) plt.gca().spines["bottom"].set_alpha(.0) plt.gca().spines["right"].set_alpha(.0) plt.gca().spines["left"].set_alpha(.0) plt.show()

19. Dumbbell Plot（啞鈴圖）

啞鈴圖傳達(dá)各種項(xiàng)目的“前”和“后”位置以及項(xiàng)目的排序。

import matplotlib.lines as mlines# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv") df.sort_values('pct_2014', inplace=True) df.reset_index(inplace=True)# Func to draw line segment def newline(p1, p2, color='black'):ax = plt.gca()l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='skyblue')ax.add_line(l)return l# Figure and Axes fig, ax = plt.subplots(1,1,figsize=(14,14), facecolor='#f7f7f7', dpi= 80)# Vertical Lines ax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted') ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted') ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted') ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')# Points ax.scatter(y=df['index'], x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7) ax.scatter(y=df['index'], x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)# Line Segments for i, p1, p2 in zip(df['index'], df['pct_2013'], df['pct_2014']):newline([p1, i], [p2, i])# Decoration ax.set_facecolor('#f7f7f7') ax.set_title("Dumbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22}) ax.set(xlim=(0,.25), ylim=(-1, 27), ylabel='Mean GDP Per Capita') ax.set_xticks([.05, .1, .15, .20]) ax.set_xticklabels(['5%', '15%', '20%', '25%']) ax.set_xticklabels(['5%', '15%', '20%', '25%']) plt.show()

Distribution

20. Histogram for Continuous Variable（連續(xù)變量的直方圖）

直方圖顯示給定變量的頻率分布。下面表示基于分類變量對頻率條進(jìn)行分組，從而更好地了解連續(xù)變量和串聯(lián)變量。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare data x_var = 'displ' groupby_var = 'class' df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var) vals = [df[x_var].values.tolist() for i, df in df_agg]# Draw plt.figure(figsize=(16,9), dpi= 80) colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))] n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors[:len(vals)])# Decoration plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])}) plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22) plt.xlabel(x_var) plt.ylabel("Frequency") plt.ylim(0, 25) plt.xticks(ticks=bins[::3], labels=[round(b,1) for b in bins[::3]]) plt.show()

21. Histogram for Categorical Variable（類型變量的直方圖）

分類變量的直方圖顯示該變量的頻率分布。通過對條形圖進(jìn)行著色，您可以將分布與表示顏色的另一個(gè)分類變量相關(guān)聯(lián)。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare data x_var = 'manufacturer' groupby_var = 'class' df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var) vals = [df[x_var].values.tolist() for i, df in df_agg]# Draw plt.figure(figsize=(16,9), dpi= 80) colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))] n, bins, patches = plt.hist(vals, df[x_var].unique().__len__(), stacked=True, density=False, color=colors[:len(vals)])# Decoration plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])}) plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22) plt.xlabel(x_var) plt.ylabel("Frequency") plt.ylim(0, 40) plt.xticks(ticks=bins, labels=np.unique(df[x_var]).tolist(), rotation=90, horizontalalignment='left') plt.show()

22. Density Plot（密度圖）

密度圖是一種常用工具，可視化連續(xù)變量的分布。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot plt.figure(figsize=(16,10), dpi= 80) sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], shade=True, color="g", label="Cyl=4", alpha=.7) sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], shade=True, color="deeppink", label="Cyl=5", alpha=.7) sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], shade=True, color="dodgerblue", label="Cyl=6", alpha=.7) sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], shade=True, color="orange", label="Cyl=8", alpha=.7)# Decoration plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22) plt.legend() plt.show()

23. Density Curves with Histogram（直方密度圖）

帶有直方圖的密度曲線將兩個(gè)圖表傳達(dá)的集體信息匯集在一起，這樣您就可以將它們放在一個(gè)圖形而不是兩個(gè)圖形中。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot plt.figure(figsize=(13,10), dpi= 80) sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3}) sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3}) sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})plt.ylim(0, 0.35)# Decoration plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22) plt.legend() plt.show()

24. Joy Plot

Joy Plot允許不同組的密度曲線重疊，這是一種可視化相對于彼此的大量組的分布的好方法。它看起來很悅目，并清楚地傳達(dá)了正確的信息。它可以使用joypy包來輕松構(gòu)建。

# !pip install joypy # Import Data mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot plt.figure(figsize=(16,10), dpi= 80) fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10))# Decoration plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22) plt.show()

25. Distributed Dot Plot（分布式點(diǎn)圖）

分布點(diǎn)圖顯示按組分割的點(diǎn)的單變量分布。

import matplotlib.patches as mpatches# Prepare Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'} df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)# Mean and Median city mileage by make df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean()) df.sort_values('cty', ascending=False, inplace=True) df.reset_index(inplace=True) df_median = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.median())# Draw horizontal lines fig, ax = plt.subplots(figsize=(16,10), dpi= 80) ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')# Draw the Dots for i, make in enumerate(df.manufacturer):df_make = df_raw.loc[df_raw.manufacturer==make, :]ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5)ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')# Annotate ax.text(33, 13, "$red \; dots \; are \; the \: median$", fontdict={'size':12}, color='firebrick')# Decorations red_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median") plt.legend(handles=red_patch) ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22}) ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7) ax.set_yticks(df.index) ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7) ax.set_xlim(1, 40) plt.xticks(alpha=0.7) plt.gca().spines["top"].set_visible(False) plt.gca().spines["bottom"].set_visible(False) plt.gca().spines["right"].set_visible(False) plt.gca().spines["left"].set_visible(False) plt.grid(axis='both', alpha=.4, linewidth=.1) plt.show()

26. Box Plot（箱形圖）

箱形圖是一種可視化分布的好方法，記住中位數(shù)，第25個(gè)第75個(gè)四分位數(shù)和異常值。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot plt.figure(figsize=(13,10), dpi= 80) sns.boxplot(x='class', y='hwy', data=df, notch=False)# Add N Obs inside boxplot (optional) def add_n_obs(df,group_col,y):medians_dict = {grp[0]:grp[1][y].median() for grp in df.groupby(group_col)}xticklabels = [x.get_text() for x in plt.gca().get_xticklabels()]n_obs = df.groupby(group_col)[y].size().valuesfor (x, xticklabel), n_ob in zip(enumerate(xticklabels), n_obs):plt.text(x, medians_dict[xticklabel]*1.01, "#obs : "+str(n_ob), horizontalalignment='center', fontdict={'size':14}, color='white')add_n_obs(df,group_col='class',y='hwy') # Decoration plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22) plt.ylim(10, 40) plt.show()

27. Dot + Box Plot（點(diǎn)+箱型圖）

Dot + Box plot傳送類似于分組的boxplot信息。此外，這些點(diǎn)給出了每組中有多少數(shù)據(jù)點(diǎn)。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot plt.figure(figsize=(13,10), dpi= 80) sns.boxplot(x='class', y='hwy', data=df, hue='cyl') sns.stripplot(x='class', y='hwy', data=df, color='black', size=3, jitter=1)for i in range(len(df['class'].unique())-1):plt.vlines(i+.5, 10, 45, linestyles='solid', colors='gray', alpha=0.2)# Decoration plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22) plt.legend(title='Cylinders') plt.show()

28. Violin Plot（小提琴圖）

小提琴圖是箱形圖的視覺上令人愉悅的替代品。小提琴的形狀或面積取決于它所持有的點(diǎn)數(shù)。然而，小提琴圖可能更難以理解，并且在專業(yè)設(shè)置中不常用。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot plt.figure(figsize=(13,10), dpi= 80) sns.violinplot(x='class', y='hwy', data=df, scale='width', inner='quartile')# Decoration plt.title('Violin Plot of Highway Mileage by Vehicle Class', fontsize=22) plt.show()

29. Population Pyramid（人口金字塔）

人口金字塔可用于顯示由volumne排序的組的分布。

# Read data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv")# Draw Plot plt.figure(figsize=(13,10), dpi= 80) group_col = 'Gender' order_of_bars = df.Stage.unique()[::-1] colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]for c, group in zip(colors, df[group_col].unique()):sns.barplot(x='Users', y='Stage', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)# Decorations plt.xlabel("$Users$") plt.ylabel("Stage of Purchase") plt.yticks(fontsize=12) plt.title("Population Pyramid of the Marketing Funnel", fontsize=22) plt.legend() plt.show()

30. Categorical Plots（分類圖）

由Seaborn庫提供的分類圖可用于可視化彼此相關(guān)的2個(gè)或更多分類變量的計(jì)數(shù)分布。

# Load Dataset titanic = sns.load_dataset("titanic")# Plot g = sns.catplot("alive", col="deck", col_wrap=4,data=titanic[titanic.deck.notnull()],kind="count", height=3.5, aspect=.8, palette='tab20')fig.suptitle('sf') plt.show()

# Load Dataset titanic = sns.load_dataset("titanic")# Plot sns.catplot(x="age", y="embark_town",hue="sex", col="class",data=titanic[titanic.embark_town.notnull()],orient="h", height=5, aspect=1, palette="tab10",kind="violin", dodge=True, cut=0, bw=.2)

Composition

31. Waffle Chart（華夫餅表）

Waffle表可使用pywaffle包來創(chuàng)建。

#! pip install pywaffle # Reference: https://stackoverflow.com/questions/41400136/how-to-do-waffle-charts-in-python-square-piechart from pywaffle import Waffle# Import df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data df = df_raw.groupby('class').size().reset_index(name='counts') n_categories = df.shape[0] colors = [plt.cm.inferno_r(i/float(n_categories)) for i in range(n_categories)]# Draw Plot and Decorate fig = plt.figure(FigureClass=Waffle,plots={'111': {'values': df['counts'],'labels': ["{0} ({1})".format(n[0], n[1]) for n in df[['class', 'counts']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12},'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18}},},rows=7,colors=colors,figsize=(16, 9) )

#! pip install pywaffle from pywaffle import Waffle# Import # df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data # By Class Data df_class = df_raw.groupby('class').size().reset_index(name='counts_class') n_categories = df_class.shape[0] colors_class = [plt.cm.Set3(i/float(n_categories)) for i in range(n_categories)]# By Cylinders Data df_cyl = df_raw.groupby('cyl').size().reset_index(name='counts_cyl') n_categories = df_cyl.shape[0] colors_cyl = [plt.cm.Spectral(i/float(n_categories)) for i in range(n_categories)]# By Make Data df_make = df_raw.groupby('manufacturer').size().reset_index(name='counts_make') n_categories = df_make.shape[0] colors_make = [plt.cm.tab20b(i/float(n_categories)) for i in range(n_categories)]# Draw Plot and Decorate fig = plt.figure(FigureClass=Waffle,plots={'311': {'values': df_class['counts_class'],'labels': ["{1}".format(n[0], n[1]) for n in df_class[['class', 'counts_class']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Class'},'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18},'colors': colors_class},'312': {'values': df_cyl['counts_cyl'],'labels': ["{1}".format(n[0], n[1]) for n in df_cyl[['cyl', 'counts_cyl']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Cyl'},'title': {'label': '# Vehicles by Cyl', 'loc': 'center', 'fontsize':18},'colors': colors_cyl},'313': {'values': df_make['counts_make'],'labels': ["{1}".format(n[0], n[1]) for n in df_make[['manufacturer', 'counts_make']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Manufacturer'},'title': {'label': '# Vehicles by Make', 'loc': 'center', 'fontsize':18},'colors': colors_make}},rows=9,figsize=(16, 14) )

32. Pie Chart（餅狀圖）

餅狀圖大家應(yīng)該很熟悉，這里只有一個(gè)小建議：明確標(biāo)記餅狀圖每個(gè)部分的百分比或數(shù)字。

# Import df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data df = df_raw.groupby('class').size()# Make the plot with pandas df.plot(kind='pie', subplots=True, figsize=(8, 8), dpi= 80) plt.title("Pie Chart of Vehicle Class - Bad") plt.ylabel("") plt.show()

# Import df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data df = df_raw.groupby('class').size().reset_index(name='counts')# Draw Plot fig, ax = plt.subplots(figsize=(12, 7), subplot_kw=dict(aspect="equal"), dpi= 80)data = df['counts'] categories = df['class'] explode = [0,0,0,0,0,0.1,0]def func(pct, allvals):absolute = int(pct/100.*np.sum(allvals))return "{:.1f}% ({:d} )".format(pct, absolute)wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data),textprops=dict(color="w"), colors=plt.cm.Dark2.colors,startangle=140,explode=explode)# Decoration ax.legend(wedges, categories, title="Vehicle Class", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1)) plt.setp(autotexts, size=10, weight=700) ax.set_title("Class of Vehicles: Pie Chart") plt.show()

33. Treemap（樹狀圖）

# pip install squarify import squarify # Import Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data df = df_raw.groupby('class').size().reset_index(name='counts') labels = df.apply(lambda x: str(x[0]) + "\n (" + str(x[1]) + ")", axis=1) sizes = df['counts'].values.tolist() colors = [plt.cm.Spectral(i/float(len(labels))) for i in range(len(labels))]# Draw Plot plt.figure(figsize=(12,8), dpi= 80) squarify.plot(sizes=sizes, label=labels, color=colors, alpha=.8)# Decorate plt.title('Treemap of Vechile Class') plt.axis('off') plt.show()

34. Bar Chart（條形圖）

條形圖是根據(jù)計(jì)數(shù)或任何給定指標(biāo)而可視化條目。在下面的圖表中，為每個(gè)條目使用了不同的顏色，但通常可能希望為所有條目選擇一種顏色，除非按組對它們進(jìn)行著色。顏色名稱存儲(chǔ)在all_colors下面的代碼中。可以通過設(shè)置color參數(shù)來更改條形的顏色。

import random# Import Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data df = df_raw.groupby('manufacturer').size().reset_index(name='counts') n = df['manufacturer'].unique().__len__()+1 all_colors = list(plt.cm.colors.cnames.keys()) random.seed(100) c = random.choices(all_colors, k=n)# Plot Bars plt.figure(figsize=(16,10), dpi= 80) plt.bar(df['manufacturer'], df['counts'], color=c, width=.5) for i, val in enumerate(df['counts'].values):plt.text(i, val, float(val), horizontalalignment='center',verticalalignment='bottom', fontdict={'fontweight':500, 'size':12})# Decoration plt.gca().set_xticklabels(df['manufacturer'], rotation=60, horizontalalignment= 'right') plt.title("Number of Vehicles by Manaufacturers", fontsize=22) plt.ylabel('# Vehicles') plt.ylim(0, 45) plt.show()

Change

35. Time Series Plot（時(shí)間序列圖）

時(shí)間序列圖用于顯示給定度量隨時(shí)間變化的方式。在這里，可以看到1949年至1969年間航空客運(yùn)量的變化情況。

# Import Data df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Draw Plot plt.figure(figsize=(16,10), dpi= 80) plt.plot('date', 'traffic', data=df, color='tab:red')# Decoration plt.ylim(50, 750) xtick_location = df.index.tolist()[::12] xtick_labels = [x[-4:] for x in df.date.tolist()[::12]] plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=0, fontsize=12, horizontalalignment='center', alpha=.7) plt.yticks(fontsize=12, alpha=.7) plt.title("Air Passengers Traffic (1949 - 1969)", fontsize=22) plt.grid(axis='both', alpha=.3)# Remove borders plt.gca().spines["top"].set_alpha(0.0) plt.gca().spines["bottom"].set_alpha(0.3) plt.gca().spines["right"].set_alpha(0.0) plt.gca().spines["left"].set_alpha(0.3) plt.show()

36. Time Series with Peaks and Troughs Annotated（帶波峰波谷標(biāo)記的時(shí)序圖）

下面的時(shí)間序列繪制了所有的波峰和波谷，并注釋了所選特殊事件的發(fā)生。

# Import Data df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Get the Peaks and Troughs data = df['traffic'].values doublediff = np.diff(np.sign(np.diff(data))) peak_locations = np.where(doublediff == -2)[0] + 1doublediff2 = np.diff(np.sign(np.diff(-1*data))) trough_locations = np.where(doublediff2 == -2)[0] + 1# Draw Plot plt.figure(figsize=(16,10), dpi= 80) plt.plot('date', 'traffic', data=df, color='tab:blue', label='Air Traffic') plt.scatter(df.date[peak_locations], df.traffic[peak_locations], marker=mpl.markers.CARETUPBASE, color='tab:green', s=100, label='Peaks') plt.scatter(df.date[trough_locations], df.traffic[trough_locations], marker=mpl.markers.CARETDOWNBASE, color='tab:red', s=100, label='Troughs')# Annotate for t, p in zip(trough_locations[1::5], peak_locations[::3]):plt.text(df.date[p], df.traffic[p]+15, df.date[p], horizontalalignment='center', color='darkgreen')plt.text(df.date[t], df.traffic[t]-35, df.date[t], horizontalalignment='center', color='darkred')# Decoration plt.ylim(50,750) xtick_location = df.index.tolist()[::6] xtick_labels = df.date.tolist()[::6] plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=90, fontsize=12, alpha=.7) plt.title("Peak and Troughs of Air Passengers Traffic (1949 - 1969)", fontsize=22) plt.yticks(fontsize=12, alpha=.7)# Lighten borders plt.gca().spines["top"].set_alpha(.0) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(.0) plt.gca().spines["left"].set_alpha(.3)plt.legend(loc='upper left') plt.grid(axis='y', alpha=.3) plt.show()

37. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot（自相關(guān)和部分自相關(guān)圖）

ACF圖顯示時(shí)間序列與其自身滯后的相關(guān)性。

PACF在另一方面顯示了任何給定滯后（時(shí)間序列）與當(dāng)前序列的自相關(guān)，但是刪除了滯后的貢獻(xiàn)。

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf# Import Data df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Draw Plot fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(16,6), dpi= 80) plot_acf(df.traffic.tolist(), ax=ax1, lags=50) plot_pacf(df.traffic.tolist(), ax=ax2, lags=20)# Decorate # lighten the borders ax1.spines["top"].set_alpha(.3); ax2.spines["top"].set_alpha(.3) ax1.spines["bottom"].set_alpha(.3); ax2.spines["bottom"].set_alpha(.3) ax1.spines["right"].set_alpha(.3); ax2.spines["right"].set_alpha(.3) ax1.spines["left"].set_alpha(.3); ax2.spines["left"].set_alpha(.3)# font size of tick labels ax1.tick_params(axis='both', labelsize=12) ax2.tick_params(axis='both', labelsize=12) plt.show()

38. Cross Correlation plot（交叉相關(guān)圖）

互相關(guān)圖顯示了兩個(gè)時(shí)間序列相互之間的滯后。

import statsmodels.tsa.stattools as stattools# Import Data df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv') x = df['mdeaths'] y = df['fdeaths']# Compute Cross Correlations ccs = stattools.ccf(x, y)[:100] nlags = len(ccs)# Compute the Significance level # ref: https://stats.stackexchange.com/questions/3115/cross-correlation-significance-in-r/3128#3128 conf_level = 2 / np.sqrt(nlags)# Draw Plot plt.figure(figsize=(12,7), dpi= 80)plt.hlines(0, xmin=0, xmax=100, color='gray') # 0 axis plt.hlines(conf_level, xmin=0, xmax=100, color='gray') plt.hlines(-conf_level, xmin=0, xmax=100, color='gray')plt.bar(x=np.arange(len(ccs)), height=ccs, width=.3)# Decoration plt.title('$Cross\; Correlation\; Plot:\; mdeaths\; vs\; fdeaths$', fontsize=22) plt.xlim(0,len(ccs)) plt.show()

39. Time Series Decomposition Plot（時(shí)間序列分解圖）

from statsmodels.tsa.seasonal import seasonal_decompose from dateutil.parser import parse# Import Data df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv') dates = pd.DatetimeIndex([parse(d).strftime('%Y-%m-01') for d in df['date']]) df.set_index(dates, inplace=True)# Decompose result = seasonal_decompose(df['traffic'], model='multiplicative')# Plot plt.rcParams.update({'figure.figsize': (10,10)}) result.plot().suptitle('Time Series Decomposition of Air Passengers') plt.show()

40. Multiple Time Series（多時(shí)間序列）

可以繪制多個(gè)時(shí)間序列，如下所示。

# Import Data df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv')# Define the upper limit, lower limit, interval of Y axis and colors y_LL = 100 y_UL = int(df.iloc[:, 1:].max().max()*1.1) y_interval = 400 mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange'] # Draw Plot and Annotate fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi= 80) columns = df.columns[1:] for i, column in enumerate(columns): plt.plot(df.date.values, df[column].values, lw=1.5, color=mycolors[i]) plt.text(df.shape[0]+1, df[column].values[-1], column, fontsize=14, color=mycolors[i])# Draw Tick lines for y in range(y_LL, y_UL, y_interval): plt.hlines(y, xmin=0, xmax=71, colors='black', alpha=0.3, linestyles="--", lw=0.5)# Decorations plt.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=True, left=False, right=False, labelleft=True) # Lighten borders plt.gca().spines["top"].set_alpha(.3) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(.3) plt.gca().spines["left"].set_alpha(.3)plt.title('Number of Deaths from Lung Diseases in the UK (1974-1979)', fontsize=22) plt.yticks(range(y_LL, y_UL, y_interval), [str(y) for y in range(y_LL, y_UL, y_interval)], fontsize=12) plt.xticks(range(0, df.shape[0], 12), df.date.values[::12], horizontalalignment='left', fontsize=12) plt.ylim(y_LL, y_UL) plt.xlim(-2, 80) plt.show()

41. Plotting with different scales using secondary Y axis（使用輔助Y軸來繪制不同范圍的圖形）

如果要顯示在同一時(shí)間點(diǎn)測量兩個(gè)不同數(shù)量的兩個(gè)時(shí)間序列，則可以在右側(cè)的輔助Y軸上再繪制第二個(gè)系列。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")x = df['date'] y1 = df['psavert'] y2 = df['unemploy']# Plot Line1 (Left Y Axis) fig, ax1 = plt.subplots(1,1,figsize=(16,9), dpi= 80) ax1.plot(x, y1, color='tab:red')# Plot Line2 (Right Y Axis) ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis ax2.plot(x, y2, color='tab:blue')# Decorations # ax1 (left Y axis) ax1.set_xlabel('Year', fontsize=20) ax1.tick_params(axis='x', rotation=0, labelsize=12) ax1.set_ylabel('Personal Savings Rate', color='tab:red', fontsize=20) ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' ) ax1.grid(alpha=.4)# ax2 (right Y axis) ax2.set_ylabel("# Unemployed (1000's)", color='tab:blue', fontsize=20) ax2.tick_params(axis='y', labelcolor='tab:blue') ax2.set_xticks(np.arange(0, len(x), 60)) ax2.set_xticklabels(x[::60], rotation=90, fontdict={'fontsize':10}) ax2.set_title("Personal Savings Rate vs Unemployed: Plotting in Secondary Y Axis", fontsize=22) fig.tight_layout() plt.show()

42. Time Series with Error Bands（帶有誤差帶的時(shí)間序列）

from scipy.stats import sem# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/user_orders_hourofday.csv") df_mean = df.groupby('order_hour_of_day').quantity.mean() df_se = df.groupby('order_hour_of_day').quantity.apply(sem).mul(1.96)# Plot plt.figure(figsize=(16,10), dpi= 80) plt.ylabel("# Orders", fontsize=16) x = df_mean.index plt.plot(x, df_mean, color="white", lw=2) plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D") # Decorations # Lighten borders plt.gca().spines["top"].set_alpha(0) plt.gca().spines["bottom"].set_alpha(1) plt.gca().spines["right"].set_alpha(0) plt.gca().spines["left"].set_alpha(1) plt.xticks(x[::2], [str(d) for d in x[::2]] , fontsize=12) plt.title("User Orders by Hour of Day (95% confidence)", fontsize=22) plt.xlabel("Hour of Day")s, e = plt.gca().get_xlim() plt.xlim(s, e)# Draw Horizontal Tick lines for y in range(8, 20, 2): plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)plt.show()

"Data Source: https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_orders_dataset.csv" from dateutil.parser import parse from scipy.stats import sem# Import Data df_raw = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/orders_45d.csv', parse_dates=['purchase_time', 'purchase_date'])# Prepare Data: Daily Mean and SE Bands df_mean = df_raw.groupby('purchase_date').quantity.mean() df_se = df_raw.groupby('purchase_date').quantity.apply(sem).mul(1.96)# Plot plt.figure(figsize=(16,10), dpi= 80) plt.ylabel("# Daily Orders", fontsize=16) x = [d.date().strftime('%Y-%m-%d') for d in df_mean.index] plt.plot(x, df_mean, color="white", lw=2) plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D") # Decorations # Lighten borders plt.gca().spines["top"].set_alpha(0) plt.gca().spines["bottom"].set_alpha(1) plt.gca().spines["right"].set_alpha(0) plt.gca().spines["left"].set_alpha(1) plt.xticks(x[::6], [str(d) for d in x[::6]] , fontsize=12) plt.title("Daily Order Quantity of Brazilian Retail with Error Bands (95% confidence)", fontsize=20)# Axis limits s, e = plt.gca().get_xlim() plt.xlim(s, e-2) plt.ylim(4, 10)# Draw Horizontal Tick lines for y in range(5, 10, 1): plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)plt.show()

43. Stacked Area Chart（堆積面積圖）

堆積面積圖可以直觀地顯示多個(gè)時(shí)間序列的貢獻(xiàn)程度，因此可以輕松地相互比較。

# Import Data df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/nightvisitors.csv')# Decide Colors mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive'] # Draw Plot and Annotate fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi= 80) columns = df.columns[1:] labs = columns.values.tolist()# Prepare data x = df['yearmon'].values.tolist() y0 = df[columns[0]].values.tolist() y1 = df[columns[1]].values.tolist() y2 = df[columns[2]].values.tolist() y3 = df[columns[3]].values.tolist() y4 = df[columns[4]].values.tolist() y5 = df[columns[5]].values.tolist() y6 = df[columns[6]].values.tolist() y7 = df[columns[7]].values.tolist() y = np.vstack([y0, y2, y4, y6, y7, y5, y1, y3])# Plot for each column labs = columns.values.tolist() ax = plt.gca() ax.stackplot(x, y, labels=labs, colors=mycolors, alpha=0.8)# Decorations ax.set_title('Night Visitors in Australian Regions', fontsize=18) ax.set(ylim=[0, 100000]) ax.legend(fontsize=10, ncol=4) plt.xticks(x[::5], fontsize=10, horizontalalignment='center') plt.yticks(np.arange(10000, 100000, 20000), fontsize=10) plt.xlim(x[0], x[-1])# Lighten borders plt.gca().spines["top"].set_alpha(0) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(0) plt.gca().spines["left"].set_alpha(.3)plt.show()

44. Area Chart UnStacked（未堆積的面積圖）

未堆積區(qū)域圖用于可視化兩個(gè)或更多個(gè)系列相對于彼此的進(jìn)度（起伏）。在下面的圖表中，您可以清楚地看到隨著失業(yè)中位數(shù)持續(xù)時(shí)間的增加，個(gè)人儲(chǔ)蓄率會(huì)下降。未堆積的區(qū)域圖表很好地揭示了這種現(xiàn)象。

# Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")# Prepare Data x = df['date'].values.tolist() y1 = df['psavert'].values.tolist() y2 = df['uempmed'].values.tolist() mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive'] columns = ['psavert', 'uempmed']# Draw Plot fig, ax = plt.subplots(1, 1, figsize=(16,9), dpi= 80) ax.fill_between(x, y1=y1, y2=0, label=columns[1], alpha=0.5, color=mycolors[1], linewidth=2) ax.fill_between(x, y1=y2, y2=0, label=columns[0], alpha=0.5, color=mycolors[0], linewidth=2)# Decorations ax.set_title('Personal Savings Rate vs Median Duration of Unemployment', fontsize=18) ax.set(ylim=[0, 30]) ax.legend(loc='best', fontsize=12) plt.xticks(x[::50], fontsize=10, horizontalalignment='center') plt.yticks(np.arange(2.5, 30.0, 2.5), fontsize=10) plt.xlim(-10, x[-1])# Draw Tick lines for y in np.arange(2.5, 30.0, 2.5): plt.hlines(y, xmin=0, xmax=len(x), colors='black', alpha=0.3, linestyles="--", lw=0.5)# Lighten borders plt.gca().spines["top"].set_alpha(0) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(0) plt.gca().spines["left"].set_alpha(.3) plt.show()

45. Calendar Heat Map（日歷熱力圖）

與時(shí)間序列圖相比，日歷熱力圖是基于時(shí)間的數(shù)據(jù)可視化的備選項(xiàng)。雖然可以在視覺上吸引人，但數(shù)值并不十分明顯。然而，它可以很好地描繪極端值和假日效果。

import matplotlib as mpl import calmap# Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/yahoo.csv", parse_dates=['date']) df.set_index('date', inplace=True)# Plot plt.figure(figsize=(16,10), dpi= 80) calmap.calendarplot(df['2014']['VIX.Close'], fig_kws={'figsize': (16,10)}, yearlabel_kws={'color':'black', 'fontsize':14}, subplot_kws={'title':'Yahoo Stock Prices'}) plt.show()

46. Seasonal Plot（季度圖）

from dateutil.parser import parse # Import Data df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Prepare data df['year'] = [parse(d).year for d in df.date] df['month'] = [parse(d).strftime('%b') for d in df.date] years = df['year'].unique()# Draw Plot mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive', 'deeppink', 'steelblue', 'firebrick', 'mediumseagreen'] plt.figure(figsize=(16,10), dpi= 80)for i, y in enumerate(years):plt.plot('month', 'traffic', data=df.loc[df.year==y, :], color=mycolors[i], label=y)plt.text(df.loc[df.year==y, :].shape[0]-.9,df.loc[df.year==y, 'traffic'][-1:].values[0], y, fontsize=12, color=mycolors[i])# Decoration plt.ylim(50,750) plt.xlim(-0.3, 11) plt.ylabel('$Air Traffic$') plt.yticks(fontsize=12, alpha=.7) plt.title("Monthly Seasonal Plot: Air Passengers Traffic (1949 - 1969)", fontsize=22) plt.grid(axis='y', alpha=.3)# Remove borders plt.gca().spines["top"].set_alpha(0.0) plt.gca().spines["bottom"].set_alpha(0.5) plt.gca().spines["right"].set_alpha(0.0) plt.gca().spines["left"].set_alpha(0.5) # plt.legend(loc='upper right', ncol=2, fontsize=12) plt.show()

Groups

47. Dendrogram（樹狀圖）

樹形圖基于給定的距離度量將相似的點(diǎn)組合在一起，并基于點(diǎn)的相似性將它們組織在樹狀鏈接中。

import scipy.cluster.hierarchy as shc# Import Data df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')# Plot plt.figure(figsize=(16, 10), dpi= 80) plt.title("USArrests Dendograms", fontsize=22) dend = shc.dendrogram(shc.linkage(df[['Murder', 'Assault', 'UrbanPop', 'Rape']], method='ward'), labels=df.State.values, color_threshold=100) plt.xticks(fontsize=12) plt.show()

48. Cluster Plot（簇狀圖）

Cluster Plot可用于劃分屬于同一群集的點(diǎn)。下面是根據(jù)USArrests數(shù)據(jù)集將美國各州分為5組的代表性示例。該集群圖使用“謀殺”和“攻擊”列作為X和Y軸。或者，您可以將第一個(gè)到主要組件用作X軸和Y軸。

from sklearn.cluster import AgglomerativeClustering from scipy.spatial import ConvexHull# Import Data df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')# Agglomerative Clustering cluster = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward') cluster.fit_predict(df[['Murder', 'Assault', 'UrbanPop', 'Rape']]) # Plot plt.figure(figsize=(14, 10), dpi= 80) plt.scatter(df.iloc[:,0], df.iloc[:,1], c=cluster.labels_, cmap='tab10') # Encircle def encircle(x,y, ax=None, **kw):if not ax: ax=plt.gca()p = np.c_[x,y]hull = ConvexHull(p)poly = plt.Polygon(p[hull.vertices,:], **kw)ax.add_patch(poly)# Draw polygon surrounding vertices encircle(df.loc[cluster.labels_ == 0, 'Murder'], df.loc[cluster.labels_ == 0, 'Assault'], ec="k",fc="gold", alpha=0.2, linewidth=0) encircle(df.loc[cluster.labels_ == 1, 'Murder'], df.loc[cluster.labels_ == 1, 'Assault'], ec="k", fc="tab:blue", alpha=0.2, linewidth=0) encircle(df.loc[cluster.labels_ == 2, 'Murder'], df.loc[cluster.labels_ == 2, 'Assault'], ec="k", fc="tab:red", alpha=0.2, linewidth=0) encircle(df.loc[cluster.labels_ == 3, 'Murder'], df.loc[cluster.labels_ == 3, 'Assault'], ec="k", fc="tab:green", alpha=0.2, linewidth=0) encircle(df.loc[cluster.labels_ == 4, 'Murder'], df.loc[cluster.labels_ == 4, 'Assault'], ec="k", fc="tab:orange", alpha=0.2, linewidth=0)# Decorations plt.xlabel('Murder'); plt.xticks(fontsize=12) plt.ylabel('Assault'); plt.yticks(fontsize=12) plt.title('Agglomerative Clustering of USArrests (5 Groups)', fontsize=22) plt.show()

49. Andrews Curve（安德魯斯曲線）

安德魯斯曲線有助于可視化是否存在基于給定分組的數(shù)字特征的固有分組。如果功能（數(shù)據(jù)集中的列）無法區(qū)分組（cyl)那么線條將不會(huì)很好地隔離，如下所示。

from pandas.plotting import andrews_curves# Import df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") df.drop(['cars', 'carname'], axis=1, inplace=True)# Plot plt.figure(figsize=(12,9), dpi= 80) andrews_curves(df, 'cyl', colormap='Set1')# Lighten borders plt.gca().spines["top"].set_alpha(0) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(0) plt.gca().spines["left"].set_alpha(.3)plt.title('Andrews Curves of mtcars', fontsize=22) plt.xlim(-3,3) plt.grid(alpha=0.3) plt.xticks(fontsize=12) plt.yticks(fontsize=12) plt.show()

50. Parallel Coordinates（平行坐標(biāo)）

平行坐標(biāo)有助于可視化特征是否有助于有效地隔離組。如果實(shí)現(xiàn)隔離，則該特征可能在預(yù)測該組時(shí)非常有用。

from pandas.plotting import parallel_coordinates# Import Data df_final = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/diamonds_filter.csv")# Plot plt.figure(figsize=(12,9), dpi= 80) parallel_coordinates(df_final, 'cut', colormap='Dark2')# Lighten borders plt.gca().spines["top"].set_alpha(0) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(0) plt.gca().spines["left"].set_alpha(.3)plt.title('Parallel Coordinated of Diamonds', fontsize=22) plt.grid(alpha=0.3) plt.xticks(fontsize=12) plt.yticks(fontsize=12) plt.show()

總結(jié)

以上是生活随笔為你收集整理的50个最有用的Matplotlib数据分析与可视化图的全部內(nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò)，歡迎將生活随笔推薦給好友。

上一篇：吴恩达机器学习作业（1）：线性回归
下一篇： jetcar.exe - jetcar是

3atv精品不卡视频,97人人超碰国产精品最新,中文字幕av一区二区三区人妻少妇,久久久精品波多野结衣,日韩一区二区三区精品

编程问答

50个最有用的Matplotlib数据分析与可视化图

設(shè)置

Correlation

1.Scatter plot（散點(diǎn)圖）

2.Bubble plot with Encircling（包圍的氣泡圖）

3. Scatter plot with linear regression line of best fit（散點(diǎn)圖與最佳線性擬合回歸線）

4.Jittering with stripplot（帶條紋的抖動(dòng)）

5. Counts Plot（計(jì)數(shù)圖）

6. Marginal Histogram（邊緣直方圖）

7. Marginal Boxplot（邊緣箱線圖）

8. Correllogram（相關(guān)圖）

9. Pairwise Plot（成對圖）

Deviation

10. Diverging Bars（發(fā)散條形圖）

11. Diverging Texts（發(fā)散文本）

12. Diverging Dot Plot（散點(diǎn)圖）

13. Diverging Lollipop Chart with Markers（帶標(biāo)記的發(fā)散型棒棒糖圖）

14. Area Chart（面積圖）

Ranking

15. Ordered Bar Chart（有序條形圖）

16. Lollipop Chart（棒棒糖圖）

17. Dot Plot（點(diǎn)圖）

18. Slope Chart（坡度圖）

19. Dumbbell Plot（啞鈴圖）

Distribution

20. Histogram for Continuous Variable（連續(xù)變量的直方圖）

21. Histogram for Categorical Variable（類型變量的直方圖）

22. Density Plot（密度圖）

23. Density Curves with Histogram（直方密度圖）

24. Joy Plot

25. Distributed Dot Plot（分布式點(diǎn)圖）

26. Box Plot（箱形圖）

27. Dot + Box Plot（點(diǎn)+箱型圖）

28. Violin Plot（小提琴圖）

29. Population Pyramid（人口金字塔）

30. Categorical Plots（分類圖）

Composition

31. Waffle Chart（華夫餅表）

32. Pie Chart（餅狀圖）

33. Treemap（樹狀圖）

34. Bar Chart（條形圖）

Change

35. Time Series Plot（時(shí)間序列圖）

36. Time Series with Peaks and Troughs Annotated（帶波峰波谷標(biāo)記的時(shí)序圖）

37. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot（自相關(guān)和部分自相關(guān)圖）

38. Cross Correlation plot（交叉相關(guān)圖）

39. Time Series Decomposition Plot（時(shí)間序列分解圖）

40. Multiple Time Series（多時(shí)間序列）

41. Plotting with different scales using secondary Y axis（使用輔助Y軸來繪制不同范圍的圖形）

42. Time Series with Error Bands（帶有誤差帶的時(shí)間序列）

43. Stacked Area Chart（堆積面積圖）

44. Area Chart UnStacked（未堆積的面積圖）

45. Calendar Heat Map（日歷熱力圖）

46. Seasonal Plot（季度圖）

Groups

47. Dendrogram（樹狀圖）

48. Cluster Plot（簇狀圖）

49. Andrews Curve（安德魯斯曲線）

50. Parallel Coordinates（平行坐標(biāo)）

總結(jié)