30.32.33. Word cloud, 3D plotting, matrix visualization, plotting a confusion matrix
30. Word cloud
30.1. Example 1: Basic word cloud
31. 3D plotting
31.1. Plotting 2D data on a 3D plot
31.2. 3D scatterplot
31.3. 3D surface (color map)
32. Matrix visualization (Matshow)
33. Plotting a confusion matrix
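30. Word cloud
A word cloud is a visual representation of text data: the words are arranged into a colorful, cloud-like image. Bar charts, line charts and pie charts display numeric data; a word cloud is different in that it can summarize a large amount of text. The importance of each word is shown by its font size: the larger the font, the more prominent and the more important the word. With a word cloud, readers can spot the most prominent words immediately and grasp the key points quickly.
A word cloud visualizes the keywords that occur most frequently in a text. Because it leaves out the large mass of low-frequency, low-value words, a reader can get the gist of the text at a glance.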
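The frequency-to-size idea can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not how the wordcloud library computes sizes: the small stopword set, the linear scaling rule and the helper name `word_sizes` are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only;
# the wordcloud library ships its own, much larger STOPWORDS set.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to"}


def word_sizes(text, max_words=3, min_size=10, max_size=40):
    """Map the most frequent non-stopwords to font sizes proportional to their counts."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words).most_common(max_words)  # keep only the top words
    if not counts:
        return {}
    top = counts[0][1]  # count of the most frequent word
    # Linear scaling: the most frequent word gets max_size, rarer words shrink,
    # but never below min_size.
    return {w: max(min_size, round(max_size * c / top)) for w, c in counts}


print(word_sizes("Python Python Python Matplotlib Matplotlib Seaborn"))
# {'python': 40, 'matplotlib': 27, 'seaborn': 13}
```

In the wordcloud library itself, font sizes are governed by parameters such as `max_font_size`, `min_font_size` and `relative_scaling` (0 considers only word rank; 1 makes a word that is twice as frequent twice as large).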
The wordcloud module must be installed first: pip install wordcloud
30.1. Example 1: Basic word cloud
```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Implicitly concatenated string pieces (a plain '...' literal cannot span lines)
text = ('Data science is an interdisciplinary field that uses scientific methods, processes, '
        'algorithms and systems to extract knowledge and insights from data in various forms, '
        'both structured and unstructured, similar to data mining. Data science is a concept '
        'to unify statistics, data analysis, machine learning and their related methods in order '
        'to understand and analyze actual phenomena with data. It employs techniques and theories '
        'drawn from many fields within the context of mathematics, statistics, information '
        'science, and computer science. Turing award winner Jim Gray imagined data science as a '
        'fourth paradigm of science (empirical, theoretical, computational and now data-driven) '
        'and asserted that everything about science is changing because of the impact of '
        'information technology and the data deluge. In 2012, when Harvard Business Review called '
        'it The Sexiest Job of the 21st Century, the term data science became a buzzword. It is '
        'now often used interchangeably with earlier concepts like business analytics, business '
        'intelligence, predictive modeling, and statistics. Even the suggestion that data science '
        'is sexy was paraphrasing Hans Rosling, featured in a 2011 BBC documentary with the quote, '
        'Statistics is now the sexiest subject around. Nate Silver referred to data science as a '
        'sexed up term for statistics. In many cases, earlier approaches and solutions are now '
        'simply rebranded as data science to be more attractive, which can cause the term to '
        'become dilute beyond usefulness. While many university programs now offer a data science '
        'degree, there exists no consensus on a definition or suitable curriculum contents. '
        'To its discredit, however, many data-science and big-data projects fail to deliver '
        'useful results, often as a result of poor management and utilization of resources')

wordcloud = WordCloud(width=1280, height=853, margin=0, colormap='Blues').generate(text)
plt.figure(figsize=(13, 8.6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
```

Changing the size and other settings:
```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Create a string of words
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin "
        "Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color "
        "Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked "
        "Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud "
        "Wordcloud Sankey Bubble")

# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480, margin=0).generate(text)

# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
```

Customizing the word cloud:
```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Create a string of words
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin "
        "Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color "
        "Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked "
        "Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud "
        "Wordcloud Sankey Bubble")

# Create the wordcloud object, limiting the font size range
wordcloud = WordCloud(width=480, height=480, max_font_size=20, min_font_size=10).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
```
You can also set the maximum number of words shown in the word cloud. Suppose you want to show only the 3 most frequent words:
Changing the background color:
```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Create a string of words
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin "
        "Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color "
        "Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked "
        "Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud "
        "Wordcloud Sankey Bubble")

# Create the wordcloud object with a custom background color
wordcloud = WordCloud(width=480, height=480, background_color="skyblue").generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
```
Finally, change the colors of the words with a color palette:
31. 3D plotting
31.1. Plotting 2D data on a 3D plot
Demonstrates using the zdir keyword of ax.plot to plot 2D data on selected axes of a 3D plot.
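The zdir keyword chooses which 3D axis the zs value is pinned to; the 2D data fills the other two axes. The plain-Python sketch below illustrates that mapping for the cases used in the demo (zdir='z' puts the curve on the x-y plane, zdir='y' puts the points on the x-z plane). The helper name `place_2d_in_3d` is hypothetical and is not part of Matplotlib.

```python
def place_2d_in_3d(xs, ys, zs=0, zdir="z"):
    """Hypothetical helper: return 3D (x, y, z) points for 2D data drawn
    in the plane perpendicular to the axis named by zdir."""
    # A scalar zs pins every point to the same value on the zdir axis.
    fixed = list(zs) if hasattr(zs, "__len__") else [zs] * len(xs)
    if zdir == "z":  # draw on the x-y plane at z = zs
        return list(zip(xs, ys, fixed))
    if zdir == "y":  # draw on the x-z plane at y = zs
        return list(zip(xs, fixed, ys))
    if zdir == "x":  # draw on the y-z plane at x = zs
        return list(zip(fixed, xs, ys))
    raise ValueError("zdir must be 'x', 'y' or 'z'")


print(place_2d_in_3d([0.1, 0.2], [0.3, 0.4], zs=0, zdir="y"))
# [(0.1, 0, 0.3), (0.2, 0, 0.4)]
```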
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older Matplotlib)

fig = plt.figure()
# fig.gca(projection='3d') is rejected by Matplotlib >= 3.6; add_subplot works everywhere
ax = fig.add_subplot(projection='3d')

# Plot a sin curve using the x and y axes.
x = np.linspace(0, 1, 100)
y = np.sin(x * 2 * np.pi) / 2 + 0.5
# zdir='z': the curve is drawn on the x-y plane, with z fixed at zs=0
ax.plot(x, y, zs=0, zdir='z', label='curve in (x, y)')

# Plot scatterplot data (20 2D points per colour) on the x and z axes.
colors = ('r', 'g', 'b', 'k')

# Fixing random state for reproducibility
np.random.seed(19680801)

x = np.random.sample(20 * len(colors))
y = np.random.sample(20 * len(colors))
c_list = []
for c in colors:
    c_list.extend([c] * 20)
# By using zdir='y', the y value of these points is fixed to the zs value 0
# and the (x, y) points are plotted on the x and z axes.
ax.scatter(x, y, zs=0, zdir='y', c=c_list, label='points in (x, z)')

# Make legend, set axes limits and labels
ax.legend()
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_zlim(0, 1)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

# Customize the view angle so it's easier to see that the scatter points lie
# on the plane y=0
ax.view_init(elev=20., azim=-35)

plt.show()
```

31.2. 3D scatterplot
Demonstration of a basic scatterplot in 3D.
```python
import matplotlib.pyplot as plt
import numpy as np

# Fixing random state for reproducibility
np.random.seed(19680801)


def randrange(n, vmin, vmax):
    '''
    Helper function to make an array of random numbers having shape (n, )
    with each number distributed Uniform(vmin, vmax).
    '''
    return (vmax - vmin)*np.random.rand(n) + vmin


fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

n = 100

# For each set of style and range settings, plot n random points in the box
# defined by x in [23, 32], y in [0, 100], z in [zlow, zhigh].
for m, zlow, zhigh in [('o', -50, -25), ('^', -30, -5)]:
    xs = randrange(n, 23, 32)
    ys = randrange(n, 0, 100)
    zs = randrange(n, zlow, zhigh)
    ax.scatter(xs, ys, zs, marker=m)

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

plt.show()
```

31.3. 3D surface (color map)
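Demonstrates plotting a 3D surface colored with the coolwarm colormap. The surface is made opaque by using antialiased=False.
Also demonstrates using LinearLocator and custom formatting for the z-axis tick labels.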
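A minimal sketch of such a plot, modeled on the Matplotlib gallery's 3D surface demo. The surface z = sin(√(x² + y²)), the grid spacing and the colorbar sizing are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter, LinearLocator
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Illustrative data: z = sin(r), where r is the distance from the origin
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

# Color the surface with coolwarm; antialiased=False keeps it opaque
surf = ax.plot_surface(X, Y, Z, cmap='coolwarm',
                       linewidth=0, antialiased=False)

# z axis: 10 evenly spaced ticks, labels formatted with two decimals
ax.set_zlim(-1.01, 1.01)
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

# Color bar mapping z values to colors
fig.colorbar(surf, shrink=0.5, aspect=5)

plt.show()
```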
32. Matrix visualization (Matshow)
Simple matshow example.
```python
import matplotlib.pyplot as plt
import numpy as np


def samplemat(dims):
    """Make a matrix with all zeros and increasing elements on the diagonal"""
    aa = np.zeros(dims)
    for i in range(min(dims)):
        aa[i, i] = i
    return aa


# Display matrix
plt.matshow(samplemat((15, 15)))

plt.show()
```

A second example with a colorbar and letter labels on both axes:

```python
import numpy as np
import matplotlib.pyplot as plt

alphabets = ['A', 'B', 'C', 'D', 'E']

# randomly generated array
random_array = np.random.random((5, 5))

figure = plt.figure()
axes = figure.add_subplot(111)

# using the matshow() function
caxes = axes.matshow(random_array, interpolation='nearest')
figure.colorbar(caxes)

# Place one tick per row/column before labeling them; the older
# [''] + alphabets trick relied on the default tick positions and is fragile.
axes.set_xticks(range(len(alphabets)))
axes.set_yticks(range(len(alphabets)))
axes.set_xticklabels(alphabets)
axes.set_yticklabels(alphabets)

plt.show()
```

33. Plotting a confusion matrix
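Two quantities drive the plot_confusion_matrix function: accuracy is the sum of the diagonal (correct predictions) divided by the total count, and normalize=True divides each row by the number of samples in that true class. A minimal NumPy sketch of both computations, using the matrix from the example call; the helper name `summarize_confusion` is hypothetical.

```python
import numpy as np


def summarize_confusion(cm):
    """Hypothetical helper: accuracy, misclassification rate and the
    row-normalized matrix of a confusion matrix (rows = true labels)."""
    cm = np.asarray(cm, dtype=float)
    accuracy = np.trace(cm) / cm.sum()  # diagonal = correct predictions
    misclass = 1.0 - accuracy
    normalized = cm / cm.sum(axis=1, keepdims=True)  # each row now sums to 1
    return accuracy, misclass, normalized


cm = [[1098, 1934, 807],
      [604, 4392, 6233],
      [162, 2362, 31760]]
accuracy, misclass, normalized = summarize_confusion(cm)
print(accuracy)  # 37250 / 49352, about 0.7548
```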
```python
import itertools

import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    --------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    # Accuracy = correct predictions (diagonal) / all predictions
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    # Normalize each row (true class) before drawing, so the cell colors,
    # the printed values and the text-color threshold all use the same numbers
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    # Cells above this threshold get white text, the rest black text
    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.tight_layout()  # after the labels, so they are included in the layout
    plt.show()


plot_confusion_matrix(cm=np.array([[1098, 1934, 807],
                                   [604, 4392, 6233],
                                   [162, 2362, 31760]]),
                      normalize=False,
                      target_names=['high', 'medium', 'low'],
                      title="Confusion Matrix")
```