makemid+matlab, "MATLAB Basics" Bilingual Course
MATLAB Bilingual Teaching Video, Lecture 17
MATLAB Bilingual Teaching Video, Lecture 18
Summarizing Data
In this section...
“Overview”
“Measures of Location”
“Measures of Scale”
“Shape of a Distribution”
Overview
Many MATLAB functions enable you to summarize the overall location, scale,
and shape of a data sample.
One of the advantages of working in MATLAB is that functions operate on
entire arrays of data, not just on single scalar values. The functions are said
to be vectorized. Vectorization allows for both efficient problem formulation,
using array-based data, and efficient computation, using vectorized statistical
functions.
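As a rough illustration of vectorization, the sketch below computes column means of a small made-up matrix twice: once with an explicit loop and once with a single call to mean. The matrix A and the variable names are assumptions for illustration only and are not part of the traffic data.

```matlab
% Hypothetical 4-by-3 data matrix: rows are observations, columns are variables.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];

% Loop version: process one column at a time.
loop_means = zeros(1, size(A,2));
for j = 1:size(A,2)
    loop_means(j) = sum(A(:,j)) / size(A,1);
end

% Vectorized version: mean operates on the whole array in one call.
vec_means = mean(A);

% Both approaches should agree.
same_result = isequal(loop_means, vec_means);
```

The vectorized call is shorter and avoids the index bookkeeping that the loop requires.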
Note This section continues the data analysis from “Preprocessing Data.”
Measures of Location
Summarize the location of a data sample by finding a “typical” value.
Common measures of location or “central tendency” are computed by the
functions mean, median, and mode:
load count.dat
x1 = mean(count)
x1 =
32.0000 46.5417 65.5833
x2 = median(count)
x2 =
23.5000 36.0000 39.0000
x3 = mode(count)
x3 =
11 9 9
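To see the column-wise behavior on data small enough to check by hand, the following sketch applies the same three functions to a made-up 4-by-3 matrix A (an assumption for illustration, not part of count.dat). It also uses the optional dimension argument of mean, which this section does not otherwise cover, to summarize across columns instead of across rows.

```matlab
% Hypothetical data: 4 observations (rows) of 3 variables (columns).
A = [1 5 2;
     3 5 4;
     3 6 4;
     9 8 6];

col_mean   = mean(A);      % one mean per column
col_median = median(A);    % one median per column
col_mode   = mode(A);      % most frequent value in each column

% Passing 2 as the dimension summarizes each row instead.
row_mean = mean(A, 2);     % 4-by-1 column vector, one mean per observation
```

Each column-wise result is a 1-by-3 row vector, matching the layout of x1, x2, and x3 above.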
Like all MATLAB statistical functions, the functions above summarize
data across observations (rows) while preserving variables (columns). The
functions compute the location of the data at each of the three intersections
in a single call.
Measures of Scale
There are many ways to measure the scale or “dispersion” of a data sample.
The MATLAB functions max, min, std, and var compute some common
measures:
dx1 = max(count)-min(count)
dx1 =
107 136 250
dx2 = std(count)
dx2 =
25.3703 41.4057 68.0281
dx3 = var(count)
dx3 =
1.0e+003 *
0.6437 1.7144 4.6278
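As a rough check on how these measures relate to one another, the sketch below uses a small made-up matrix (an assumption, not the traffic data). By default std and var normalize by N-1, where N is the number of observations; passing 1 as the second argument to std normalizes by N instead.

```matlab
% Hypothetical data: 8 observations (rows) of 2 variables (columns).
A = [2 1; 4 2; 4 3; 4 4; 5 5; 5 6; 7 7; 9 8];

r = max(A) - min(A);     % range of each column
s = std(A);              % sample standard deviation (normalized by N-1)
v = var(A);              % sample variance (normalized by N-1)
s_pop = std(A, 1);       % standard deviation normalized by N

% The variance is the square of the standard deviation.
max_diff = max(abs(v - s.^2));
```

Here max_diff should be zero up to round-off, confirming that var and std describe the same spread on different scales.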
Like all MATLAB statistical functions, the functions above summarize
data across observations (rows) while preserving variables (columns). The
functions compute the scale of the data at each of the three intersections
in a single call.
Shape of a Distribution
The shape of a distribution is harder to summarize than its location or
scale. The MATLAB hist function plots a histogram that provides a visual
summary:
figure
hist(count)
legend('Intersection 1',...
'Intersection 2',...
'Intersection 3')
Parametric models give analytic summaries of distribution shapes.
Exponential distributions, with parameter mu given by the data mean, are a
good choice for the traffic data:
c1 = count(:,1); % Data at intersection 1
[bin_counts,bin_locations] = hist(c1);
bin_width = bin_locations(2) - bin_locations(1);
hist_area = (bin_width)*(sum(bin_counts));
figure
hist(c1)
hold on
mu1 = mean(c1);
exp_pdf = @(t)(1/mu1)*exp(-t/mu1); % Integrates to 1
t = 0:150;
y = exp_pdf(t);
plot(t,(hist_area)*y,'r','LineWidth',2)
legend('Distribution','Exponential Fit')
Methods for fitting general parametric models to data distributions
are beyond the scope of this Getting Started guide. Statistics Toolbox
software provides functions for computing maximum likelihood estimates
of distribution parameters.
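The scaling used in the plot above can be checked numerically. The sketch below, which uses a made-up sample c rather than the traffic data, integrates the exponential density with trapz to confirm that its area is close to 1, and then confirms that multiplying by the histogram area gives a curve with the same area as the histogram bars. The sample values, the integration range, and the grid size are assumptions chosen for illustration.

```matlab
% Hypothetical sample of nonnegative values.
c = [3 7 12 20 33 48 60 95]';

% Histogram area, computed as in the example above.
[bin_counts, bin_locations] = hist(c);
bin_width = bin_locations(2) - bin_locations(1);
hist_area = bin_width * sum(bin_counts);

% Exponential density with parameter mu set to the sample mean.
mu = mean(c);
exp_pdf = @(t) (1/mu) * exp(-t/mu);

% Integrate numerically over a range long enough to capture the tail.
t = linspace(0, 20*mu, 20001);
pdf_area    = trapz(t, exp_pdf(t));              % close to 1
scaled_area = trapz(t, hist_area * exp_pdf(t));  % close to hist_area
```

Scaling by hist_area is what lets the density curve and the histogram share one vertical axis.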
See “Descriptive Statistics” in the MATLAB Data Analysis documentation for
more information on summarizing data samples.
Visualizing Data
In this section...
“Overview”
“2-D Scatter Plots”
“3-D Scatter Plots”
“Scatter Plot Arrays”
“Exploring Data in Graphs”
Overview
You can use many MATLAB graph types for visualizing data patterns and
trends. Scatter plots, described in this section, help to visualize relationships
among the traffic data at different intersections. Data exploration tools let
you query and interact with individual data points on graphs.
Note This section continues the data analysis from “Summarizing Data.”
2-D Scatter Plots
A 2-D scatter plot, created with the scatter function, shows the relationship
between the traffic volume at the first two intersections:
load count.dat
c1 = count(:,1); % Data at intersection 1
c2 = count(:,2); % Data at intersection 2
figure
scatter(c1,c2,'filled')
xlabel('Intersection 1')
ylabel('Intersection 2')
The covariance, computed by the cov function, measures the strength of the
linear relationship between the two variables (how tightly the data lies along
a least-squares line through the scatter):
C12 = cov([c1 c2])
C12 =
1.0e+003 *
0.6437 0.9802
0.9802 1.7144
The results are displayed in a symmetric square matrix, with the covariance
of the ith and jth variables in the (i, j)th position. The ith diagonal element
is the variance of the ith variable.
Covariances have the disadvantage of depending on the units used to measure
the individual variables. You can divide a covariance by the standard
deviations of the variables to normalize values between +1 and –1. The
corrcoef function computes correlation coefficients:
R12 = corrcoef([c1 c2])
R12 =
1.0000 0.9331
0.9331 1.0000
r12 = R12(1,2) % Correlation coefficient
r12 =
0.9331
r12sq = r12^2 % Coefficient of determination
r12sq =
0.8707
Because it is normalized, the value of the correlation coefficient is readily
comparable to values for other pairs of intersections. Its square, the coefficient
of determination, is one minus the ratio of the variance about the least-squares
line to the variance about the mean. Thus, it is the proportion of variation in the
response (in this case, the traffic volume at intersection 2) that is eliminated
or statistically explained by a least-squares line through the scatter.
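The relationships among covariance, correlation, and the coefficient of determination can be verified on a small example. The sketch below uses made-up vectors x and y (assumptions, not the traffic data). It normalizes the covariance matrix by the standard deviations to recover the correlation matrix, then computes the coefficient of determination from the residuals of a least-squares line fitted with polyfit.

```matlab
% Hypothetical paired observations.
x = [1 2 3 4 5]';
y = [2 4 5 4 5]';

% Correlation matrix obtained by normalizing the covariance matrix.
C = cov([x y]);
sd = sqrt(diag(C));            % standard deviations of x and y
R_from_C = C ./ (sd * sd');    % divide each covariance by sd(i)*sd(j)
R = corrcoef([x y]);           % should match R_from_C

% Coefficient of determination from a least-squares line.
p = polyfit(x, y, 1);                  % slope and intercept
resid = y - polyval(p, x);             % deviations about the fitted line
sse = sum(resid.^2);                   % variation about the line
sst = sum((y - mean(y)).^2);           % variation about the mean
r_sq = 1 - sse/sst;                    % should equal R(1,2)^2
```

For a straight-line fit with one predictor, r_sq and the squared correlation coefficient agree up to round-off.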
3-D Scatter Plots
A 3-D scatter plot, created with the scatter3 function, shows the relationship
between the traffic volume at all three intersections. Use the variables c1
and c2 from the previous step, and create c3 from the third column of count:
c3 = count(:,3); % Data at intersection 3
figure
scatter3(c1,c2,c3,'filled')
xlabel('Intersection 1')
ylabel('Intersection 2')
zlabel('Intersection 3')
Measure the strength of the linear relationship among the variables in the
3-D scatter by computing eigenvalues of the covariance matrix with the eig
function:
vars = eig(cov([c1 c2 c3]))
vars =
1.0e+003 *
0.0442
0.1118
6.8300
explained = max(vars)/sum(vars)
explained =
0.9777
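As a sanity check on the eigenvalue computation, the eigenvalues of a covariance matrix sum to its trace, which is the total of the individual variances. The sketch below confirms this on a made-up three-variable matrix X (an assumption for illustration, not the traffic data).

```matlab
% Hypothetical data: 5 observations (rows) of 3 variables (columns).
X = [ 2 1  4;
      4 3  9;
      6 4 11;
      8 8 17;
     10 9 19];

vars = eig(cov(X));                  % variances along the principal axes
explained = max(vars) / sum(vars);   % share carried by the largest one

% The eigenvalues redistribute the total variance; they do not change it.
total_var = sum(var(X));
gap = abs(sum(vars) - total_var);    % zero up to round-off
```

Because the total is preserved, explained can be read as a fraction of the overall variation in the data.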
The eigenvalues are the variances along the principal components of the data.
The variable explained measures the proportion of variation explained by the
first principal component, along the axis of the data. Unlike the coefficient
of determination for 2-D scatters, this measure does not distinguish predictor
and response variables.
Scatter Plot Arrays
Use the plotmatrix function to make comparisons of the relationships
between multiple pairs of intersections:
figure
plotmatrix(count)
The plot in the (i, j)th position of the array is a scatter with the ith variable
on the vertical axis and the jth variable on the horizontal axis. The plot in the
ith diagonal position is a histogram of the ith variable.
For more information on statistical visualization, see “Plotting Data” and
“Interactive Data Exploration” in the MATLAB Data Analysis documentation.
Exploring Data in Graphs
Using your mouse, you can pick observations on almost any MATLAB graph
with two tools from the figure toolbar:
• Data Cursor
• Data Brushing
These tools each place you in exploratory modes in which you can select data
points on graphs to identify their values and create workspace variables to
contain specific observations. When you use data brushing, you can also copy,
remove or replace the selected observations.
For example, make a scatter plot of the first and third columns of count:
load count.dat
scatter(count(:,1),count(:,3))
Select the Data Cursor Tool and click the right-most data point. A datatip
displaying the point’s x and y values is placed there.
Datatips display x-, y-, and z- (for 3-D plots) coordinates by default. You
can drag a datatip from one data point to another to see new values or add
additional datatips by right-clicking a datatip and using the context menu.
You can also customize the text that datatips display using MATLAB code.
For more information, see the datacursormode function and “Interacting with
Graphed Data” in the MATLAB Data Analysis documentation.
Data brushing is a related feature that lets you highlight one or more
observations on a graph by clicking or dragging. To enter data brushing
mode, click the left side of the Data Brushing tool on the figure toolbar.
Clicking the arrow on the right side of the tool icon drops down a color palette
for selecting the color with which to brush observations. This figure shows
the same scatter plot as the previous figure, but with all observations beyond
one standard deviation of the mean (as identified using the Tools > Data
Statistics GUI) brushed in red.
After you brush data observations, you can perform the following operations
on them:
• Delete them.
• Replace them with constant values.
• Replace them with NaN values.
• Drag or copy, and paste them to the Command Window.
• Save them as workspace variables.
For example, use the Data Brush context menu or the
Tools > Brushing > Create new variable option to create a new
variable called count13high.
A new variable in the workspace results:
count13high
count13high =
61 186
75 180
114 257
For more information, see the MATLAB brush function and “Marking Up
Graphs with Data Brushing” in the MATLAB Data Analysis documentation.
Linked plots, or data linking, is a feature closely related to data brushing. A
plot is said to be linked when it has a live connection to the workspace data it
depicts. The copies of variables stored in a plot object’s XData, YData (and,
where appropriate, ZData) properties are automatically updated whenever the
workspace variables to which they are linked change or are deleted. This causes
the graphs on which they appear to update automatically.
Linking plots to variables lets you track specific observations through
different presentations of them. When you brush data points in linked plots,
brushing one graph highlights the same observations in every graph that is
linked to the same variables.
Data linking establishes immediate, two-way communication between
figures and workspace variables, in the same way that the Variable Editor
communicates with workspace variables. You create links by activating the
Data Linking tool on a figure’s toolbar. Activating this tool causes the
Linked Plot information bar, displayed in the next figure, to appear at the top
of the plot (possibly obscuring its title). You can dismiss the bar without
unlinking the plot; it does not print and is not saved with the figure.
The following two graphs depict scatter plot displays of linked data after
brushing some observations on the left graph. The common variable, count,
carries the brush marks to the right figure. Even though the right graph
is not in data brushing mode, it displays brush marks because it is linked
to its variables.
figure
scatter(count(:,1),count(:,2))
xlabel('count(:,1)')
ylabel('count(:,2)')
figure
scatter(count(:,3),count(:,2))
xlabel('count(:,3)')
ylabel('count(:,2)')