Matlab Tutorial 4: Data Analysis and Statistics with Matlab
Stem Plot
To make the picture more complete, we show some other plotting possibilities. Matlab can also illustrate a discrete sequence with a stem plot. This is done as:
>> stem(x),grid
>> y=0:10;
See figure below. Notice that the values in vector x are plotted versus their index.
Assume you want to produce a curve from sampled data. If you take samples from a sine curve with an amplitude of 1, the result is a discrete sequence of data. The sine curve is given by y=sin(x).
>> bar(x),grid
>> title('bar for vector x')
>> yy=sin(z)
>> stem(yy),grid,title('Stem plot for a sine curve')
We can regard this as a sampled version of a continuous sine wave. This is how a computer system sees the signal.
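The definition of the sample points z is not shown in the text above. A minimal sketch, assuming samples from 0 to 10 in steps of 0.5 (both values are chosen only for illustration), could look like this. Passing z as the first argument to stem puts the sample positions, rather than the index, on the horizontal axis.

```matlab
% Sample a sine curve with amplitude 1 at discrete points.
% The range 0..10 and the step 0.5 are assumed values, not taken from the tutorial.
z  = 0:0.5:10;          % sample positions
yy = sin(z);            % sampled values of the sine curve

stem(z, yy), grid       % stem plot of the samples versus their position z
hold on
zz = 0:0.01:10;         % dense grid that approximates the continuous curve
plot(zz, sin(zz))       % continuous sine curve drawn on top for comparison
hold off
title('Stem plot for a sampled sine curve')
```

A smaller step gives more samples per period, and the stem plot then follows the continuous curve more closely.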
Staircase Plot and Pie Charts
What other presentations can we find? We will also take a look at pie and staircase diagrams. Below we have an m-file.
    % M-file created by MatlabCorner.com
    % M-file makes two presentations of vector x.
    subplot(1,2,1)
    pie(x),grid, title('Pie diagram for vector x')
    subplot(1,2,2)
    stairs(x), grid,title('Staircase plot for vector x')
See the result in the figures below.
Each element in the pie diagram is given as a percentage of the sum of all elements in vector x. In a staircase plot the elements of vector x are plotted versus their index.
- pie(x,explode) is used to specify slices that should be pulled out from the pie. The vector EXPLODE must be the same size as X. The slices where EXPLODE is non-zero will be pulled out.
- pie(x,labels) is used to label each pie slice with cell array LABELS. LABELS must be the same size as X and can only contain strings.
>> pie([2 4 3 5],{'North','South','East','West'})
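The explode option can be tried on the same data. The sketch below reuses the values and labels from the example above and pulls out the largest slice. The variable names are chosen only for illustration.

```matlab
% Pie chart with one slice pulled out.
x       = [2 4 3 5];
labels  = {'North','South','East','West'};
explode = [0 0 0 1];            % non-zero entry marks the slice to pull out (here: West)

pie(x, explode, labels)         % explode and labels can be combined in one call
title('Pie diagram with one exploded slice')

share = 100 * x / sum(x);       % percentage that each slice represents
```

The vector share holds the percentages that pie(x) prints next to the slices when no labels are given.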
Several of the commands that we have presented here can also be used in three dimensions. The commands are changed to: bar3(x), stem3(x) and pie3(x). Try them.
Statistics Commands in Matlab
We will now focus on some commands for statistics. These are needed to evaluate measured data. Some statistics functions can also be applied directly from the figure window. Let's start with some simple commands, but to use them we first need to review a few concepts.
- Mean value: Assume a vector x=[1 2 3 4 5]. Sum the elements and divide by the number of elements. In Matlab: mean(x)
- Median value: Gives the element that is in the middle of the sorted vector. For the vector x above it is 3, since equally many elements are higher and lower than 3. If we have y=[1 3 4 6], then median(y) produces the answer 3.5, because with an even number of elements Matlab calculates the mean value of the two middle elements, 3 and 4. In Matlab: median(x)
- Variance: Assume a data vector x=[1 3 4 6]. The variance is defined as: Var(x) = ( (1-3.5)² + (3-3.5)² + (4-3.5)² + (6-3.5)² )/3. Notice that 3.5 is the mean value of the vector x and that we divide by the number of elements minus 1, which here is 3. If we did not square the deviations from the mean, they would sum to zero and nothing could be stated. A large variance means that the data has a large spread around the mean value. If x has the unit [Volt], then var(x) has the unit [Volt]². In Matlab: var(x)
- Standard deviation: Is simply the square root of the variance. This means that for a vector of voltage measurements the standard deviation has the same unit as the measurements, [Volt]. It can be read as a typical deviation of the elements in x from the mean value. In Matlab: std(x) or sqrt(var(x))
- Correlation: Suppose you have done several measurements and want to find out whether the variations in two or more variables depend on each other. This is called correlation and is measured by a number between -1 and +1. If the number is zero there is no linear correlation at all. An example of a positive correlation: more rain => more umbrellas are sold. We could see this if we measured the rain in [mm] in a certain region and the number of umbrellas sold in the same region. A negative correlation could exist between the number of umbrellas sold and the amount of sun protection sold. If we could measure both these variables in a certain region during the summer we would probably discover a covariation: when the number of umbrellas sold increases, the amount of sun protection sold most likely decreases.
Even if two variables produce a non-zero correlation coefficient, it doesn't necessarily mean there is a true dependence between them. We can always find some very bizarre correlations among variables, but that doesn't mean they can be described by a function. It is quite possible that there is a covariation between the number of goats in Spain and the number of umbrellas sold in England, but very few of us would claim that there is a function relating these two variables to each other.
If two variables are correlated and we plot one versus the other, the points lie close to a straight line in the graph. In Matlab: corrcoef(x,y)
- Normal distribution: Many tests based on samples have this kind of distribution. They follow a well-known curve shape called the normal distribution. It has two very important properties: the center value (mean value) and the width of the curve (standard deviation). In the figure below we illustrate an example with mean value 0 and standard deviation 1. The figure was made by the following commands from 10 000 000 random numbers.
>> u=randn(1,10000000); % don't forget the semicolon.
>> hist(u,100) % 100 intervals.
Of course the mean value could be any value and the width doesn't need to be 1. Suppose that we put two vertical lines at ±1 (one standard deviation). The area between them is 68.3 % of the total area. If we put the vertical lines at ±1.96, the area between them becomes 95 % of the total area.
- Confidence interval: In the above figure, the interval ±1.96 means that the true mean value is within the confidence interval with 95% probability. In order to have a 99% probability we need to expand the interval to ±2.58. We could say that the confidence interval is a measure of the uncertainty that chance contributes when we try to find the true mean value.
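The definitions in the list above can be checked numerically. The following is only a sketch: it recomputes the variance and standard deviation of the example vector by hand and compares them with var and std. It also uses the error function erf (an addition that the tutorial itself does not use) to obtain the area of the standard normal curve between -k and +k, and compares that with fractions estimated from random numbers.

```matlab
% Variance and standard deviation from the definition.
x = [1 3 4 6];
n = numel(x);
m = sum(x) / n;                      % mean value, 3.5
v = sum((x - m).^2) / (n - 1);       % squared deviations divided by n-1
s = sqrt(v);                         % standard deviation
fprintf('mean %.4f  var %.4f (var(x) = %.4f)  std %.4f (std(x) = %.4f)\n', ...
        m, v, var(x), s, std(x));

% Without squaring, the deviations cancel out.
sum_dev = sum(x - m);                % 0

% Area under the standard normal curve between -k and +k.
k    = [1 1.96 2.58];
area = erf(k / sqrt(2));             % approx. 0.683, 0.950 and 0.990

% The same fractions estimated from one million random numbers.
u    = randn(1, 1e6);
frac = [mean(abs(u) < 1), mean(abs(u) < 1.96), mean(abs(u) < 2.58)];
disp([area; frac])                   % first row exact, second row estimated
```

The estimated fractions change slightly from run to run, since randn produces new random numbers each time.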
Exercise 1: Read two vectors and try some of the statistics commands
>> X=1:5; Z=[0 1 4 7 12];
% Calculate the mean and median value of X and Z.
>> mean(X), mean(Z) % The result should be 3 and 4.8.
>> median(X), median(Z) % The result should be 3 and 4.
% Standard deviation for the vectors.
>> std(X), std(Z) % The answers are 1.5811 and 4.8683 respectively.
% This seems very logical, due to the large spread
% in the elements of vector Z.
% Now let's plot Z versus X. See figure below.
>> plot(X,Z), grid
Use the menu of the figure window and choose Tools -> Data Statistics. A small box appears where you can choose min, max, mean, median, std and range for both the X and Z values. Mark mean and std for the Z vector. This gives three dotted horizontal lines on the plot: the upper one is the mean value plus the standard deviation, the middle one is the mean value, and the lower one is the mean value minus the standard deviation. See the figure below.
In the figure above it seems plausible that there is a positive correlation between X and Z, that is, when X increases so does Z. Let's use Matlab to calculate the correlation.
>> corrcoef(X,Z)
ans =
    1          0.97435
    0.97435    1
The result is a 2×2 matrix. Element (1,1) indicates that the correlation between X and X is 1. Element (2,2) gives the same information for Z and Z. Elements (2,1) and (1,2) give the correlation between X and Z. This means we have a strong positive correlation (0.97435) between X and Z. The commands we used for statistics can also take a matrix as an argument.
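The off-diagonal value can be reproduced from the definition of the correlation coefficient: the covariance divided by the product of the two standard deviations. A minimal sketch, assuming the exercise vector is Z=[0 1 4 7 12] (the reading that gives the stated mean of 4.8):

```matlab
% Correlation coefficient from its definition, compared with corrcoef.
X = 1:5;
Z = [0 1 4 7 12];
n = numel(X);

covXZ = sum((X - mean(X)) .* (Z - mean(Z))) / (n - 1);   % sample covariance, 7.5
r     = covXZ / (std(X) * std(Z));                        % correlation coefficient

R = corrcoef(X, Z);              % 2x2 matrix, r sits in R(1,2) and R(2,1)
fprintf('by hand: %.5f   corrcoef: %.5f\n', r, R(1,2));
```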
Create the matrix A=[X;Z].
- mean(A) Gives a vector, where each element is the mean value of a specific column.
- median(A) Gives a vector, where each element is the median value of the column.
- std(A) Gives a vector, where each element is the standard deviation of a specific column.
Try the commands. Are there any surprises in the table?
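One way to look at the results is sketched below, again assuming Z=[0 1 4 7 12]. Because A=[X;Z] has X and Z as rows, the column-wise commands describe each pair of elements rather than X and Z themselves. A second argument selecting the dimension gives the row-wise values instead.

```matlab
X = 1:5;
Z = [0 1 4 7 12];
A = [X; Z];                 % 2-by-5 matrix: X in row 1, Z in row 2

col_mean   = mean(A)        % one value per column: 0.5 1.5 3.5 5.5 8.5
col_median = median(A)      % same as the mean, since each column has only two elements
col_std    = std(A)         % spread between the two rows in each column

% A dimension argument of 2 means "along each row".
row_mean   = mean(A, 2)     % [3; 4.8], the means of X and Z
row_median = median(A, 2)   % [3; 4]
row_std    = std(A, 0, 2)   % [1.5811; 4.8683]
```

An alternative is to transpose the matrix, for example mean(A'), so that X and Z become columns.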
We will also repeat some earlier matrix manipulation commands that can be used to calculate sums, differences and products.
- prod(A) Gives a vector with elements, that are the product of the column elements.
- sum(A) Gives a vector with elements containing the sum of each column.
- diff(A) Gives a matrix with elements that are the difference between two neighbouring elements in each column.
- sort(A) Gives a matrix with elements in each column sorted in order.
Use the matrix A below and the vector X that was stated earlier.
>> A=[1 2 3; 4 5 6; 7 8 9]
The commands in the above list can also be used on a vector. Try them on X.
>> prod(X), sum(X), diff(X)
% or
>> diff(X,2) % the same as diff(diff(X))
% Now try with the matrix A instead.
>> diff(A), prod(A), sum(A)
ans =
     3     3     3
     3     3     3
ans =
    28    80   162
ans =
    12    15    18
Finally, create a new matrix F to see how matrices can be put together.
>> F=[ans; A]
F =
    12    15    18
     1     2     3
     4     5     6
     7     8     9
Use the command sort on the resulting matrix F.
>> [A,index]=sort(F)
% results in two matrices: one sorted matrix A
% and one matrix containing the original row position of each element in F.
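The relation between the sorted matrix and the index matrix can be sketched as follows. The output names S, col1 and D are chosen here for illustration, so that the original matrix A is not overwritten.

```matlab
A = [1 2 3; 4 5 6; 7 8 9];
F = [sum(A); A];                 % same F as above: the row of column sums on top

[S, index] = sort(F);            % S: each column sorted in ascending order
                                 % index: row in F that each element of S came from
disp(S)
disp(index)

% Rebuild the first sorted column from F and the index matrix.
col1 = F(index(:,1), 1);

% Descending order is selected with an extra argument.
D = sort(F, 'descend');
```

Here the first row of F holds the largest values, so it ends up last in S and every column of index reads 2, 3, 4, 1.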
Notice that the sort command only operates within each column, so we only need one index to keep track of the element position.
So far in the course we have produced simple text display or very rudimentary tables. I will try to show some useful commands for creating better display of the output. The command fprintf, together with format codes, specifies how the output is written: how many positions should be used, how many decimals, and so on. We can use it to write to the command window as well as to a text file.
Let's start by writing to the command window. Example: we would like to create a table containing three columns. The first one has the numbers from 1 to 3, the second contains the square roots of the numbers and the third contains their cubes. See below for a suggestion of an m-file.
    % Alt_1.m file created by MatlabCorner.com
    % The m-file makes a table consisting of 3 columns.
    % We also use format codes in order to control the output.
    % \t=horizontal tab, \n=new line, %6.3f=6 positions and 3 decimals
    x=1:3;
    y1=sqrt(x);
    y2=x.^3;
    Y=[x' y1' y2'];
    disp(' x sqrt(x) x^3')
    fprintf('%4.0f \t %6.3f \t %6.3f \n', Y')
The output in the command window will be:
     x sqrt(x) x^3
       1      1.000      1.000
       2      1.414      8.000
       3      1.732     27.000
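The transpose in fprintf(..., Y') matters because fprintf reads its data column by column and reuses the format string until all elements are consumed. A small sketch of this behaviour, using sprintf (which formats in the same way but returns the text instead of printing it). The names good and bad are only illustrative.

```matlab
x = 1:3;
Y = [x' sqrt(x)' (x.^3)'];       % 3-by-3 table, one row per value of x

% Y' has one table row per column, so each pass through the format
% string consumes x, sqrt(x) and x^3 for the same x.
good = sprintf('%4.0f \t %6.3f \t %6.3f \n', Y');

% Without the transpose the elements are taken down the columns of Y,
% so the first line would contain 1, 2 and 3.
bad  = sprintf('%4.0f \t %6.3f \t %6.3f \n', Y);

fprintf('%s', good)
fprintf('%s', bad)
```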
A slightly changed m-file will more or less accomplish the same thing.
    % Alt_2.m file created by MatlabCorner.com
    % The m-file makes a table consisting of 3 columns. We also use format
    % codes in order to control the output.
    % \t=horizontal tab, \n=new line, %6.3f=6 positions and 3 decimals
    disp(' x sqrt(x) x^3')
    disp('---------------------------------')
    for x=1:3;
        y1=sqrt(x);
        y2=x.^3;
        Y=[x' y1' y2'];
        fprintf('%1.0f \t %6.3f \t %6.3f \n', Y')
    end
Finally we will use fprintf to write to a text file that we create. We modify the m-file Alt_2.m.
    % Alt_2.m file created by MatlabCorner.com
    % The m-file makes a table consisting of 3 columns. We also use
    % format codes in order to control the output and write to text
    % file:Alt_2.txt \t=horizontal tab, \n=New line,
    % %6.3f=6 positions and 3 decimals
    disp(' x sqrt(x) x^3')
    disp('---------------------------------')
    fid=fopen('Alt_2.txt','w') % creates a txt-file and writes to it.
    for x=1:3;
        y1=sqrt(x);
        y2=x.^3;
        Y=[x' y1' y2'];
        fprintf(fid, '%1.0f \t %6.3f \t %6.3f \n', Y') % fid=file identifier
    end
    fclose(fid) % closes the txt-file.
Run the m-file Alt_2. Then please check the file: Alt_2.txt
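The file can also be checked from within Matlab. The sketch below writes the same table, displays the file with type and reads the numbers back with fscanf. The check of the file identifier and the read-back step are additions that are not part of the tutorial's m-file.

```matlab
% Write the table to a text file and read it back.
% The file name follows the tutorial; the error check is an addition.
fid = fopen('Alt_2.txt', 'w');
if fid == -1
    error('Could not open Alt_2.txt for writing.');
end
for x = 1:3
    fprintf(fid, '%1.0f \t %6.3f \t %6.3f \n', x, sqrt(x), x^3);
end
fclose(fid);

type('Alt_2.txt')                   % show the file contents in the command window

% Read the numbers back into a matrix with three rows (x, sqrt(x), x^3).
fid  = fopen('Alt_2.txt', 'r');
data = fscanf(fid, '%f', [3 Inf]);  % fills column by column, so transpose to get the table
fclose(fid);
disp(data')
```

The values read back have only three decimals, because that is all the format code %6.3f stored in the file.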