Basic statistics

Mean and median

Given a data series (2, 5, 9), the mean (average) is (2 + 5 + 9)/3 ≈ 5.33, and the median, the middle value of the sorted series, is 5.
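As a quick check of the arithmetic above, here is a minimal sketch using Python's standard `statistics` module. The four-value series is a hypothetical addition, included only to show how the median is usually taken when there is no single middle value.

```python
import statistics

data = [2, 5, 9]

mean_value = statistics.mean(data)      # (2 + 5 + 9) / 3
median_value = statistics.median(data)  # middle value of the sorted data

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value}")

# Hypothetical even-length series: statistics.median averages the two
# middle values, here (5 + 9) / 2.
even_median = statistics.median([2, 5, 9, 11])
print(f"median of four values = {even_median}")
```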


Correlation

Correlation is a measure of the relation between two or more variables. Its value ranges from -1 to +1: negative values represent "negative correlation", positive values represent "positive correlation", and 0 represents a "lack of correlation".

The most commonly used correlation measure is the Pearson correlation:

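$$ r_{12} = \frac{\sum_i (Y_{i1} - \langle Y \rangle_1)(Y_{i2} - \langle Y \rangle_2)}{\left[ \sum_i (Y_{i1} - \langle Y \rangle_1)^2 \, \sum_i (Y_{i2} - \langle Y \rangle_2)^2 \right]^{1/2}} $$

where $\langle Y \rangle_k = \sum_i Y_{ik} / n$ is the mean of variable $k$ and $n$ is the sample size.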
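The Pearson formula above can be sketched directly in plain Python. The function name `pearson_r` and the sample data are hypothetical, chosen only for illustration; the code follows the formula term by term rather than any particular library's implementation.

```python
import math


def pearson_r(y1, y2):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(y1) != len(y2) or len(y1) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(y1)
    mean1 = sum(y1) / n
    mean2 = sum(y2) / n
    # Numerator: sum of products of deviations from the two means.
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2))
    # Denominator: square root of the product of the summed squared deviations.
    ss1 = sum((a - mean1) ** 2 for a in y1)
    ss2 = sum((b - mean2) ** 2 for b in y2)
    # Note: a constant series makes ss1 or ss2 zero, so r is undefined
    # and this line raises ZeroDivisionError.
    return cross / math.sqrt(ss1 * ss2)


# Hypothetical data: y rises with x, z falls with x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, z)
print(f"r(x, y) = {r_xy:.3f}")
print(f"r(x, z) = {r_xz:.3f}")
```

Perfectly linear data such as these sit at the two ends of the range; real data usually fall somewhere in between.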

A correlation value is frequently reported together with a confidence interval.

Sample and population

The population is the complete collection of objects under study. A sample is a subset of the population used to estimate the statistics of the population. For example, the population of an opinion poll could be every person on Earth, but the poll can only "sample" a limited number of individuals.


Variance

Variance of a population:

$$ \sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N} $$

where $\mu$ is the population mean and $N$ is the population size.

Variance of a sample (for estimating the population variance):

$$ s^2 = \frac{\sum_i (x_i - \langle x \rangle)^2}{n - 1} $$

where $\langle x \rangle$ is the sample mean and $n$ is the sample size.
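The only difference between the two variance formulas is the denominator, which a short sketch makes concrete. The function names are hypothetical. The data reuse the series (2, 5, 9), treated first as a whole population and then as a sample, and the results are compared with `statistics.pvariance` and `statistics.variance` from the Python standard library.

```python
import statistics


def population_variance(values):
    """Summed squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Summed squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 5, 9]
pop_var = population_variance(data)
samp_var = sample_variance(data)

print(f"population variance = {pop_var:.3f}")
print(f"sample variance     = {samp_var:.3f}")

# Cross-check against the standard library.
print(f"statistics.pvariance = {statistics.pvariance(data):.3f}")
print(f"statistics.variance  = {statistics.variance(data):.3f}")
```

The sample variance is always the larger of the two for the same data, because it divides by n - 1 instead of N.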

Standard deviation

Standard deviation (σ for a population or s for a sample) is the square root of the variance.
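A minimal sketch of this relationship, again on the series (2, 5, 9), uses `statistics.pstdev` and `statistics.stdev` from the Python standard library. The variable names are illustrative only.

```python
import math
import statistics

data = [2, 5, 9]

sigma = statistics.pstdev(data)  # population standard deviation
s = statistics.stdev(data)       # sample standard deviation (n - 1 denominator)

# Each one is the square root of the corresponding variance.
sigma_from_variance = math.sqrt(statistics.pvariance(data))
s_from_variance = math.sqrt(statistics.variance(data))

print(f"sigma = {sigma:.3f} (sqrt of population variance: {sigma_from_variance:.3f})")
print(f"s     = {s:.3f} (sqrt of sample variance: {s_from_variance:.3f})")
```

Unlike the variance, the standard deviation is expressed in the same units as the data.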

Confidence interval

A confidence interval for a specific statistic (e.g., a mean or a correlation) is a range of values around the statistic within which the "true" (population) statistic can be expected to lie with a given level of certainty.

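For example, a 95% confidence interval of (0.78, 0.95) for a correlation means that the interval was produced by a procedure which, over repeated sampling, captures the true (population) correlation 95% of the time. Informally, we are 95% confident that the true correlation lies between 0.78 and 0.95.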
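The text does not say how such an interval is computed. One common approach, assumed here and not taken from the source, is the Fisher z-transformation, which is reasonable when the data are roughly bivariate normal. The function name, the correlation r = 0.9, and the sample size n = 30 are all hypothetical.

```python
import math
from statistics import NormalDist


def correlation_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r (Fisher z method)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    if n <= 3:
        raise ValueError("the Fisher z method needs n > 3")
    z = math.atanh(r)                 # Fisher z-transform of r
    std_err = 1.0 / math.sqrt(n - 3)  # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - z_crit * std_err), math.tanh(z + z_crit * std_err)


low, high = correlation_confidence_interval(r=0.9, n=30)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

The interval is not symmetric around r, because correlations cannot exceed +1. A larger sample narrows it.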


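p-value

The p-value is the probability of obtaining a result at least as extreme as the one observed if there were in fact no effect in the population, that is, if the observed result were only a chance fluctuation. A small p-value therefore indicates that the observed result is unlikely to be a mere fluctuation.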
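One way to make the idea concrete is a permutation test, sketched below for a correlation. This technique is an assumption chosen for illustration and is not described in the source. The function names, the data, the number of permutations, and the seed are hypothetical. Shuffling one variable breaks any real relation, so the shuffled correlations show what chance fluctuation alone produces.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cross / math.sqrt(ss_a * ss_b)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the correlation between x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroy any real pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical, strongly related data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

p = permutation_p_value(x, y)
print(f"estimated p-value = {p:.4f}")
```

Very few shuffles reproduce a correlation as strong as the observed one, so the estimated p-value is small.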

A p-value of 0.05 is usually treated as an acceptable error level. For more information, please visit the Electronic Statistics Textbook.