Data

Heights (Data).png

Data: Collections of observations.

Population and Sample.png

Population: All of the data being considered.

Parameter: A numerical measurement that describes a characteristic of a population.

Sample: Data from a portion of the population.

Statistic: A numerical measurement that describes a characteristic of a sample.

Types of Data.png

Qualitative Data: Data that consists of labels or names.

Quantitative Data: Data that consists of numbers that represent measurements or counts.

Discrete Data: Quantitative data whose possible values are countable—finite or listable in an unending sequence—for example, the number of dice rolls needed before rolling a six.

Continuous Data: Quantitative data whose possible values cover a continuous scale with infinitely many values—for example, lengths of distances anywhere from 0 mm to 10 mm.

Levels of Measurement.png

Nominal Level of Measurement: Data that cannot be arranged in order and consists of labels, names, or categories.

Ordinal Level of Measurement: Data that can be arranged in order, but the differences (subtraction) between data values are meaningless.

Interval Level of Measurement: Data that can be arranged in order, and the differences (subtraction) between data values are meaningful. At the interval level of measurement, there is no natural zero starting point, and ratios are meaningless.

Ratio Level of Measurement: Data that can be arranged in order, and the differences (subtraction) between data values are meaningful. At the ratio level of measurement, there is a natural zero starting point.

Sampling Methods

Random Sample: All members of a population have the same chance of being selected.

Simple Random Sample: A sample of n subjects is selected in a way that every sample of the same size has the same likelihood of being selected.

Systematic Sample: Every nth subject is selected.

Convenience Sample: Easily obtained data.

Stratified Sample: A population is divided into groups that share similar characteristics, and then members of each group are randomly selected.

Cluster Sample: A population is partitioned into groups, and then groups are randomly selected. If a group is chosen, then all members within that group are selected.

Voluntary Response Sample: Respondents themselves decide whether or not to participate.

Descriptive Statistics

Descriptive Statistics: Methods that describe characteristics of data.

Mean.png

Mean: The sum of the data values divided by the number of values; a measure of center. The mean carries one more decimal place than the values in the original data.

Median.png

Median: The middle value of a data set whose values are arranged in order of increasing magnitude (or the mean of the two middle values when the number of values is even). The median carries one more decimal place than the values in the original data.

Mode.png

Mode: The value(s) that occur(s) most frequently in a data set. The mode is a measure of center.

Midrange.png

Midrange: The value that is midway between the maximum value and the minimum value in a data set. The midrange carries one more decimal place than the values in the original data. 
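
As a quick illustration of the four measures of center above, here is a minimal Python sketch (the data set is made up for illustration):

```python
import statistics

data = [1, 2, 2, 3, 14]

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequently occurring value
midrange = (max(data) + min(data)) / 2  # midway between max and min

print(mean, median, mode, midrange)     # 4.4 2 2 7.5
```

Note how the single outlier (14) pulls the mean and midrange upward, while the median and mode are unaffected.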

Weighted Mean.png

Weighted Mean: The mean of data values that are assigned different weights. When calculating a GPA, the final answer should be rounded to two decimal places.
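
A GPA is a weighted mean in which credit hours serve as the weights. A small sketch (the grades and credit hours are made up):

```python
# Grade points weighted by credit hours (made-up example).
grade_points = [4.0, 3.0, 3.7]   # A, B, A-
credits = [3, 4, 3]

# Weighted mean = sum(weight * value) / sum(weights)
weighted_mean = sum(w * x for w, x in zip(credits, grade_points)) / sum(credits)
gpa = round(weighted_mean, 2)    # GPAs are rounded to two decimal places
print(gpa)                       # 3.51
```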

Additional Means.png

Harmonic Mean: A measure of center for data sets consisting of rates, such as speeds.

Root Mean Square: A measure of center commonly used in physics when dealing with data related to electricity.

Range.png

Range: A measure of variation, the amount that values vary among themselves. The range carries one more decimal place than the values in the original data.

Sample Standard Deviation.png
Population Standard Deviation.png

Standard Deviation: A measure of how much data values deviate from the mean. A larger standard deviation indicates a greater amount of variation in the data. Outliers greatly affect the value of the standard deviation.
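
The sample and population standard deviations differ only in their divisors (n − 1 versus n). Python's statistics module provides both; the data below is made up:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean is exactly 5

s = statistics.stdev(data)         # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)    # population standard deviation (divides by n)

print(sigma)          # 2.0
print(round(s, 3))    # 2.138
```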

Range Rule.png

Range Rule: A tool used to classify data values as either significantly low, significantly high, or not significant. The range rule is based on evidence that for many data sets, a majority of their data values lie within two standard deviations of their mean.

Empirical Rule.png

Empirical Rule: Standard deviation properties that apply to data sets that have a bell-shaped distribution.

mean absolute deviation.png

Mean Absolute Deviation: The mean distance of the data from the mean.

deviation and variance.png

Variance: The square of a standard deviation.

Sample Coefficient of Variation.png
Population Coefficient of Variation.png

Coefficient of Variation (CV): A percentage that describes the standard deviation relative to the mean. The coefficient of variation is used to compare variation between two or more data sets that don't necessarily share the same scale or units. A high percentage indicates a high amount of variation in the data. The coefficient of variation is rounded to one decimal place.
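
Because the coefficient of variation is unitless, it can compare spread across data sets measured on different scales. A sketch with made-up heights (cm) and weights (kg):

```python
import statistics

heights = [160, 165, 170, 175, 180]   # cm
weights = [55, 60, 70, 80, 95]        # kg

def cv(data):
    # CV = (sample standard deviation / mean) * 100%, rounded to one decimal
    return round(statistics.stdev(data) / statistics.mean(data) * 100, 1)

print(cv(heights), cv(weights))       # 4.7 22.3
```

The weights vary far more relative to their mean than the heights do, even though both are measured in different units.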

Sample Z Score.png
Population Z Score.png

Z Score: The number of standard deviations that a value is below or above the mean. A z score that is less than or equal to -2 is significantly low, and a z score greater than or equal to 2 is significantly high. Z scores are rounded to two decimal places.
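
A z score standardizes a value so it can be judged for significance. A sketch using the familiar IQ scale (mean 100, standard deviation 15):

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean
    return round((x - mean) / sd, 2)   # z scores are rounded to two decimals

z = z_score(130, 100, 15)
print(z)          # 2.0
print(z >= 2)     # True: an IQ of 130 is significantly high
```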

Percentile Determination Formula.png
Percentile to Data Value Formula.png
Percentile Example.png

Percentiles ( P ): A measure of location that divides data into 100 groups with about 1% of the values in each group. 

Quartile Example.png
Quartile Example 2.png

Quartile ( Q ): A measure of location that divides data into 4 groups with about 25% of the values in each group. To locate a quartile (or percentile), compute the locator L = (k/100)·n, where n is the number of values and k is the percentile. If L is not a whole number, round it up; the quartile is the value in that position of the sorted data. If L is a whole number, the quartile is the mean of the value in position L and the value directly after it.
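
A sketch of the locator rule: compute L = (k/100)·n; if L is not a whole number, round it up and take that position in the sorted data; if L is whole, average the values in positions L and L + 1. (This is one common textbook convention; software packages use several variants.)

```python
import math

def percentile(data, k):
    """k-th percentile of data, using the round-up locator convention."""
    values = sorted(data)
    n = len(values)
    locator = k / 100 * n
    if locator != int(locator):
        return values[math.ceil(locator) - 1]   # round up, take that value
    i = int(locator)
    return (values[i - 1] + values[i]) / 2      # whole number: average two values

data = list(range(1, 11))      # 1 through 10
print(percentile(data, 50))    # 5.5  (Q2, the median)
print(percentile(data, 25))    # 3    (Q1)
```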

Boxplot.png

Boxplot: A graph of a data set made up of a line that extends from the maximum value to the minimum value and a box with lines drawn at Q1, Q2, and Q3.

Additional Quartile Stats.png

Additional statistics are defined using percentiles and quartiles.

Frequency Distributions

Frequency Distribution.png
Frequency Distribution 2.png

Frequency Distribution: A table that partitions data into classes. Each class is displayed along with the number of data values within it.

Lower Class Limits: The smallest numbers that belong to each different class.

Upper Class Limits: The largest numbers that belong to each different class.

Class Boundaries: The numbers that are at the center of the gaps between each class. Class boundaries also exist before the smallest class and after the largest class.

Class Midpoints: The values in the middle of each class. Class midpoints are calculated by adding the lower class limit to the upper class limit and dividing by two.

Class Widths: The difference between two consecutive lower class limits. Also, the difference between two consecutive class boundaries.

Relative and Percentage Frequency Distributions.png
Relative and Percentage Frequency Distributions 2.png

Relative Frequency Distribution: Relative frequencies replace the frequencies of a frequency distribution.

Percentage Frequency Distribution: Percentages replace the frequencies of a frequency distribution.

Cumulative Frequency.png

Cumulative Frequency Distribution: The frequency for each class is the sum of the frequencies for that class and all previous classes before it.

Qualitative Frequency Distribution.png

Frequency distributions also summarize qualitative data sets.

Mean from a Frequency Distribution.png

The mean of a frequency distribution can be approximated by pretending that all sample values in each class are equal to the class midpoint.
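
A sketch of this approximation, treating every value in a class as equal to the class midpoint (the classes and frequencies are made up):

```python
# Each class is (lower limit, upper limit); frequencies give the counts.
classes = [(0, 9), (10, 19), (20, 29)]
frequencies = [2, 5, 3]

midpoints = [(lo + hi) / 2 for lo, hi in classes]   # 4.5, 14.5, 24.5
total = sum(frequencies)

# Approximate mean = sum(frequency * midpoint) / total frequency
approx_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / total
print(approx_mean)   # 15.5
```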

Histograms

Normal Distribution.png
Uniform Distribution.png
Distribution Skewed to the Right.png
Distribution Skewed to the Left.png

Histogram: A graph that consists of bars of equal width. The horizontal axis represents the classes of quantitative data, and the vertical axis represents frequencies.

Normal Distribution: A set of data that, when graphed as a histogram, has a bell shape.

Uniform Distribution: A histogram in which all of the bars are approximately the same height.

Skewed to the Right Distribution: A set of data that, when graphed as a histogram, has a longer right tail.

Skewed to the Left Distribution: A set of data that, when graphed as a histogram, has a longer left tail.

Probability

Probability ( P ): A measure of the likelihood of an event occurring. Either the exact decimal can be given or a probability can be rounded to three significant digits.

Dice Roll Simple Event.png

Event: Any collection of results or outcomes of a procedure.

Simple Event: An outcome that cannot be broken down any further.

Probability Scale.png

The possible values of a probability range from 0 to 1. A probability of 0 indicates that an event is impossible, and a probability of 1 indicates that an event is certain.

Relative Frequency Approximation of Probability.png
Classical Approach to Probability.png

Relative Frequency Approximation of Probability: The probability of an event is approximated by repeating a procedure and recording the number of times the event occurs. As the procedure is repeated, the relative frequency probability of an event tends to approach the true probability.

Classical Approach to Probability: The probability of a specific event is determined by finding the number of ways the specific event occurs and determining all of the other possible simple events that may also occur. The classical approach is only applicable if each of the different simple events is equally likely to occur.

Subjective Probability: The probability of an event is estimated using personal knowledge about the topic.

The Complement.png

The Complement of an Event: All outcomes in which a specific event does not occur.

Odds.png

Odds: An expression of likelihood.

Probability Addition Rule.png

Compound Event: An event that combines two or more simple events.

Probability Addition Rule: Used to find the probability of two events occurring separately or together during a procedure.

Complementary Events Addition Rule.png

The probability of an event plus the probability of its complement always equals one.

Probability Multiplication Rule.png

Probability Multiplication Rule: Used to find the probability that event A occurs in one trial and event B occurs in another trial.

Independent Events: The occurrence of one event does not affect the probability of the other event.

Dependent Events: The occurrence of one event affects the probability of the other event. When a sample is small relative to the population (a common guideline is no more than 5%), dependent events can be treated as independent to simplify calculations.

Complements of At Least One.png

The complement of an event described with the phrase "at least one" is that the event occurs zero times.

Probability of At Least One.png

The probability of an event that uses the term "at least one" can be found by determining the probability of the event's complement.
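
A sketch of the complement method: to find the probability of at least one occurrence, subtract the probability of none from 1. Here, the chance of rolling at least one six in four rolls of a fair die:

```python
# P(at least one six) = 1 - P(no sixes in four rolls)
p_no_six = (5 / 6) ** 4
p_at_least_one = 1 - p_no_six
print(round(p_at_least_one, 3))   # 0.518  (three significant digits)
```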

Conditional Probability.png

Conditional Probability: The probability of an event is calculated based on information that a different event has already occurred.

Bayes’ Theorem.png

Bayes' Theorem: A theorem that can be used to calculate the conditional probability of an event.

Counting

Counting: Finding the total number of simple events for a given situation.

Multiplication Counting Rule.png

Multiplication Counting Rule: Used to calculate the total number of simple events for a sequence of events.

Factorial Rule.png

Factorial Rule: Used to find the number of ways that items in a set can be rearranged. Order matters when using the factorial rule.

Permutations Rule (Different Items).png
Permutations Rule (Identical Items).png

Permutations: Arrangements in which different sequences of the same items are counted separately.

Combinations Rule.png

Combinations: Arrangements in which different sequences of the same items are counted as being the same.
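
Python's math module implements these counting rules directly; a quick sketch:

```python
import math

# Factorial rule: ways to arrange 4 distinct books on a shelf
print(math.factorial(4))    # 24

# Permutations: ordered selections of 2 items from 5 (order matters)
print(math.perm(5, 2))      # 20

# Combinations: unordered selections of 2 items from 5 (order does not matter)
print(math.comb(5, 2))      # 10
```

Note that the permutation count is always at least as large as the combination count, since each unordered selection corresponds to several orderings.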

Discrete Probability Distributions

Probability Distribution.png

Random Variable: A variable that has a single value, determined by chance, for each outcome of a given procedure. Random variables can be discrete, with a countable number of values, or continuous, with an infinite number of values.

Probability Distribution: A graph, formula, description, or table that gives the probability for each value of the random variable.

Equations for Probability Distributions.png

Probability distributions have special formulas used to calculate a distribution's mean, standard deviation, and variance. Results should carry one more decimal place than the number of decimal places used for the random variable.

Range Rule (Prob).png

The range rule can also be used to determine if the value of a random variable is significantly high or low.

Expected Value.png

Expected Value (E): The mean value of the outcomes of a procedure; E = μ.

Binomial Probability Distributions.png

Binomial Probability Distribution: A probability distribution with a fixed number of independent trials, each having only two possible outcomes (typically classified as success and failure), in which the probability of success remains constant across all trials.

Binomial Probability Formula.png

The probability of "x" successes among "n" trials can be calculated using the binomial probability formula.

Equations for Binomial Distributions.png

Binomial distributions have special formulas used to calculate a distribution's mean, standard deviation, and variance.
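
The binomial probability formula and the distribution's mean and standard deviation can be sketched as follows (the values of n, p, and x are made up):

```python
import math

def binomial_prob(n, x, p):
    # P(x) = C(n, x) * p^x * (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
print(round(binomial_prob(n, 4, p), 3))   # 0.205 (four heads in ten coin flips)

mean = n * p                              # mu = n * p
sd = math.sqrt(n * p * (1 - p))           # sigma = sqrt(n * p * q)
print(mean, round(sd, 1))                 # 5.0 1.6
```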

Poisson Probability Formula.png
Poisson Distribution Standard Deviation.png

Poisson Probability Distribution: A probability distribution that applies to occurrences of an event over a specific interval, such as time or distance. The requirements for a Poisson Probability Distribution are that the occurrences must be random, independent, and uniformly distributed over the interval.
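
A sketch of the Poisson probability formula, P(x) = μ^x · e^(−μ) / x!, with a made-up mean of 2 occurrences per interval:

```python
import math

def poisson_prob(x, mu):
    # P(x) = mu^x * e^(-mu) / x!
    return mu**x * math.exp(-mu) / math.factorial(x)

# Probability of exactly 3 occurrences when the mean is 2 per interval
print(round(poisson_prob(3, 2), 3))   # 0.18
```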

Continuous Probability Distributions

Density Curve: The graph of any continuous probability distribution. The total area under any density curve is equal to 1. The shaded area under a density curve represents the probability that an event will occur.

Uniform Distribution (Fast Food Waiting Time).png

Uniform Probability Distributions: Distributions where the random variable's values are equally distributed over the range of possible values.  The area of a portion of a uniform probability distribution can be found by using the area formula for a rectangle.

Normal Distribution II.png
Normal Distribution III.png
Normal Distribution Example.png
Normal Distribution Example II.png
Normal Distribution Example III.png
normal distribution example IV.png

Standard Normal Probability Distributions: Distributions with a bell-shaped curve, a mean of zero, and a standard deviation of one. A z score table can be used to find the area under a normal distribution or its corresponding z score value.

nonstandard normal distribution.png

Nonstandard normal distributions can be converted into standard normal distributions by utilizing the z score formula.
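
Python's statistics.NormalDist can play the role of a z score table. A sketch that converts a nonstandard normal value to a z score and finds the area to its left (IQ scale again, with made-up numbers):

```python
from statistics import NormalDist

mu, sigma = 100, 15
x = 115

z = (x - mu) / sigma          # convert to a standard z score
area = NormalDist().cdf(z)    # area under the standard normal curve, left of z

print(round(z, 2))        # 1.0
print(round(area, 4))     # 0.8413
```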

Critical Values.png

Critical Values: z scores that border significantly high or low z scores. 

Estimators

Estimators.png
Proportions.png

Proportion: The ratio of the number of successes to the sample or population size. The population proportion is denoted p, and the sample proportion is denoted p̂.

Unbiased Estimators: Statistics that target the value of their corresponding parameters. Unbiased estimators include proportion, mean, and variance.

Biased Estimators: Statistics that do not target the value of their corresponding parameters. Biased estimators include median, range, and standard deviation.

Sample Mean Distribution: When samples of the same size are taken from the same population, the sample means tend to be normally distributed. The mean of the sample means equals the population mean.

Sample Proportion Distribution: When samples of the same size are taken from the same population, the sample proportions tend to be normally distributed. The mean of the sample proportions equals the population proportion.

Sample Variance Distribution: When samples of the same size are taken from the same population, the distribution of sample variances tends to be skewed to the right. The mean of the sample variances equals the population variance.

Probability of Sample Mean.png

The probability of a sample mean that is normally distributed can be found using these equations.

Central Limit Theorem: For all samples of the same size n, where n > 30, the sampling distribution of the sample mean can be estimated by a normal distribution with mean (μ) and standard deviation (σ / √n).
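
A sketch of the central limit theorem in use: for samples of size n, the sample means have standard deviation σ/√n, so a probability about a sample mean uses that smaller spread (the numbers are made up):

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 36
sigma_xbar = sigma / math.sqrt(n)   # standard deviation of the sample means

# P(sample mean < 105): z = (105 - mu) / (sigma / sqrt(n))
z = (105 - mu) / sigma_xbar
print(sigma_xbar)                       # 2.5
print(round(NormalDist().cdf(z), 4))    # 0.9772
```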

Estimating The Population Proportion

Point Estimate: A single value used to estimate a population parameter.

The best point estimate of the population proportion (p) is the sample proportion (p̂).

Critical Values Confidence Levels.png

Confidence Level: The probability that the confidence interval contains the population parameter. 

Margin of Error.png
critical values confidence levels example 2.png

Margin of Error (E): The maximum likely difference between the observed sample statistic and its corresponding population parameter.

Confidence Interval.png
Confidence Interval Alt Form.png

Confidence Interval: A range of values used to estimate the true value of a population parameter. The confidence interval limits for (p) should be rounded to three significant figures. 
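
A sketch of a 95% confidence interval for a population proportion (the sample numbers are made up): the margin of error is E = z · √(p̂q̂/n), and the interval is p̂ ± E.

```python
import math
from statistics import NormalDist

x, n = 520, 1000                  # 520 successes out of 1000 trials
p_hat = x / n
q_hat = 1 - p_hat

z = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence (~1.96)
E = z * math.sqrt(p_hat * q_hat / n)

lower, upper = p_hat - E, p_hat + E
print(round(lower, 3), round(upper, 3))   # 0.489 0.551
```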

Required Sample Size.png

Sample Size: The number of units that have to be collected to estimate some population parameter. The required sample size should always be rounded up to the nearest whole number.
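
A sketch of the required-sample-size formula for a proportion, n = z²·p̂q̂ / E², using p̂ = 0.5 when no prior estimate is known (which maximizes p̂q̂) and a made-up margin of error of 0.03 at 95% confidence:

```python
import math
from statistics import NormalDist

E = 0.03                          # desired margin of error
z = NormalDist().inv_cdf(0.975)   # 95% confidence critical value
p_hat = 0.5                       # no prior estimate: use 0.5

n = z**2 * p_hat * (1 - p_hat) / E**2
print(math.ceil(n))               # 1068 -- always round the sample size up
```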

Estimating The Population Mean

The best point estimate of the population mean (μ) is the sample mean (X̄).

Student t distribution.png
t formula.png

Student t Distribution: A distribution with a bell shape similar to a normal distribution; however, a t distribution has more variability than a normal distribution. A t distribution has a standard deviation that is greater than one and a mean of 0. As the sample size increases, the t distribution becomes more similar to a normal distribution.

Confidence Interval Mean.png
student t distribution example.png
Confidence Interval Mean II.png

The formulas used to determine a confidence interval for a population mean depend on the information provided. If a question includes information that doesn't fit any of the previously listed conditions, technology is required.

Required Sample Size Mean.png

The required sample size should always be rounded up to the nearest whole number.

Estimating The Population Standard Deviation and Variance

The best point estimate of the population variance (σ^2) is the sample variance (s^2). The sample standard deviation (s) is commonly used as a point estimate of the population standard deviation (σ), even though it is a biased estimator.

Chi-Square Distribution.png
chi-square formula.png

Chi-square Distribution: The values of a chi-square distribution cannot be negative. The distribution approaches a normal distribution as the number of degrees of freedom increases.

Critical Value Variance.png

For chi-square distributions, there are two critical values, left and right, associated with a given confidence level.

Confidence Interval Variance and Standard Deviation.png

If the original data set is available, the confidence interval limits should be rounded to one more decimal place than the values in the original data set. If the original data set is not available, the confidence interval should be rounded to contain the same number of decimal places as the standard deviation (s). 

Hypothesis Testing

Hypothesis: A claim about a property of a population.

Null and Alternative Hypotheses.png

Null Hypothesis (H0): A statement that the value of a population parameter is equal to a claimed value.

Alternative Hypothesis (H1): A statement that the value of a population parameter differs from the value in the null hypothesis. The alternative hypothesis uses one of these symbols: < , > , ≠.

Significance Level.png

Significance Level α: The probability of rejecting the null hypothesis when it is actually true.

Test Statistics.png

Test Statistic: A value used to judge the null hypothesis. A test statistic is found by converting a sample statistic to a score (such as z, t, or χ²).

Rejecting or Failing to Reject the Null Hypothesis.png
Formulating a Conclusion.png

The P-value Method and the Critical Value Method are two methods used to test a hypothesis.

Critical Region.png
Critical Z Value.png

Critical Region: The area corresponding to all values of the test statistic that cause one to reject the null hypothesis.

P-Values.png
P-Value Examples.png

P-Value: The probability of obtaining a test statistic at least as extreme as the one found from the sample data, assuming the null hypothesis is true.

Error Types.png

Two different types of errors can occur when deciding whether to reject or fail to reject a null hypothesis: a type I error (rejecting a true null hypothesis) and a type II error (failing to reject a false null hypothesis).

Testing a claim proportion.png
Testing a claim mean.png
Testing a claim standard deviation.png
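
As a sketch of the P-value method for a claim about one proportion (all numbers are made up): test H0: p = 0.5 against H1: p > 0.5, given 530 successes in 1000 trials, at the α = 0.05 significance level.

```python
import math
from statistics import NormalDist

n, x = 1000, 530
p0 = 0.5           # proportion claimed by the null hypothesis
p_hat = x / n

# Test statistic: z = (p_hat - p0) / sqrt(p0 * q0 / n)
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Right-tailed test: the P-value is the area to the right of z
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2))       # 1.9
print(p_value < 0.05)    # True: reject the null hypothesis
```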

Hypothesis Testing (Two Samples)

Test Statistic for Two Proportions.png
Two Proportions Example.png

The P-value Method and Critical Value Method can be used to test a claim about two proportions.

Confidence Interval Two Proportions.png

A confidence interval can be used to estimate the difference between two population proportions (p1 - p2). If the interval does not contain zero, there is evidence that p1 and p2 are different values. Thus, the null hypothesis should be rejected.

Test Statistic for Two Mean (I).png
Degrees of Freedom (Two Mean).png
Two Means Example.png

The P-value Method and Critical Value Method can be used to test a claim about two means.

Confidence Interval Two Means.png

A confidence interval can be used to estimate the difference between two population means (μ1 - μ2). If the interval does not contain zero, there is evidence that μ1 and μ2 are different values. Thus, the null hypothesis should be rejected.

Two Means Alt.png

If the population standard deviations are known, alternative equations should be used.

Test Statistic for Matched Pairs.png
Matched Pair Example.png

Matched Pairs: Two dependent samples that have values that are matched according to some relationship.

The P-value Method and Critical Value Method can be used to test if there is a significant difference between measured values and reported values.

Confidence Interval Matched Pairs.png

A confidence interval can be used to estimate μd. If the interval contains zero, it is possible that the mean of the differences is zero, indicating no significant difference between the measured values and the reported values.

Scatterplots

Positive Correlation.png
No Correlation.png
Negative Correlation.png
Nonlinear Relationship.png

Scatterplot: A plot of paired (x , y) data with a horizontal (x) and vertical (y) axis.

x : The independent variable.

y : The dependent variable.

Correlation: When a relationship exists between the x variable and the y variable. Correlation does not imply causality.

Linear Correlation: A correlation in which the relationship between the x and y variables can be approximated by a straight line.

Linear Correlation Coefficient ( r ): A measure of the strength of the linear relationship between the x and y variables. Values of r range from -1 to 1.

RC Example.png
Scatterplot RC Example.png

Regression Line: The straight line that best fits the scatterplot.

Coefficient of Determination (R^2): The correlation coefficient squared. Values range from 0 to 1.

Critical Values of r.png
Correlation Check.png

Correlation can be checked using a critical values table for the correlation coefficient r.
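
A sketch of computing r and the least-squares regression line for a small made-up data set (the data here are perfectly linear, so r = 1):

```python
import math

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]    # perfectly linear: y = 2x

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sums of squared deviations and cross products
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # linear correlation coefficient
slope = sxy / sxx                # regression line slope (b1)
intercept = my - slope * mx      # regression line intercept (b0)

print(r, slope, intercept)       # 1.0 2.0 0.0
```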

Residual Equation.png

Residuals: The vertical distances between the original data points and the regression line.

Least-Squares Property: Applies to the regression line used for a given scatterplot. A line satisfies the Least-Squares Property if the sum of the squares of the residuals is the smallest sum possible. If a different line were used as the regression line, the sum of the squared residuals would be greater.

Influential Point.png
Outlier.png

Influential Point: A point that strongly affects the regression line.

Outlier: A point that is far away from the other points.
