## An Introduction to Statistical Concepts

Most of these functions are related to a general type of function, which is called normal.

## An Introduction to Statistical Concepts by Richard G. Lomax

The "normal distribution" is important because in most cases it approximates well the function that was introduced in the previous paragraph (for a detailed illustration, see Are All Test Statistics Normally Distributed?). The distribution of many test statistics is normal or follows some form that can be derived from the normal distribution. In this sense, philosophically speaking, the normal distribution represents one of the empirically verified elementary "truths about the general nature of reality," and its status can be compared to that of the fundamental laws of the natural sciences.

Standardized value means that a value is expressed in terms of its difference from the mean, divided by the standard deviation.
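As a minimal sketch of this definition, the snippet below standardizes a small set of values. The data and the function name `standardize` are hypothetical, and the population standard deviation is used as an assumption; `statistics.stdev` could be substituted for the sample version.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return each value as (value - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD; use stdev() for the sample SD
    return [(v - m) / s for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5 and SD 2
z_scores = standardize(scores)
print(z_scores)
```

A standardized value of 2 therefore means "two standard deviations above the mean", regardless of the original units.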

### Theory and application

The animation below shows the tail area associated with other Z values. Recall the example discussed above, where pairs of samples of males and females were drawn from a population in which the average value of WCC in males and females was exactly the same.

Although the most likely outcome of such experiments (one pair of samples per experiment) is that the difference between the average WCC in males and females in each pair is close to zero, from time to time a pair of samples will be drawn where the difference between males and females is quite different from 0. How often does this happen? If the sample size is large enough, the results of such replications are "normally distributed" (this important principle is explained and illustrated in the next paragraph) and, thus, knowing the shape of the normal curve, we can precisely calculate the probability of obtaining "by chance" outcomes representing various levels of deviation from the hypothetical population mean of 0.
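The calculation described above can be sketched with the standard library alone. The function names here are illustrative, and the sketch assumes the outcome has already been expressed as a standardized (Z) value under a normal sampling distribution.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_sided_tail(z):
    """Probability of a deviation at least as large as |z| in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (1.0, 1.96, 3.0):
    print(f"Z = {z}: two-sided tail probability = {two_sided_tail(z):.4f}")
```

Under these assumptions, a deviation of about 1.96 standard deviations or more occurs "by chance" roughly 5% of the time.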

If such a calculated probability is so low that it meets the previously accepted criterion of statistical significance, then we have only one choice: conclude that our result gives a better approximation of what is going on in the population than the "null hypothesis" (remember that the null hypothesis was considered only for "technical reasons" as a benchmark against which our empirical result was evaluated).

Note that this entire reasoning is based on the assumption that the shape of the distribution of those "replications" (technically, the "sampling distribution") is normal. This assumption is discussed in the next paragraph. Not all statistical tests, but most of them, are either based on the normal distribution directly or on distributions that are related to and can be derived from the normal, such as t, F, or Chi-square.

Typically, these tests require that the variables analyzed are themselves normally distributed in the population, that is, they meet the so-called "normality assumption." When a variable does not meet this assumption, we have two general choices. First, we can use some alternative "nonparametric" test (or so-called "distribution-free test"; see Nonparametrics), but this is often inconvenient because such tests are typically less powerful and less flexible in terms of the types of conclusions that they can provide. Alternatively, in many cases we can still use the normal distribution-based test if we only make sure that the size of our samples is large enough.

The latter option is based on an extremely important principle that is largely responsible for the popularity of tests that are based on the normal function. Namely, as the sample size increases, the shape of the sampling distribution approaches the normal shape, even if the distribution of the variable in question is not normal. This principle is illustrated in the following animation showing a series of sampling distributions (created with gradually increasing sample sizes of 2, 5, 10, 15, and 30) using a variable that is clearly non-normal in the population, that is, the distribution of its values is clearly skewed.

However, as the size of the samples used to create the sampling distribution of the mean increases, the shape of the sampling distribution becomes normal. Although many of the statements made in the preceding paragraphs can be proven mathematically, some of them do not have theoretical proof and can be demonstrated only empirically, via so-called Monte-Carlo experiments. In these experiments, large numbers of samples are generated by a computer following predesigned specifications, and the results from such samples are analyzed using a variety of tests.
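A small Monte-Carlo experiment along these lines can be sketched as follows. The exponential population, the number of replications, and the use of skewness as a rough measure of departure from the normal shape are all assumptions made for illustration; they are not taken from the original animation.

```python
import random

def skewness(xs):
    """Moment-based skewness; values near 0 indicate a symmetric shape."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - m) / sd) ** 3 for x in xs) / n

def sampling_distribution(n, reps, rng):
    """Means of `reps` samples of size `n` drawn from a skewed population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_size = {}
for n in (2, 5, 10, 15, 30):
    means = sampling_distribution(n, reps=5000, rng=rng)
    skew_by_size[n] = skewness(means)
    print(f"n = {n:2d}: skewness of sample means = {skew_by_size[n]:.2f}")
```

With these settings the skewness of the sample means should shrink toward zero as the sample size grows, which is the pattern the text describes.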

This way we can empirically evaluate the type and magnitude of errors or biases to which we are exposed when certain theoretical assumptions of the tests we are using are not met by our data.

Specifically, Monte-Carlo studies were used extensively with normal distribution-based tests to determine how sensitive they are to violations of the assumption of normal distribution of the analyzed variables in the population. The general conclusion from these studies is that the consequences of such violations are less severe than previously thought. Although these conclusions should not entirely discourage anyone from being concerned about the normality assumption, they have increased the overall popularity of the distribution-dependent statistical tests in all areas of research.
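The kind of study described here can be sketched in a few lines. This is an illustrative setup, not a reproduction of any published study: it assumes a one-sample t test with samples of size 30, a 5% two-sided criterion (critical value 2.045 for 29 degrees of freedom), and an exponential population as the non-normal case. The null hypothesis is true in both scenarios, so every rejection is a false alarm.

```python
import math
import random

N = 30          # sample size (assumption)
T_CRIT = 2.045  # two-sided 5% critical value of t with N - 1 = 29 df

def one_sample_t(sample, mu0):
    """t statistic for testing whether the population mean equals mu0."""
    n = len(sample)
    m = sum(sample) / n
    var = sum((x - m) ** 2 for x in sample) / (n - 1)
    return (m - mu0) / math.sqrt(var / n)

def rejection_rate(draw, mu0, reps=5000, seed=1):
    """Share of replications in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        if abs(one_sample_t(sample, mu0)) > T_CRIT:
            rejections += 1
    return rejections / reps

normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0), mu0=0.0)
skewed_rate = rejection_rate(lambda rng: rng.expovariate(1.0), mu0=1.0)
print(f"normal population: {normal_rate:.3f}")
print(f"skewed population: {skewed_rate:.3f}")
```

Comparing the two rates against the nominal 5% gives a rough sense of how much the violation of normality costs in this particular setting.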

## An introduction to statistical concepts for education and behavioral sciences

### Experimental Research

Most empirical research belongs clearly to one of these two general categories.

### Dependent vs. Independent Variables

Independent variables are those that are manipulated, whereas dependent variables are only measured or registered.

### Measurement Scales

Variables differ in how well they can be measured, that is, in how much information their measurement scale can provide.

Nominal variables allow for only qualitative classification. That is, they can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories. Typical examples of nominal variables are gender, race, color, city, etc.

Ordinal variables allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable, but still they do not allow us to say "how much more." Also, this very distinction between nominal, ordinal, and interval scales itself represents a good example of an ordinal variable. For example, we can say that nominal measurement provides less information than ordinal measurement, but we cannot say "how much less" or how this difference compares to the difference between ordinal and interval scales.

Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.

Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point; thus, they allow for statements such as x is two times more than y. Typical examples of ratio scales are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of, say, 200 kelvins is higher than one of 100 kelvins, we can correctly state that it is twice as high.

Interval scales do not have the ratio property.
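The difference between the two scale types can be checked numerically. In the sketch below the two temperatures are hypothetical values chosen for illustration; the only outside fact used is the standard offset of 273.15 between Celsius and Kelvin.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to kelvins."""
    return c + 273.15

a_c, b_c = 40.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences survive the change of scale (interval property).
diff_c = a_c - b_c
diff_k = celsius_to_kelvin(a_c) - celsius_to_kelvin(b_c)

# Ratios do not: Celsius has no absolute zero, Kelvin does.
ratio_c = a_c / b_c
ratio_k = celsius_to_kelvin(a_c) / celsius_to_kelvin(b_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.2f} on Celsius vs {ratio_k:.2f} on Kelvin")
```

The Celsius ratio of 2 is an artifact of where the zero point happens to sit; on the Kelvin scale the same two temperatures differ by only about 7%.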

Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.

### Relations between Variables

Regardless of their type, two or more variables are related if, in a sample of observations, the values of those variables are distributed in a consistent manner.

### Why Relations between Variables are Important

Generally speaking, the ultimate goal of every research or scientific analysis is to find relations between variables.

### Two Basic Features of Every Relation between Variables

The two most elementary formal properties of every relation between variables are the relation's (a) magnitude (or "size") and (b) its reliability (or "truthfulness").

Magnitude (or "size"). The magnitude is much easier to understand and measure than the reliability. For example, if every male in our sample was found to have a higher WCC than any female in the sample, we could say that the magnitude of the relation between the two variables (Gender and WCC) is very high in our sample.

In other words, we could predict one based on the other (at least among the members of our sample).

Reliability (or "truthfulness"). The reliability of a relation is a much less intuitive concept, but still extremely important. It pertains to the "representativeness" of the result found in our specific sample for the entire population. In other words, it says how probable it is that a similar relation would be found if the experiment were replicated with other samples drawn from the same population.

Remember that we are almost never "ultimately" interested only in what is going on in our sample; we are interested in the sample only to the extent it can provide information about the population. If our study meets some specific criteria (to be mentioned later), then the reliability of a relation between variables observed in our sample can be quantitatively estimated and represented using a standard measure (technically called p-value or statistical significance level; see the next paragraph).

### What is "Statistical Significance" (p-value)?

### How to Determine that a Result is "Really" Significant

There is no way to avoid arbitrariness in the final decision as to what level of significance will be treated as really "significant."

### Statistical Significance and the Number of Analyses Performed

Needless to say, the more analyses you perform on a data set, the more results will meet "by chance" the conventional significance level.
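The effect of the number of analyses can be made concrete with a short calculation. The sketch assumes that the analyses are independent and that the conventional level is 0.05; both are simplifying assumptions, and the function name is illustrative.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests
    reaches the significance level purely by chance."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 10, 20):
    print(f"{k:2d} analyses: {chance_of_false_positive(k):.3f}")
```

Under these assumptions, ten analyses already carry about a 40% chance of producing at least one "significant" result when no real relation exists.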

We are currently working on an assessment initiative of our curriculum, other randomization curriculums, and more traditional introductory statistics curriculums. This assessment will include pre- and post- concepts and attitudes assessments as well as common exam questions.