Typically, we are arguing either 1) that some value (or mean) is different from some other mean, or 2) that there is a relation between the values of one variable, and the values of another.

Thus, following Steve's in-class example, we typically first produce some null hypothesis (i.e., no difference or relation) and then attempt to show how improbably something is given the null hypothesis.

Just as we can plot distributions of observations, we can also plot distributions of statistics (e.g., means)

These distributions of sample statistics are called *sampling
distributions*

For example, if we consider the 48 students in my class who estimated my age as a population, their guesses have a mean of 30.77 and a standard deviation of 4.43 (variance = 19.58)

If we repeatedly sampled groups of 6 people, found the mean of their estimates, and then plotted those means, the resulting distribution would be a sampling distribution of the mean
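A quick way to see what such a sampling distribution looks like is to simulate it. The sketch below draws hypothetical guesses from a normal distribution with the mean and SD quoted in the text (the actual 48 guesses aren't listed, so this is only an approximation of the class data):

```python
import random
import statistics

random.seed(1)

# Hypothetical population of age guesses matching the numbers in the text:
# mean = 30.77, SD = 4.43 (the real guesses themselves aren't reproduced here).
MU, SIGMA = 30.77, 4.43

# Repeatedly sample groups of 6 "students" and record the mean of each group.
sample_means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(6))
    for _ in range(20000)
]

# The means pile up around the population mean, but with less spread:
# the SD of a sampling distribution of means is roughly sigma / sqrt(6) ~ 1.81.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Note that the means are much less spread out than the individual guesses; that shrinking spread is the key property of sampling distributions used in later chapters.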

What I have previously called "arguing" is more appropriately called *hypothesis testing*

Hypothesis testing normally consists of the following steps:

- some *research hypothesis* (or *alternate hypothesis*), **H**_{1}, is proposed
- the *null hypothesis*, **H**_{0}, is also proposed
- the relevant sampling distribution is obtained under the assumption that **H**_{0} is correct
- I obtain a sample representative of **H**_{1} and calculate the relevant statistic (or observation)
- given the sampling distribution, I calculate the probability of observing the statistic (or observation) noted in step 4, **by chance**
- on the basis of this probability, I make a decision

One of the students in our class guessed my age to be 55. I think that said student was fooling around. That is, I think that guess represents something different than do the rest of the guesses

Applying the steps above:

- **H**_{0}: the guess is not really different
- **H**_{1}: the guess is different
- obtain a sampling distribution under **H**_{0}
- calculate the probability of guessing 55, given this distribution
- use that probability to decide whether this difference is just chance, or something more

Some students new to this idea of hypothesis testing find this whole business of creating a null hypothesis and then shooting it down a tad on the weird side. Why do it that way?

This dates back to the philosopher Karl Popper, who claimed that it is very difficult to prove something to be true, but not so difficult to prove it to be untrue

So, it is easier to prove **H**_{0} to be wrong than to prove **H**_{1} to be right

In fact, we never really prove **H**_{1} to be right; that is just something we imply (similarly, we never really prove **H**_{0} to be true, we just fail to reject it)

The "Steve's Age" example begun earlier is an example of a situation where we want to compare one observation to a distribution of observations

This represents the simplest hypothesis-testing situation because the sampling distribution is simply the distribution of the individual observations

Thus, in this case we can use the stuff we learned about z-scores to test hypotheses that some individual observation is either abnormally high or abnormally low

That is, we use our mean and standard deviation to calculate a z-score for the critical value, then go to the tables to find the probability of observing a value as high or higher than (or as low or lower than) the one we wish to test
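The z-score step and the table lookup can be sketched in a few lines. The numbers below are made up for illustration (they are not the class data); `NormalDist().cdf` plays the role of the z-table:

```python
from statistics import NormalDist

def upper_tail_p(x, mu, sigma):
    """P(observing a value as high or higher than x) under a normal
    distribution with the given mean and SD -- i.e., the z-table lookup."""
    z = (x - mu) / sigma
    return 1.0 - NormalDist().cdf(z)  # area in the upper (smaller) tail

# Made-up illustration: mean 100, SD 15, observed value 130.
z = (130 - 100) / 15            # z = 2.0
p = upper_tail_p(130, 100, 15)  # area above z = 2.0, roughly .023
```

For an abnormally *low* observation, you would look at the lower tail instead, `NormalDist().cdf(z)`.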

*Finishing the example*

Mean = 30.77, SD = 4.43 (variance = 19.58), critical value = 55

From the z-table, the area of the portion of the curve above a z of 3.21 (i.e., the smaller portion) is approximately .0006
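That table value can be checked with Python's standard library, which computes the normal curve area directly rather than from a printed table:

```python
from statistics import NormalDist

# Area of the smaller portion of the normal curve above z = 3.21;
# this is the upper-tail probability the z-table reports.
p = 1.0 - NormalDist().cdf(3.21)
print(p)  # roughly 0.00066, i.e., approximately .0006-.0007
```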

Thus, the probability of observing a score as high or higher than 55 is .0006.


It is important to realize that all our test really tells us is the probability of some event given some null hypothesis

It does not tell us whether that probability is sufficiently small to reject **H**_{0}; that decision is left to the experimenter

In our example, the probability is so low that the decision is relatively easy. There is only a .06% chance of observing a guess as high as 55 if it really belongs with the other observations in the sample. Thus, we can reject **H**_{0} without much worry

But what if the probability was 10% or 5%? What probability is small enough to reject **H**_{0}?

It turns out there are two answers to that: the "real" answer and the "conventional" answer

First some terminology...

The probability level we pick as our cut-off for rejecting **H**_{0} is referred to as our **rejection level** or our *significance level*; we then call something significant if its probability falls below this level

Any level below our rejection or significance level is called our *rejection region*

OK, so the problem is choosing an appropriate rejection level

In doing so, we should consider the four possible situations that could occur when we're hypothesis testing:

|                          | **H**_{0} true | **H**_{0} false |
|--------------------------|----------------|-----------------|
| Reject **H**_{0}         | Type I error   | Correct         |
| Fail to reject **H**_{0} | Correct        | Type II error   |

**Type I Error**

Type I** **error is the probability of rejecting the
null hypothesis when it is really true

e.g., saying that the person who guessed I was 55 was just screwing around when, in fact, it was an honest guess just like the others

We can specify exactly what the probability of making that error is; in our example it was .06%

Usually we specify some "acceptable" level of error before running the study

This acceptable level of error is typically denoted as α (alpha)

Before setting some level of α, it is important to realize that levels of α are also linked to Type II errors

__Type II Error__

Type II error is the probability of failing to reject a null hypothesis that is really false

e.g., judging OJ as not guilty when he is actually guilty

The probability of making a Type II error is denoted as β (beta)

Unfortunately, it is impossible to calculate β precisely because we do not know the shape of the sampling distribution under **H**_{1}

It is possible to "approximately" measure β, and we will talk a bit about that in Chapter 8

For now, it is critical to know that there is a trade-off between α and β: as one goes down, the other goes up
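The trade-off can be made concrete for a simple one-tailed z-test. The sketch below assumes a hypothetical true effect under H₁ of 2.5 standard errors (the effect size and the α values are made up for illustration):

```python
from statistics import NormalDist

def beta(alpha, effect_in_se_units):
    """Type II error rate for a one-tailed z-test, assuming H1 shifts the
    sampling distribution up by `effect_in_se_units` standard errors."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)            # cut-off under H0
    return NormalDist().cdf(z_crit - effect_in_se_units)  # miss probability

# Hypothetical true effect of 2.5 standard errors:
b_05 = beta(0.05, 2.5)  # roughly .20
b_01 = beta(0.01, 2.5)  # roughly .43 -- shrinking alpha inflates beta
```

Demanding a stricter α pushes the cut-off further out, so more genuinely false nulls slip past it; that is the trade-off in action.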

Thus, it is important to consider the situation prior to setting a significance level

__The "Conventional" Answer__

While issues of type I versus type II error are critical in certain situations, psychology experiments are not typically among them (although they sometimes are)

As a result, psychology has adopted the standard of accepting α = .05 as a conventional level of significance

It is important to note, however, that there is nothing magical about this value (although you wouldn't know it by looking at published articles)

Often, we are interested in determining if some critical difference (or relation) exists and we are not so concerned about the direction of the effect

That situation is termed *two-tailed*, meaning we
are interested in extreme scores at either tail of the distribution

Note that when performing a two-tailed test, we must only consider something significant if it falls in the bottom 2.5% or the top 2.5% of the distribution (to keep α at 5%)

If we were interested in only a high or low extreme, then
we are doing a *one-tailed* or *directional* test and look only
to see if the difference is in the specific critical region encompassing
all 5% in the appropriate tail
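The difference between the two tests shows up directly in the p-value: the two-tailed p is just double the one-tailed p, which can flip a decision at α = .05. The observed z below is made up for illustration:

```python
from statistics import NormalDist

# Hypothetical observed z of 1.80:
z = 1.80
p_one = 1.0 - NormalDist().cdf(z)  # one-tailed: upper tail only, ~ .036
p_two = 2.0 * p_one                # two-tailed: both tails,     ~ .072

# With alpha = .05, the same observation is significant under a
# one-tailed test but not under a two-tailed test.
```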

Two-tailed tests are more common, usually because either outcome would be interesting, even if only one was expected

The basics of hypothesis testing described in this chapter do not change

All that changes across chapters is the specific sampling distribution (and its associated table of values)

The critical issue will be to realize which sampling distribution is the one to use in which situation