Q1: Is some sample mean significantly different from what would be expected given some population distribution?
Example: say that I sampled 25 undergraduate students at Scarborough and measured their IQ, finding a mean of 115. Does this mean seem especially large given that the population has an average IQ of 100?
Q2: Is the mean of one group significantly different from the mean of some other group?
Example: recall the "paper airplane" memory experiment where I asked people in three groups to estimate the speed of a car involved in an accident and, across groups, varied the adjective used to describe the collision (smashed vs. ran into vs. contacted). Did my manipulation affect speed estimates? That is, are the mean speed estimates of the various groups different?
Note: these tests are used in conjunction with continuous (i.e., measurement) data, not categorical data.
Q1: Is some sample mean significantly different from what would be expected given some population distribution?
On the face of it, this question should remind you of your previous fun with z-scores
In the case of z-scores, we asked whether some observation was significantly different from some sample mean
In the case of the question above, we are asking whether some sample mean is significantly different from some population mean
Despite this apparent similarity, the questions are different because the sampling distribution of the mean (the t distribution) is different from the sampling distribution of observations (the z distribution).
In order to understand the distinction between the z- and t-tests, we need to understand the Central Limit Theorem ...
CLT: Given a population with mean μ and variance σ^{2}, the sampling distribution of the mean (the distribution of sample means) will have a mean equal to μ (i.e., μ_{X̄} = μ), a variance equal to σ^{2}/N, and a standard deviation equal to σ/√N. The distribution will approach the normal distribution as N, the sample size, increases.
Steve will now use a sexy computer demo to illustrate this theorem and its relevance to asking questions about sample means
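In place of the in-class demo, here is a minimal simulation sketch (plain Python, standard library only; the uniform population and the particular sample and repetition counts are my choices for illustration, not from the lecture). It draws many samples from a decidedly non-normal population and shows that the sample means nonetheless center on μ with variance σ^{2}/N:

```python
import random
import statistics

# Population: uniform on [0, 1] -- clearly not normal.
# For this population, mu = 0.5 and sigma^2 = 1/12.
random.seed(1)
N = 30          # sample size
reps = 20000    # number of samples drawn

# Draw many samples of size N; record each sample's mean
sample_means = [statistics.mean(random.random() for _ in range(N))
                for _ in range(reps)]

mean_of_means = statistics.mean(sample_means)
var_of_means = statistics.variance(sample_means)

print(round(mean_of_means, 3))   # close to mu = 0.5
print(round(var_of_means, 5))    # close to sigma^2 / N = (1/12)/30
```

A histogram of `sample_means` would also look strikingly bell-shaped, even though the population itself is flat, which is the other half of the theorem's claim.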
Although it is seldom the case, sometimes we know the variance (as well as the mean) of the population distribution of interest
In such cases, we can do a revised version of the z-test that takes into account the CLT
Specifically: z = (X̄ − μ)/(σ/√N)
With this formula, we can answer questions like the following:
Example: say that I sampled 25 undergraduate students at Scarborough and measured their IQ, finding a mean of 105. Is this mean significantly different from the population which has a mean IQ of 100 and a standard deviation of 10?
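As a quick sketch, the arithmetic for this example can be checked in a few lines of Python (standard library only):

```python
import math

# One-sample z-test for the IQ example (population sigma known)
mu, sigma = 100, 10      # population mean and standard deviation
xbar, n = 105, 25        # sample mean and sample size

z = (xbar - mu) / (sigma / math.sqrt(n))
print(z)   # 2.5, which exceeds the two-tailed .05 cutoff of 1.96
```

Since z_{obt} = 2.5 is larger than 1.96, the sample mean of 105 is significantly different from the population mean.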
Unfortunately, it is very rare that we know the population standard deviation
Instead we must use the sample standard deviation, s, to estimate σ
However, there is a hitch to this. While s^{2} is an unbiased estimator of σ^{2} (i.e., the mean of the sampling distribution of s^{2} equals σ^{2}), the sampling distribution of s^{2} is positively skewed
[Figure not reproduced: the positively skewed sampling distribution of the sample variance (s^{2})]
This means that any individual s^{2} chosen from the sampling distribution of s^{2} will tend to underestimate σ^{2}
Thus, if we used the formula that we used when σ was known, we would tend to get z values that were larger than they should be, leading to too many significant results
The solution? Use the same formula (modified to use s instead of σ), find its distribution under H_{0}, then use that distribution for doing hypothesis testing
The result: t = (X̄ − μ)/(s/√N)
When a t-value is calculated in this manner, it is evaluated using the t-table (p. 648 of the text) and the row for N−1 degrees of freedom
So, with all this in hand, we can now answer questions of the following type ...
Example:
Let's say that the average human who has reached maturity is 68" tall. I'm curious whether the average height of our class differs from this population mean. So, I measure the height of the 100 people who come to class one day, and get a mean of 70" and a standard deviation of 5". What can I conclude?
t = (70 − 68)/(5/√100) = 2/0.5 = 4.0
If we look at the t-table, we find the critical t-value for α = .05 and 99 (N−1) degrees of freedom is 1.984
Since t_{obt} > t_{crit}, we reject H_{0}
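The same computation, as a small Python sketch (standard library only):

```python
import math

# One-sample t-test for the height example (sigma unknown, s used instead)
mu = 68                  # population mean height (inches)
xbar, s, n = 70, 5, 100  # sample mean, sample sd, sample size

t = (xbar - mu) / (s / math.sqrt(n))
print(t)   # 4.0 > t_crit = 1.984 (df = 99), so we reject H0
```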
In many studies, we test the same subject on multiple sessions or in different test conditions
=> sexist profs example
We then wish to compare the means across these sessions or test conditions
This type of situation is referred to as a paired or matched-samples (or within-subjects) design, and it must be used any time different data points cannot be assumed to be independent
As you are about to see, the t-test used in this situation is basically identical to the t-test discussed in the previous section, once the data have been transformed into difference scores
Assume we have some measure of rudeness, and we measure each of 10 profs' rudeness index twice: once when the offending TA is male, and once when the TA is female
[Table not reproduced: rudeness indices for the 10 profs with a male vs. a female TA, plus each prof's difference score]
The question becomes: is the average difference score significantly different from 0?
So, when we do the math: t = D̄/(s_{D}/√N)
The critical t with α = .05 (two-tailed) and 9 (N−1) degrees of freedom is 2.262
Since t_{obt} is not greater than t_{crit}, we cannot reject H_{0}
Thus, we have no evidence that the profs' rudeness differs across TAs of different genders
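The two-step recipe (difference scores, then an ordinary one-sample t-test against 0) can be sketched in Python. The original rudeness table isn't reproduced above, so the ratings below are made-up numbers purely for illustration:

```python
import math
import statistics

# Hypothetical rudeness indices for 10 profs, measured twice each
# (these are NOT the lecture's actual data, which were lost)
male_ta   = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]
female_ta = [5, 5, 5, 6, 6, 6, 4, 5, 6, 6]

# Step 1: reduce each prof's pair of scores to a single difference score
diffs = [m - f for m, f in zip(male_ta, female_ta)]

# Step 2: ordinary one-sample t-test of the differences against mu = 0
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t = d_bar / (s_d / math.sqrt(len(diffs)))

print(round(t, 3))   # |t| < t_crit = 2.262 (df = 9), so H0 is retained
```

Note that once the difference scores are in hand, nothing about the test is new: it is exactly the one-sample t from the previous section with μ = 0.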
Another common situation is one where we have two or more groups composed of independent observations
That is, each subject is in only one group, and there is no reason to believe that knowing about one subject's performance in one of the groups would tell you anything about another subject's performance in one of the other groups
In this situation we are said to have independent samples or, as it is sometimes called, a between-subjects design
Example: study to examine external biases of memory (i.e., when I threw all the paper airplanes around).
Given Steve's accident story, about how fast (in km/h) do you think the grey car was going when it ________ the side of the red car?
[Table not reproduced: speed estimates for the three groups (smashed vs. ran into vs. contacted)]
There are, in fact, three different t-tests we can perform in this situation, comparing groups 1 & 2, 1 & 3, or 2 & 3.
For demonstration purposes, let's only worry about groups 1 & 2 for now.
So, we could ask, do subjects in Group 1 give different estimates of the grey car's speed than subjects in Group 2?
When testing a difference between two independent means, we must once again think about the sampling distribution associated with H_{0}
If we assume the means come from separate populations, we could simultaneously draw samples from each population and calculate the mean of each sample.
If we repeat this process a number of times, we could generate sampling distributions of the mean of each population, and a sampling distribution of the difference of the two means.
If we actually did this, we would find that the sampling distribution of the difference would have a variance equal to the sum of the variances of the two sampling distributions of the mean (i.e., σ_{1}^{2}/N_{1} + σ_{2}^{2}/N_{2}).
Now recall that when we performed a t-test in the situation where the population standard deviation was unknown, we used the formula: t = (X̄ − μ)/(s/√N)
Given all of the above, we can now alter this formula in a way that will allow us to use it in the independent means example
Specifically, instead of comparing a single sample mean with some population mean, we want to see if the difference between two sample means equals zero
Thus the numerator (top part) will change to: (X̄_{1} − X̄_{2}) − 0, i.e., X̄_{1} − X̄_{2}
and, because the variance of the difference between two independent means is the sum of each mean's variance (by the variance sum law), the denominator of the formula changes to: √(s_{1}^{2}/N_{1} + s_{2}^{2}/N_{2})
Thus, the basic formula for calculating a t-test for independent samples is: t = (X̄_{1} − X̄_{2})/√(s_{1}^{2}/N_{1} + s_{2}^{2}/N_{2})
The previous formula is fine when sample sizes are equal.
However, when sample sizes are unequal, it treats both of the s^{2} as equal in terms of their ability to estimate the population variance.
Instead, it would be better to combine the s^{2} in a way that weights them according to their respective sample sizes. This is done using the following pooled variance estimate: s_{p}^{2} = [(N_{1} − 1)s_{1}^{2} + (N_{2} − 1)s_{2}^{2}]/(N_{1} + N_{2} − 2)
Given this, the new formula for calculating an independent-groups t-test is: t = (X̄_{1} − X̄_{2})/√(s_{p}^{2}(1/N_{1} + 1/N_{2}))
Note 1: Using the pooled-variances version of the t formula for independent samples is no different from using the separate-variances version when sample sizes are equal. It can have a big effect, however, when sample sizes are unequal.
Note 2: As mentioned previously, the degrees of freedom associated with an independent-samples t-test is N_{1} + N_{2} − 2
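The pooled-variance recipe can be sketched as a small Python function (standard library only; the summary statistics fed to it below are hypothetical, chosen just to show the unequal-N case):

```python
import math

def pooled_t(x1bar, s1_sq, n1, x2bar, s2_sq, n2):
    """Independent-samples t using the pooled variance estimate.

    Weights each sample's variance by its degrees of freedom (n - 1),
    so it handles unequal sample sizes properly.
    """
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return (x1bar - x2bar) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Hypothetical summary statistics (not the airplane-study data)
t = pooled_t(x1bar=40.0, s1_sq=25.0, n1=12, x2bar=32.0, s2_sq=36.0, n2=8)
print(round(t, 3))   # compare against t_crit with n1 + n2 - 2 = 18 df
```

With equal Ns the weights cancel and this reduces to the separate-variances formula, which is exactly the point of Note 1.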
The textbook has a large section on heterogeneity of variance (pp. 185–193), including lots of nasty-looking formulae. All I want you to know is the following:
When doing a t-test across two groups, you are assuming that the variances of the two groups are approximately equal.
If the variances look fairly different, there are tests that can be used to see if the difference is so great as to be a problem.
If the variances are different across the groups, there are ways of correcting the t-test to take the heterogeneity into account.
In fact, t-tests are often quite robust to this problem, so you don't have to worry about it too much.
Sometimes, heterogeneity is interesting in its own right
You deserve a break today:
One Mean vs. One Population Mean
Population variance known: z = (X̄ − μ)/(σ/√N), evaluated against the standard normal distribution
Population variance unknown: t = (X̄ − μ)/(s/√N)
df = N − 1
Two Means
Matched samples:
first create a difference score, then t = D̄/(s_{D}/√N)
df = N − 1
Independent samples: t = (X̄_{1} − X̄_{2})/√(s_{p}^{2}(1/N_{1} + 1/N_{2}))
df = N_{1} + N_{2} − 2
where: s_{p}^{2} = [(N_{1} − 1)s_{1}^{2} + (N_{2} − 1)s_{2}^{2}]/(N_{1} + N_{2} − 2)
Easy as baking a cake, right? Now for some examples of using these recipes to cook up some tasty conclusions ...
Examples
1) The population spends an average of 8 hours per day working, with a standard deviation of 1 hour. A certain researcher believes that profs work fewer hours than average and wants to test whether the average number of hours per day that profs work differs from the population. This researcher samples 10 professors and asks them how many hours they work per day, leading to the following dataset:
6, 12, 8, 15, 9, 16, 7, 6, 14, 15
Perform the appropriate statistical test and state your conclusions.
2) Now answer the question again except assume the population variance is unknown
3) Does the use of examples improve memory for the concepts being taught? Joe Researcher tested this possibility by teaching 10 subjects 20 concepts each. For each subject, examples were provided to help explain 10 of the new concepts; no examples were provided for the other 10. Joe then tested his subjects' memory for the concepts and recorded how many concepts, out of 10, each subject could remember. Here are his data:
[Table not reproduced: each subject's recall out of 10 for concepts taught with vs. without examples]
4) Circadian rhythms suggest that young adults are at their physical peak in the early afternoon, and at their physical low point in the early morning. Are cognitive factors affected by these rhythms? To test this question, I bring subjects in to run a recognition memory experiment. Half of the subjects are run at 8 am, the other half at 2 pm. I then record their recognition memory accuracy. Here are the results:
[Table not reproduced: recognition accuracy scores for the 8 am and 2 pm groups]
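Since the accuracy table isn't reproduced above, here is a sketch of how example 4 would be solved in Python with hypothetical stand-in proportions for the two groups (standard library only):

```python
import math
import statistics

# Hypothetical recognition-accuracy proportions (NOT the lecture's data)
am = [0.70, 0.65, 0.72, 0.68, 0.60, 0.66, 0.71, 0.64]   # 8 am group
pm = [0.75, 0.80, 0.72, 0.78, 0.74, 0.77, 0.79, 0.73]   # 2 pm group

n1, n2 = len(am), len(pm)
s1_sq, s2_sq = statistics.variance(am), statistics.variance(pm)

# Pooled variance estimate, then the independent-samples t
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t = (statistics.mean(am) - statistics.mean(pm)) / math.sqrt(sp_sq * (1/n1 + 1/n2))

print(round(t, 2))   # compare to t_crit with n1 + n2 - 2 = 14 df (2.145 at alpha = .05)
```

With these made-up numbers |t| comfortably exceeds 2.145, so the sketch would reject H_{0}; the real conclusion, of course, depends on the actual class data.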