Question
Confidence Interval Estimation – Discussion
When the results of a survey or poll are published, the sample size and the margin of error are both given. For example: 1000 voters were surveyed and 39±2% of the voters agree with the president. In this example N=1000 and the margin of error (MoE) is 2%.
This website lists several public opinion polls. Search the site and find a poll where the sample size and margin of error are given. Try to find a poll dealing with a topic in your profession or one in which you are really interested. (Business owner for a cleaning company).
http://www.pollingreport.com
Determine the following information for the selected poll results and include in your initial post:
1. URL for the website. State the poll question, the sample size N, and the margin of error (also known as sampling error).
2. Interpret the results of your poll using your own words and full sentences. Note: Depending on the question asked, your poll may have more than one poll result – you only need to discuss one result.
3. State the confidence interval using the MoE. What does this confidence interval estimate?
4. Use this worksheet (click to download CIE Proportion.xlsx) to calculate the confidence interval based on the sample size and the number of successes (the proportion you are interested in). Use a confidence level of 95%.
5. What is the calculated confidence interval? How does this compare to the interval in part 3 using the poll’s MoE?
6. Discuss potential biases that could skew sampling results.
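The worksheet itself is not reproduced here, so the following is only a sketch of the kind of calculation it presumably performs, assuming the usual normal-approximation interval for a proportion. The function name `proportion_ci` is hypothetical, and the figures (390 successes out of 1000) are simply the 39% example from the question.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation CI for a proportion (z = 1.96 for roughly 95%)."""
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

# Illustrative input: 390 of 1000 respondents agree (39%).
low, high = proportion_ci(390, 1000)
print(f"95% CI: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from roughly 36% to 42%, so a calculated interval can differ somewhat from the one implied by a poll's published MoE.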
Explanation / Answer
Whenever we see a poll, we see a margin of error, or confidence interval. These are always wrong. They are wrong, even if there are only two candidates, and they are even more wrong if there are more than two candidates. But they are simple.
The truth is complicated.
This complication exists even if we assume that the sample is a perfectly random sample of the population of voters. This assumption is ludicrous, but without it, things get really hairy. In fact, the truth is more complicated than this diary makes it out to be.
If you have only two candidates, then the results follow what is known as a binomial distribution. If you have more than two, they follow what is known as a multinomial distribution. "Distribution" is itself a statistical term. It means an assignment of probability to each possible outcome; in this case, the proportion of the vote a candidate will get. In sampling, we try to estimate a population distribution from a sample distribution. Of course, our estimate isn't perfect, but, again assuming the sample is random, we can estimate how badly off it might be.
There are a few problems with the way margins of error (MoE) are usually presented in polls.
First, we interpret them wrongly. Even if we used the right MoE (see below), our interpretation is off. A confidence interval (CI) is given by the estimate plus or minus the MoE. The correct interpretation of a 95% confidence interval is about the procedure: if the population value were X and we repeated the poll many times, about 95% of the intervals built this way would contain X. What we usually assume is that, since the sample estimate is some particular value, we can be 95% sure that the population value is within this one 95% CI. That's wrong. This interpretation is VERY common; I've even fallen into it myself.
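One way to see what the 95% refers to is to simulate many polls from a population whose true value is known and count how often the interval built from each sample contains it. This is only a sketch: the true proportion (52%), the sample size (400), the number of simulated polls and the function name `coverage` are all made up for illustration, and the sample is assumed to be perfectly random.

```python
import math
import random

def coverage(true_p=0.52, n=400, polls=2000, z=1.96, seed=0):
    """Fraction of simulated polls whose 95% CI contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Draw one perfectly random sample of n voters.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / polls

print(f"share of intervals containing the true value: {coverage():.3f}")
```

The share should land close to 0.95. The 95% describes how often the procedure works across many polls, not the chance that any single published interval is right.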
A second wrong interpretation is to assume either a) that all values within the CI are equally likely or b) that values outside the CI are impossible. Neither is correct. If our poll estimates that 52% will vote for Joe Shmo, then the most likely result is 52%; the farther you go from 52%, the less likely the result. The likelihood of any particular result is given by the likelihood function, and ANY result from 0% to 100% is possible; it's just that results far from 52% are very unlikely. (You COULD flip a fair coin 100 times and get 100 heads; it's not LIKELY, but it's POSSIBLE.)
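To make the point about the likelihood function concrete, here is a small sketch using the binomial likelihood for a two-way result. The sample size of 400 (so 208 "yes" answers for 52%) and the name `binom_likelihood` are assumptions chosen for illustration; the original does not give them.

```python
import math

def binom_likelihood(p, yes, n):
    """Probability of seeing `yes` out of n if the true proportion is p."""
    return math.comb(n, yes) * p**yes * (1 - p)**(n - yes)

n, yes = 400, 208  # hypothetical poll: 52% of 400 say Joe Shmo
peak = binom_likelihood(yes / n, yes, n)
for p in (0.52, 0.50, 0.47, 0.40, 0.30):
    # Likelihood relative to the most likely value, 52%.
    rel = binom_likelihood(p, yes, n) / peak
    print(f"true p = {p:.2f}: relative likelihood {rel:.2e}")
```

The relative likelihood falls off smoothly as p moves away from 52% and never reaches exactly zero, which is the sense in which values outside the CI are unlikely rather than impossible.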
But we also give the wrong MoE, because we give a single MoE for each poll, and that's not right. The classical formula for a 95% MoE is
1.96*(pq/n)^.5,
where p is the proportion saying something, q = 1-p and n is sample size.
This is approximately accurate, and the approximation is pretty good for polls, where n is usually pretty big and we aren't interested in very rare events. It doesn't work well for estimating very rare things, like the prevalence of rare diseases, but it's OK for polls. It does, however, give a different MoE for each candidate. When two candidates get all (or almost all) of the votes, this difference doesn't matter too much. For example, if we poll 400 people and 60% say they will vote for Obama, 35% for Bachmann (should she be the Repub. nominee) and 5% for someone else, then the MoEs for these three are
Obama 4.80%
Bachmann 4.67%
Someone else 2.14%
But the pollsters like to give ONE MoE, so they use an even simpler formula:
0.98/n^.5; this is only exactly correct if p = .5
For the above, it would give
Obama 4.9%
Bachmann 4.9%
not far off.
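The two formulas can be compared directly. This is a minimal sketch using the 400-person example above; the function names `moe` and `moe_simple` are made up for illustration. The shortcut is just the classical formula evaluated at p = 0.5, where p*q is at its largest, so it can never be smaller than a per-candidate figure.

```python
import math

def moe(p, n, z=1.96):
    """Per-result 95% margin of error: 1.96 * sqrt(p*q/n)."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_simple(n):
    """Single-number shortcut 0.98 / sqrt(n), i.e. moe() at p = 0.5."""
    return 0.98 / math.sqrt(n)

n = 400
shares = {"Obama": 0.60, "Bachmann": 0.35, "Other": 0.05}
for name, p in shares.items():
    print(f"{name:9s} per-result {moe(p, n):.2%}   shortcut {moe_simple(n):.2%}")
```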
But what if we are polling a primary? A recent Iowa poll of 500 Repubs gave these results
Bachmann 25%
Romney 21%
Pawlenty 9%
Cain 9%
Paul 6%
Gingrich 4%
Santorum 2%
Huntsman 1%
It said the MoE was 4.4%; that uses the simple formula .98/n^.5. But the right ones, using the formula 1.96*(pq/n)^.5, are different for each candidate:
Bachmann 3.8%
Romney 3.6%
Pawlenty 2.5%
Cain 2.5%
Paul 2.1%
Gingrich 1.7%
Santorum 1.2%
Huntsman 0.9%
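The per-candidate figures can be reproduced with a few lines of code. This is a sketch using the percentages and sample size quoted above; the helper name `moe_pct` is hypothetical.

```python
import math

n = 500  # sample size of the Iowa poll quoted above
iowa = {"Bachmann": 25, "Romney": 21, "Pawlenty": 9, "Cain": 9,
        "Paul": 6, "Gingrich": 4, "Santorum": 2, "Huntsman": 1}

def moe_pct(pct, n, z=1.96):
    """95% margin of error, in percentage points, for a result of pct%."""
    p = pct / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)

for name, pct in iowa.items():
    print(f"{name:9s} {pct:3d}% +/- {moe_pct(pct, n):.1f}")

# The single published figure comes from the shortcut 0.98 / sqrt(n).
print(f"single MoE from the shortcut: {100 * 0.98 / math.sqrt(n):.1f}%")
```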
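There are still problems with Huntsman's figure, because the normal approximation behind the formula is poor when p is this close to zero (here only about 5 of the 500 respondents), but these are much more reasonable figures. They are asymptotically accurate, meaning the approximation improves as n grows.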
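For results very close to 0%, one commonly used alternative is the Wilson score interval. It is not the formula used by the poll or anywhere above; it is swapped in here only as an illustration of an interval that behaves more sensibly for a small proportion. The function name `wilson_interval` is made up for this sketch.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion (an alternative to the
    plus-or-minus formula above, not the method the poll used)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low, high = wilson_interval(0.01, 500)
print(f"Huntsman (1% of 500): {low:.2%} to {high:.2%}")
```

Unlike the symmetric "1% plus or minus 0.9%", this interval is lopsided: it extends further above 1% than below it and stays clear of zero.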