10/4 – Methods with Ettema
How to make a causal argument (review of last week)
- We are doing variable analytic research to understand change and difference (nomothetic, we’re looking for key variables)
- Explain, Predict, and Control if we can.
The elements of explanation:
1) Covariation – relationship exists between two variables
2) Temporal Order – cause happens before the effect
3) 3rd variable problem – no other cause can be found
We want to be practical theorists.
The truth is always an argument, based on data, about how the world works; therefore, the word “proof” doesn’t apply. The mindset is that one is closer to or further from the truth, rather than right or wrong.
Jim uses an example: he lines up people around the room in order of education achieved. If income is related to education, then when the group is asked, “How much money did you make last year?”, the answers should follow the order we’ve already created in the room.
Probabilistic Relationships are said to exist; there will be individual differences, but education and income are related in the aggregate.
But covariation isn’t enough…
Temporal Order is another factor; causes always come before effects, e.g. kids who are overweight watch more television than thin kids, but which comes first, overweight or television habits?
Longitudinal Research, or research done over time, is the way to find temporal order in this case.
Temporal order is known in some cases, e.g. gender is always sequentially first.
The kicker is the 3rd variable problem, or the elimination of all alternative arguments, e.g. customer satisfaction may be caused by speed of service, but it isn’t the only variable involved.
Jim suggests that we put ourselves in a hypothetical meeting, and be sure that any question, especially those from detractors, can be answered.
Ex.
In children, viewing violence on TV leads to aggressive behavior. There is covariation here, and temporal order, since children watch TV at age 1. But is there a third variable? It’s Parenting. Parents control TV viewing habits of kids, and their parenting style affects violent behavior. Parenting is a cause of TV viewing and aggressive behavior, so the relationship between TV viewing and violence is spurious.
Diagram:
TV Viewing< …………….> Aggressive Behavior
Parenting
- Aggressive Behavior affects Parenting
- Parenting affects Aggressive Behavior
- Parenting affects TV Viewing habits
- A weaker relationship exists between TV Viewing and Aggressive Behavior (…..)
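The spurious-relationship idea can be sketched with a small simulation. This is only an illustration under stated assumptions: the scores are made up, Parenting is the sole cause of both TV Viewing and Aggressive Behavior, and there is no direct link between the two (the diagram above allows a weak one). The helper names `pearson` and `partial_corr` are ours, not from the lecture.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for the third variable zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: parenting drives both variables; each also has its own noise.
parenting = [rng.gauss(0, 1) for _ in range(5000)]
tv_viewing = [p + rng.gauss(0, 1) for p in parenting]
aggression = [p + rng.gauss(0, 1) for p in parenting]

raw = pearson(tv_viewing, aggression)
controlled = partial_corr(tv_viewing, aggression, parenting)
print(f"TV vs. aggression, raw correlation:        {raw:.2f}")
print(f"TV vs. aggression, parenting held constant: {controlled:.2f}")
```

Under these assumptions the raw correlation is sizable, but it nearly vanishes once Parenting is held constant, which is the kind of answer a detractor in the hypothetical meeting would ask for.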
Jim poses a question: What is the role of free will or choice, if causality can be defined in social science? Do people act mechanically? Exceptions show the role of free will, but are there explanations for exceptions?
Jim’s response: There is room for choice, so when we talk about the social world, we can call it cause, not determination. But what is the causal influence? In the case of the medical missionary, his or her parents might have been medical missionaries; he or she may have been raised in a very religious household, etc. Not a mechanistic, but a social causality.
Survey Research
Reasons to study survey research:
1) Important, often used method
2) Archetype of variable analytic research
Short definition of a survey:
“The process by which social scientists measure a number of variables, using a questionnaire administered to a sample drawn from a population at a particular point in time.”
A survey is a snapshot, and in the time during which a survey is administered, circumstances may change. Change blurs the snapshot. Timing of the snapshot is an important consideration; when is it most appropriate to take the snapshot, e.g. before or after an important event? Do we want multiple snapshots (surveys) to capture the effects of change?
The population (a.k.a. universe) is the large group about which we want to learn, e.g. “active residential accounts with one phone line”, “likely voters in the next election”.
Sampling frame enumerates the population.
“The list or quasi list of units which compose a population from which a sample is selected.” A list of customers, voters, etc. is the sampling frame.
A Sample is a microcosm, or cross-section, of the population. How do you draw a sample?
Randomly.
We want the sample to enable us to make estimates of the population’s responses, e.g. "the percentage of people who’d vote for Bush in the next election is 55%." This is a Parameter Estimate. The parameter is the quantity to be estimated, i.e. the real number in the population.
Why does a random sample yield a cross-section? Jim uses the marbles in an urn example. Suppose we have a large urn, and the marbles in the urn are the population we want to study. We draw a sample of marbles to estimate the color of the marbles. Some marbles are black, some are white. (We know that 60% are white and 40% are black; the parameter is 60/40.) Random means that each unit in the population has an equal chance of being chosen. We mix the marbles, so that each has an equal chance of being selected. Marbles fall, and we have a random drawing of 10 marbles. The most common outcome would be 6 white, 4 black. The most common outcome is the population in microcosm, or cross-section. Random sampling gives us an unbiased sample of the population.
If we drew 7/3 or 5/5, the Sampling Error is 10%; 8/2 is 20%, and 9/1 is 30%. Sampling error can occur, but the larger the error, the less likely it is to occur.
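The urn example can be checked with a little arithmetic. The sketch below assumes the urn is large enough that each of the 10 draws is white with probability 0.6 regardless of earlier draws (a binomial approximation, which the notes do not state explicitly). The function name `prob_white` is ours.

```python
from math import comb

def prob_white(k, n=10, p_white=0.6):
    """Probability of drawing exactly k white marbles in n random draws."""
    return comb(n, k) * p_white**k * (1 - p_white) ** (n - k)

# Probability of every possible outcome, from 0 white to 10 white.
probs = {k: prob_white(k) for k in range(11)}
most_common = max(probs, key=probs.get)

for k in range(11):
    error = abs(k / 10 - 0.6)  # distance from the true 60% white
    print(f"{k} white / {10 - k} black   error {error:.0%}   probability {probs[k]:.3f}")
print("Most common outcome:", most_common, "white")
```

On either side of 6 white / 4 black, each step further away is less likely than the one before it, which is why large sampling errors are rare.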
To reduce the sampling error, we can make repeated surveys, but that is too time consuming, so we can rest easy knowing that large errors are unlikely.
What if we can’t risk a large error? Increase the size of one sample, instead of taking repeated samples, e.g. if we draw 10, being one marble off is more likely than being 10 off in a sample of 100.
The larger the sample, the less likely a large sampling error will occur. What do we have to know to determine sample size?
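A rough check of the "one off in 10 versus 10 off in 100" comparison, again assuming a very large urn with 60% white marbles so that draws are independent. The function name `prob_off_by_at_least` is ours.

```python
from math import comb

def prob_off_by_at_least(points, n, p=0.6):
    """Chance that the sample share of white misses p by at least `points`."""
    total = 0.0
    for k in range(n + 1):
        # Small tolerance so that e.g. 5/10 counts as exactly 10 points off.
        if abs(k / n - p) >= points - 1e-12:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

small_sample = prob_off_by_at_least(0.10, n=10)    # one marble off or more
large_sample = prob_off_by_at_least(0.10, n=100)   # ten marbles off or more
print(f"Sample of 10:  {small_sample:.3f}")
print(f"Sample of 100: {large_sample:.3f}")
```

The same 10-point miss that is routine in a sample of 10 becomes uncommon in a sample of 100.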
Margin of Error, or Confidence Interval
“Because of the possibility of sampling error, the level of support is 55% in the next election, with a + or - 3% margin of error, meaning 52-58% support.” What if the margin took us below 50% in an election, e.g. the level of support was 51%? The President would want a smaller margin of error, or less risk of sampling error, so we’d take a larger sample. (In political polling, + or - 3% is typically used.)
Confidence Level
95% is the Confidence Level, or the certainty that results fall into the confidence interval. This determines our sample size given a desired confidence interval.
Statisticians determine the sample sizes needed to meet the confidence level at given confidence intervals.
+ or - 1% = 9600
+ or - 3% = 1070, or 1100
+ or - 5% = 385, or 400
+ or - 7% = 195, or 200
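The notes defer the derivation to Moore. One standard textbook formula that lands close to the figures above is n = z² · p(1 − p) / e², where e is the margin of error, z is about 1.96 for a 95% confidence level, and p = 0.5 is the worst-case split. These inputs are our assumptions, not something stated in the lecture, and the function name `sample_size` is ours.

```python
Z_95 = 1.96  # z-score for a 95% confidence level (assumed)

def sample_size(margin, p=0.5, z=Z_95):
    """Approximate sample size for a margin of error given as a proportion."""
    return z**2 * p * (1 - p) / margin**2

for margin in (0.01, 0.03, 0.05, 0.07):
    print(f"+ or - {margin:.0%}: about {sample_size(margin):.0f}")
```

Note that cutting the margin of error from 3% to 1% multiplies the required sample by nine, because the margin enters the formula squared.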
Where do the numbers come from? Read Moore…or trust Ettema…the urn in the example had an unspecified number of marbles. The population size matters, but not much.
The table changes as the population size goes down from a million, not up. What if the population is smaller, say 100,000? At the + or - 5% level, we need 382; at 50,000 we need 381; at 10,000, 369; at 5,000, 356; at 2,500, 333; at 1,000, 277; at 500, 217; at 250, 151; at 200, 131; and at 100, 79. With smaller populations, we are practically taking a census.
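The small-population figures are consistent with a standard finite population correction applied to the large-population sample size. The sketch below assumes a 95% confidence level (z = 1.96), a worst-case 50/50 split, and truncation to a whole number; none of these is stated in the notes, and the function name `sample_size_finite` is ours.

```python
def sample_size_finite(population, margin=0.05, p=0.5, z=1.96):
    """Sample size for a finite population at a given margin of error."""
    n0 = z**2 * p * (1 - p) / margin**2      # size needed for a very large population
    return n0 / (1 + (n0 - 1) / population)  # finite population correction

populations = (100_000, 50_000, 10_000, 5_000, 2_500, 1_000, 500, 250, 200, 100)
for population in populations:
    needed = int(sample_size_finite(population))  # truncated, as the notes appear to do
    print(f"population {population:>7,}: sample of {needed}")
```

Under those assumptions the required sample barely moves until the population drops below a few thousand, which is the sense in which population size matters, but not much.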
Kinds of Survey Errors
1) Sampling Error
2) Misspecification of the population – the population must be enumerated correctly, e.g. an out-of-date customer list misspecifies it. This is a big problem in internet research, because some who have email don’t use it.
3) Error due to Nonresponse – when less than 100% respond, the subset that responded isn’t the same as the entire set, e.g. people who respond may like you. Adding to sample size doesn’t help; the only way to fix it is to increase the response rate. A survey with a response rate that is less than 50% is what Babbie calls “garbage”. Telephone surveys give the best response rates, next to door-to-door surveys, which are prohibitively expensive. It is a nasty error because we don’t know the identity of the non-respondents. With time, there are ways to determine who isn’t responding, e.g., women are overrepresented, low response rates indicate that people with conservative views have been represented, and a greater number of liberals are represented with higher response rates. “Weighting the sample” is an extrapolation done based on this knowledge, but it’s for the pros. The way to solve it is to compare the sample to the population, e.g. 48/52% men and women, 13% African Americans, etc.
4) Measurement Error or False Answers - (bad questions) – How do we ask good questions? We’ll find out next week.
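To make the “weighting the sample” idea from the nonresponse item concrete, here is a minimal sketch. Only the 48/52 population split comes from the notes; the respondent counts and approval counts are invented for illustration, and real weighting schemes are considerably more involved.

```python
# Population shares from the notes: 48% men, 52% women.
POPULATION_SHARE = {"men": 0.48, "women": 0.52}

# Hypothetical survey in which women are overrepresented among respondents.
respondents = {"men": 400, "women": 600}
approvals = {"men": 200, "women": 360}  # hypothetical "approve" answers

total = sum(respondents.values())
unweighted = sum(approvals.values()) / total

# Weight each group by (population share) / (share of the sample).
weights = {g: POPULATION_SHARE[g] / (respondents[g] / total) for g in respondents}
weighted = sum(weights[g] * approvals[g] for g in respondents) / total

print(f"Unweighted approval: {unweighted:.1%}")
print(f"Weighted approval:   {weighted:.1%}")
```

Weighting only corrects for the characteristics we can compare against the population; it cannot fix differences between respondents and non-respondents that we have no way to observe.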
New York Times Handout about Presidential Approval Rating.