Royce A. Singleton, Jr.
Professor of Sociology


Answers to Workbook Problems
 

Chapter 2

The Nature of Science

I. Matching Key Terms

  1. a
  2. i
  3. d
  4. l
  5. k
  6. j
  7. e
  8. h
  9. b
  10. c
  11. f
  12. g
II.  Problems and Exercises

     1a. Inductive reasoning.
       b. Deductive reasoning.
       c. Inductive reasoning.
 
 

Chapter 3

Elements of Research Design

I. Matching Key Terms

  1. l
  2. a
  3. f
  4. d
  5. h
  6. g
  7. b
  8. c
  9. k
  10. e
  11. j
  12. i
II.  Problems and Exercises

     1a. The unit of analysis is cities. Note that the variables are the incidence of social protest (presumably the number of protests in a given time period) and the size ("large" or "small") of union memberships (presumably and more precisely, the number of city residents who are union members). To identify the unit of analysis, ask yourself, "What do these variables describe?"

     b. Because the unit of analysis is cities, it is inappropriate to draw definitive conclusions about individual behavior. According to the ecological fallacy, it is possible to draw erroneous inferences about individual behavior on the basis of data that pertain to social groupings. Without specific information about which individuals are engaging in social protest, we cannot know whether union or nonunion members are more likely to be among the protesters. The correlation could occur for many reasons; perhaps cities with large union memberships and a high incidence of social protest are bigger or older or more industrialized than cities with small union memberships and a low incidence of social protest.

     2a. The variables (number of people, average age, average value of consumer goods) describe villages.

     b. Because villages are the units of analysis, it is inappropriate to draw definitive conclusions about a relationship among individuals. To draw conclusions about individuals on the basis of data that pertain to social groupings is an ecological fallacy. One cannot tell from these data precisely who is buying the consumer goods. Perhaps villages with high proportions of children also have a high proportion of wealthy older people who are apt to purchase various consumer goods. Without specific information about which individuals are making the purchases, we cannot know if younger people are more likely than older people to be the buyers.

     3a. unit of analysis: individuals (boys)
           independent variable: whether a boy’s parents are divorced or separated or living together
           dependent variable: number of behavioral problems

       b. unit of analysis: states
           independent variable: number of (or dollar value of) sales of sexually explicit magazines
           dependent variable: rape rates

       c. unit of analysis: individuals
           independent variable: rural or urban residence (i.e., whether the individual is a resident of a rural or urban community)
           dependent variable: level of tolerance of people holding controversial views

       d. unit of analysis: country
           independent variable: level of economic development
           dependent variable: level of human services provided

     4. None of the four statements conveys a testable relationship between variables. The criterion of testability implies a comparison of units of analysis on two or more variables. If a statement describes characteristics that do not vary, does not establish a basis of comparison, or does not indicate how one variable is related to another, then it is not a testable hypothesis.

     a. "Owning a gun" suggests the variable "gun ownership" or, more precisely, "whether or not one owns a gun." The word "danger" is ambiguous, although it clearly implies risk of personal harm. To observe this phenomenon, we would have to find something tangible—a "proxy variable"—with which to represent it, such as homicide or other violent crime. Let us assume that "danger" implies the incidence of homicide. We could then state the relationship in various ways.

People who own guns are more likely to be the victims of homicide than people who do not own guns.

A person who owns a gun is at greater risk of homicide than a person who does not own a gun.

A higher percentage of gun owners are victims of homicide than are non-gun owners.

The greater the percentage of gun owners in a community, the higher the rate of violent crime. (Notice how the unit of analysis differs for this statement.)

     b. This statement implies three variables: employment status (employed versus unemployed), level of mental health, and gender. Gender is a control variable because the relation between employment status and mental health applies only to men. That is, gender does not vary in the hypothesis. Employment status is a qualitative variable; however, level of mental health could be construed as a qualitative variable (e.g., whether or not one suffers from mental health problems) or a quantitative variable (level of mental health on some numerical scale). Therefore, the relation may be stated in several ways, such as:
Among men, the unemployed are more likely to suffer from mental health problems than the employed.

Unemployed men have more mental health problems than employed men.

If a man is unemployed, he is likely to have mental health problems; if a man is employed, he is not likely to have mental health problems.

Unemployed men are less mentally healthy than employed men.

If you use "length of unemployment" as a proxy for "unemployment," the hypothesis becomes:
The longer a man is unemployed, the more likely that he will suffer from mental health problems.
Finally, it is also possible to rewrite this statement as a hypothesis about social units of analysis, such as cities or nations. If we use "unemployment rate" as a proxy for "unemployment" and "rate of mental illness" as a proxy for "mental health," then we can state the following hypothesis:
The higher a nation’s unemployment rate, the higher its rate of mental illness.
     c. The concepts in the statement "Catholics frequently attend church"—"Catholics" and "frequently attend church"—do not vary and do not form a basis of comparison. This statement does suggest two variables: Catholic versus non-Catholic (or religious affiliation) and frequency of church attendance. Now, rewriting the original statement in terms of these variables, we can satisfy the criterion of testability. For example:
Catholics attend church more often than non-Catholics.

Catholics attend church more often than members of other religious groups.

Catholics are likely to attend church more often than non-Catholics.

     d. This statement implies the variables amount of study time and grade-point average (GPA) or course grade. Both grade average and study time are quantitative variables, so it is best to use a continuous statement to convey the hypothesis. For example, 
The more time one studies, the higher his or her grades will be.

Study time is directly related to grade-point average.

As study time increases, grade-point average increases.

General comments. Three common mistakes on this exercise are (1) the failure to specify exactly how the variables are related, (2) incomplete hypothesis statements, and (3) the inappropriate or erroneous labeling of "variables." It is not enough merely to say that number of hours of studying affects grades or to ask a question about how study time affects grades. Hypotheses take the form of declarative statements that indicate exactly how the variables are related. Nor is it sufficient to say simply that students who do not study get low grades. This does not specify the overall relationship between study time and grade average. Also, variables are characteristics or values that vary. "Male" is not a variable; it does not vary. "Male" is a category of the variable "gender." "Unemployment" also is not a variable for the same reason. Employment status can vary from being employed full-time to not being employed at all, but "unemployment" does not imply variation: it signifies a single category of the variable employment status. Similarly, "Catholic" is a category of the variable religious affiliation, and "low grades" is a category of the variable GPA.

     5. Because both study time and course grade or GPA can be measured quantitatively, it is possible to draw a graph depicting the hypothesized relationship. The graph should place values of the independent variable along the horizontal or X-axis and values of the dependent variable along the vertical or Y-axis. Data points in the graph should then reflect the nature of the relationship. If study time is measured by, say, number of hours studied per week, and course grade by numerical equivalents of letter grades (i.e., A=4, A-=3.7, etc.), the data points in the graph should form a linear pattern from the lower left-hand corner to the upper right-hand corner.

     8a. No, we cannot conclude that a causal relationship exists, nor can we infer the direction of the relationship, based solely on evidence of a statistical association between two variables. An association by itself does not imply causation. We also need evidence about the direction of influence and evidence that the association is nonspurious.

     b. There are at least three possible causal interpretations of the statistical association: (1) a low GPA leads one to smoke marijuana; (2) smoking marijuana lowers GPA; and (3) neither variable causes the other because both smoking marijuana and GPA are caused by another variable, such as intelligence or self-esteem (in other words, the relationship is spurious).

There may be, of course, other causes of smoking marijuana or other causes of GPA. The existence of other causes, however, does not refute the conclusion that one variable, say smoking marijuana, is a cause of the other—GPA. A spurious relationship is created when both variables—smoking marijuana and GPA—have a common cause. The existence of additional causes of either variable, by itself, does not demonstrate a spurious relationship. Nor could one conclude that the relationship is spurious by offering the interpretation that smoking marijuana decreases attention span or study time which, in turn, lowers one’s GPA. In fact, this interpretation identifies intervening variables—attention span and study time—which explain how smoking marijuana is a cause (albeit indirect) of GPA. In other words, the kind of extraneous variable that creates a spurious relationship must be a cause of both variables.

     9. No, we cannot conclude that the size of a child’s day-care group affects the development of the child’s social and cognitive skills. First, we don’t know from the information given whether the relationship between size of day-care group and social and cognitive development is statistically significant and, therefore, whether we should assume that an association exists. But assuming that it does, as with any statistical association, we also cannot infer that one variable is a cause of the other. Although small day-care groups may foster greater development than large day-care groups, an equally plausible explanation is that the relationship is spurious. Some extraneous variable could affect both the size of the day-care group in which a child is placed and his or her social and cognitive development. For example, it is possible that parents with greater financial resources are both more likely to afford to place their child in a small day-care group and to give their child the attention and other support that foster the development of social and cognitive skills.

It is not appropriate to challenge the conclusion merely by stating that there may be other causes of a child’s social and cognitive development. The presence of other causes does not preclude the possibility that size of day-care group is a cause. As with any social phenomenon, there may be multiple causes.


Chapter 4

Measurement

I. Matching Key Terms

  1. a
  2. h
  3. m
  4. g
  5. i
  6. k
  7. f
  8. j
  9. n
  10. e
  11. l
  12. d
  13. c
  14. b
II. Problems and Exercises

     2. This exercise suggests the following key points regarding reliability and validity. 

  • Validity is seldom clear-cut. It is usually inappropriate, therefore, to say that an operational definition is valid or invalid. The question, rather, is a matter of how valid or invalid; that is, how well does an operational definition represent the meaning of a given concept? 
  • Subjective measures, such as items b and d on this exercise, are not necessarily bad simply because they are subjective. Often as social scientists we are interested in the subjective—in people's opinions, beliefs, feelings, and attitudes—and often the best way to measure these phenomena is by asking people directly. Subjective measures are most problematic when we want an "objective" estimate of something, such as people's involvement in campus activities (item a) and the number of hours devoted to studying (item e). 
  • A change in response to a question does not necessarily imply low reliability. Inconsistent responses per se may mean either that a question is unreliable or that there is a real change in that which the question is measuring. If a respondent reports at the beginning of the semester that she studied about 10 hours a week, but now reports that she is studying 20 hours a week, this difference most likely means that she is spending more of her time studying. If she reports these numbers of hours—first 10, then 20—but in actuality spends the same amount of time studying, this would indicate that the question is unreliable. Only changes in response that occur when the phenomenon being measured remains constant indicate unreliability. 
  • Reliability should be tested, if possible; thus, the issue of reliability is largely an empirical question. For example, whether the categories "a great deal," "quite a bit," and "some" generate reliable responses in items a and c should not be left to investigator judgment, but should be determined by asking the question on more than one occasion to the same group. If respondents give essentially the same answers each time, the item is reliable. Validity, on the other hand, may be difficult to assess empirically and is more open to interpretation. As social scientists, our decision to include a particular question in a survey often is based on its face validity. 
  • Remember that an unreliable measure cannot be valid, but a reliable measure may or may not be valid. In other words, reliability is a necessary but not a sufficient condition for validity. 
     a. This item is very ambiguous for two main reasons. First, the word "involved" may be interpreted in ways that do not correspond to the researcher's intended meaning. To some respondents, partying, drinking, studying, or "hanging out" in the dorm may represent involvement in the college community. Second, as we will learn when we study question wording in surveys (chapter 10), the response categories are very imprecise. It would be much better to provide a list of organizations and activities and ask respondents to check off all those in which they participate. It is also possible that respondents may exaggerate their level of involvement because they perceive being involved in one's college community as socially desirable or politically correct. 

     b. This item has been used repeatedly in the General Social Survey as a measure of fear of crime. However, its validity has been called into question because it is likely to overestimate the level of fear of crime. People may be afraid "to walk alone at night" for many reasons other than crime. They may be afraid to go out at night because of the neighbor's dog, because they simply are afraid of the dark, or because they have poor night vision and might worry about getting lost or falling down and getting hurt. It would be much better to ask respondents directly about their fear of specific crimes.

     c. Item c is not only likely to be affected by personal self-serving biases, but it is also something that people simply may not be able to estimate with much accuracy. It probably would be best to obtain external sources of evaluation, such as information about memberships and offices in town government and community organizations, or to ask respondents to list the two or three most influential people in their community. The person's community power might then be operationalized as the number of memberships and offices held or as the number of times he or she is identified by others as an influential person in the community.

     d. With respect to item d, some people may not be aware of their actual health status. Others may not give honest responses because they don't want to appear unhealthy. Some responses may be affected by minor variations in health due to colds or flu, so that respondents with otherwise excellent general health may indicate "fair" or "poor" health when they are suffering from a temporary ailment. Such fluctuations would lower reliability (and thereby lower validity) if the investigator were interested in general health status. The best way to measure this would be to obtain a doctor's evaluation through a physical exam. If that were not possible, then respondents could be asked about specific health problems. This question has been used frequently in actual research, but it is always construed as a measure of "subjective health status."

     e. Although this item appears to be a valid measure of study time, respondents’ answers could be inaccurate due to (1) their inability to recall or estimate the number of hours they generally study or (2) their tendency to inflate their estimates so as to project a socially desirable image as serious students. 

     3. The ambiguity of the question and the imprecision of the response categories may create random measurement error; the tendency of respondents to exaggerate their level of involvement because they perceive this as socially desirable or politically correct would be a source of systematic measurement error.

     4. The only way one could assess the reliability of item e is to ask the question twice of the same group of individuals and compare their responses (called "test-retest reliability"). The greater the correlation, presumably, the higher the reliability. Of course, this method presents problems because any difference in reported hours may represent real change in the number of hours devoted to studying rather than an unreliable measure.

The validity of item e may be assessed in several ways, of which face validity is the least satisfactory. One might ask the question of two groups who are known to differ in terms of how much time they study, such as pre-meds at Holy Cross and physical education majors at UMass. But this method would not provide good evidence of the degree of accuracy. Another way to assess accuracy, or validity, would be to compare the answers of a sample of respondents with direct observation of how they use their time. But not only would this be prohibitively expensive; direct observation is apt to change the observed person's behavior. Probably the best method of assessment would be to ask a sample of respondents to keep a time-use diary, in which they record how they spend every hour of a 24-hour day. You then could compare answers to item e with the amount of time studying recorded in the diary. If respondents kept a time diary at, say, three different times during the semester, and you took the average of the three reports, this would provide a reasonably accurate criterion for assessing the validity of the question. By the way, research has shown that people consistently overestimate the time spent on many activities, especially when the activities are deemed socially desirable.

    5a. To describe the relationships in these two tables, one must know how to read the tables. In each table, percentages are calculated downward, for each category of the column variable, attendance at religious services; therefore, we must follow the rule "percentage down, read across." In other words, we must compare the strength of religion (Table 1) and frequency of prayer (Table 2) among respondents who attended church less than once a month with those who attended once a month or more. In Table 1, those who attended services less than once a month were more likely than those who attended once a month or more to say that they either had no religion or that their religious identity was not very strong (77.7% versus 25.0%). In Table 2, those who attended services less than once a month were more likely than those who attended once a month or more to say that they prayed less than once a week (41.4% versus 5.2%). In other words, frequency of church attendance is positively related to strength of religious identity and to frequency of prayer.
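The "percentage down, read across" rule can be illustrated with a short Python sketch. The cell counts below are invented and chosen only to mimic the structure of Table 1; the point is that percentages are computed within each column (each category of church attendance) and then compared across columns:

```python
# "Percentage down, read across" with invented counts.
# Columns: attendance ("<1/month", ">=1/month"); rows: strength of religion.
counts = {
    "no or weak religion": [350, 75],
    "strong religion":     [100, 225],
}

# Percentage down: divide each cell by its column total.
col_totals = [sum(counts[row][j] for row in counts) for j in range(2)]
pct = {row: [100 * counts[row][j] / col_totals[j] for j in range(2)]
       for row in counts}

# Read across: compare the same row between the two columns.
print([round(p, 1) for p in pct["no or weak religion"]])  # [77.8, 25.0]
```

Reading across the first row, infrequent attenders are far more likely than frequent attenders to report no or weak religion, just as in the actual table.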

    b. Attendance at religious services frequently is used by social scientists to measure religiosity, even though evidence seldom is offered to support its validity. As we note in Box 4.1, sociologist Ronald Johnson defines religiosity conceptually as the sense of "being religious." If attendance at religious services validly represents this quality, we would expect it to be related to other behaviors that also reflect the sense of being religious. Two such behaviors are the strength of one's religious identity and frequency of prayer. People who describe themselves as strong Catholics or Protestants or Jews should be more religious than those who say they have no religion or those with a weak religious identity. Similarly, people who pray frequently should be more religious than those who pray seldom or never. The tables show that attendance at religious services is related to both strength of religion and frequency of prayer: the more frequently respondents attended church, the stronger their religious identification and the more frequently they prayed. So, the evidence supports the validity of church attendance as a measure of religiosity.

    c. Neither of the tables provides direct evidence of the reliability of respondents' answers because reliability is a matter of consistency and respondents answered the attendance question only once. On the other hand, the evidence supporting the validity of this measure indirectly supports its reliability insofar as a valid measure is necessarily reliable.



    

Chapter 5

Sampling

I. Matching Key Terms

  1. i
  2. o
  3. j
  4. e
  5. l
  6. a
  7. h
  8. m
  9. d
  10. n
  11. g
  12. f
  13. k
  14. c
  15. b


II.  Problems and Exercises

     1a. Given that measures of readability are based on counts of words and sentences (e.g., the average number of words in a sentence), our target population would have to be all the words in all or some specified portion of the book. In other words, the unit of analysis is the textual material—words or sentences—in the book, and it is the readability of the text that we are interested in measuring. Because it would not make sense to assess the readability of the beginning (e.g., table of contents, preface) and end materials (e.g., index), I would define the target population as all words on the consecutively numbered (Arabic numeral) pages in chapters 1 - 18, from 1 to 552. It would require a great deal of time to count the number of words or sentences; however, in terms of pages, the population size is 552. [Note that target populations should be described in sufficient detail that the limits of their inclusion are clear. Also, it is incorrect to define the unit of analysis as individual people, specifically readers of Approaches to Social Research, and to identify the target population as a particular set of readers, such as students enrolled in this course. Readability is not a property of people; it is a property of text.]

     b. Because it would be impractical to apply readability measures to the entire book, we need to take a sample of words and sentences. The easiest way to do this is to draw a sample of pages and then apply the readability measure(s) to all words and sentences on every page in the sample. Thus, our sampling design consists of a two-stage cluster sample, the primary sampling units are pages, and the sampling frame for the first stage is the numbered pages 1 - 552.

     d. The number of odd- and even-numbered pages is, of course, unrelated to readability. The object of having you compute these statistics is simply so that you can compare them with known population parameters. Half of the pages 1 to 552 are even-numbered and half are odd-numbered; therefore, the population parameters are 50% even pages and 50% odd pages.

     e. A systematic sample could be judged as either appropriate or inappropriate, depending on your rationale. You could argue that it is appropriate because it is easy to do, would provide proportionate representation of every chapter, and there is no apparent cyclical pattern to which the sampling interval might correspond and thereby bias the results. Alternatively, you could argue that it is inappropriate because (1) the target population is not very large (N = 552) and the cases (i.e., pages) are already numbered, making it easy to draw a simple random sample, or (2) the book does appear to have a cyclical pattern—at the end of each chapter is a summary, a list of key terms, and review questions/problems—and the chapters generally are similar in length. If the sampling interval happened to correspond to these end-of-chapter sections, it would be useless for assessing readability.

     2. General observations.

  • The target population should not be confused with the sampling frame. The target population is the group to which one would like to generalize his or her results. A sampling frame is the source, usually a list of the target population or lists of sub-populations, from which the sample is selected. 
  • Precision is an extremely important feature of social research. Researchers must describe their methods with enough precision that others may evaluate and replicate their studies. Thus, it is important to carefully spell out all the features of the recommended sampling design, and not merely indicate that stratified random or multistage cluster or some other sampling design will be used. For example, if stratified random sampling is recommended, one should describe the strata and whether they will be sampled proportionately or disproportionately; if multistage cluster sampling is used, one should spell out the various stages. 
  • Quota sampling is not an appropriate sampling design when a complete list of the population can be obtained. If a complete list is available, some type of probability sampling design (e.g., simple random, stratified random) should be used. Quota sampling is an inferior nonprobability sampling design subject to biased selection procedures (see Box 6.1) that may be used when a sampling frame cannot be obtained or easily constructed.
     a. (1) Pay, hours, and job satisfaction are variables that pertain to individuals; therefore, the unit of analysis is individual employees.

     (2) An appropriate sampling frame would be a complete list of all Holy Cross employees, which one might be able to obtain from the personnel or payroll office. The College telephone directory would be a very poor sampling frame because it does not include staff members who do not have telephones (e.g., many physical plant employees).

     (3) Provided that a good list is obtainable, one could easily justify simple random, systematic, or stratified random sampling. Cluster sampling is not appropriate because the population is neither large nor geographically dispersed. One could argue for systematic sampling because it is easiest to do. A strong argument also could be made for stratified random sampling, with perhaps gender and/or job type (e.g., faculty, administration, and staff) as the stratifying variables. Social researchers rarely, if ever, stratify by more than one or two variables, because it quickly becomes too cumbersome to implement the sampling design. For example, even with two variables, gender and job type, we have six strata into which the population must be grouped before the sample can be drawn. The main reason for stratifying is to increase sampling efficiency—to maximize accuracy for a fixed sample size. We also might stratify if we wanted to make sure that we had a sufficient number of cases in each category of an important variable. Thus, since administrators constitute a small proportion of the target population, we might stratify by job type in order to select enough employees in this category (or stratum). It is not necessary or appropriate to stratify, however, in order to analyze the differences between males and females or faculty versus staff versus administration. We can analyze the effects of these variables on job satisfaction no matter what type of probability sampling design we decide to use.
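Proportionate stratified random sampling can be sketched as follows. The stratum sizes below are invented (they are not actual Holy Cross employment figures); each stratum contributes to the sample in proportion to its share of the population:

```python
# Proportionate stratified random sampling by job type, with invented
# stratum sizes and fictitious employee IDs.
import random

random.seed(2)
strata = {
    "faculty":        [f"F{i}" for i in range(300)],
    "administration": [f"A{i}" for i in range(100)],
    "staff":          [f"S{i}" for i in range(600)],
}
N = sum(len(members) for members in strata.values())  # population = 1000
n = 100                                               # total sample size

# Proportionate allocation: each stratum contributes n * (its share of N),
# then a simple random sample is drawn within each stratum.
sample = []
for name, members in strata.items():
    n_h = round(n * len(members) / N)
    sample.extend(random.sample(members, n_h))

print(len(sample))  # 30 faculty + 10 administrators + 60 staff = 100
```

To oversample a small stratum such as administrators, one would simply replace the proportionate allocation with a larger fixed n_h for that stratum (disproportionate stratified sampling).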

     b. (1,2) The unit of analysis and the sampling frame would depend on how you intend to carry out your study. If one conducts an observational study of conversations in public places, which I assume one would want to do, then the unit of analysis is a conversation, and the sampling frame would have to consist of a map or list of all public places on campus. If you studied conversations in public places by asking people what they talked about—a clearly inferior way of studying behavior—then your unit of analysis would be individuals, and you might get a list of students and/or college employees for your sampling frame.

     (3) Assuming that we want to observe conversations in public places, I would use area and time sampling, much like the example of sampling behavior in a mental institution described on p. 139 of chapter 6. Since all conversations occur at specified places, days, and times, I would first divide the campus into areas or list all public places where conversations are likely to take place, then draw a sample of these places, and then randomly select certain days and times for observation.

     c. (1) Individuals have "attitudes"; therefore, the unit of analysis would be individuals— specifically, individuals whose religious preference is Catholic.

     (2,3) For any sort of national study, it is necessary to draw a multistage cluster sample, because it is highly unlikely that a complete list of the population is available, and a complete list is necessary for simple random, systematic, or stratified random sampling. If one uses multistage cluster sampling, then there are distinct sampling frames for each stage. For example, suppose we propose to use a three-stage cluster sample, drawing a sample of dioceses at the first stage, parishes at the second stage, and individuals at the third stage. To implement this sampling design, one must obtain a complete list of all dioceses in the U.S., and draw a random sample of dioceses. For each diocese that is selected, one then must obtain a complete list of all parishes and draw a random sample of parishes. Finally, for each selected parish, one must obtain a list of all parishioners and draw a random sample of parishioners. (How could this design result in coverage error?)
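The three-stage design can be sketched in Python. All of the dioceses, parishes, and parishioner lists below are fabricated placeholders; the point is only the logic of sampling clusters within clusters, with a separate frame at each stage:

```python
# Three-stage cluster sampling sketch: dioceses -> parishes -> parishioners.
# All names and counts are invented for illustration.
import random

random.seed(3)
dioceses = {
    f"diocese_{d}": {
        f"parish_{d}_{p}": [f"member_{d}_{p}_{m}" for m in range(50)]
        for p in range(10)
    }
    for d in range(20)
}

# Stage 1: frame = list of all dioceses; sample 5.
stage1 = random.sample(list(dioceses), 5)

# Stage 2: frame = parishes within each selected diocese; sample 3 per diocese.
stage2 = {d: random.sample(list(dioceses[d]), 3) for d in stage1}

# Stage 3: frame = parishioner list of each selected parish; sample 10 each.
final_sample = [m for d in stage1 for p in stage2[d]
                for m in random.sample(dioceses[d][p], 10)]

print(len(final_sample))  # 5 dioceses x 3 parishes x 10 members = 150
```

Coverage error enters wherever a stage's frame is incomplete; for example, Catholics who belong to no parish can never appear in the stage-3 frames.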

     Please note that the U.S. census could not be used as a sampling frame. Individual census records contain legally protected, confidential information that cannot be released to the general public until 72 years after the date of the census. For similar reasons we also could not gain access to other national lists of the U.S. population, such as the records of the IRS or Social Security Administration.

     d. (1) The unit of analysis is individual residents of Worcester.

     (2,3) Because persons who work night shifts comprise a relatively small and unknown percentage of the population, a sampling frame would be very difficult to construct. One strategy would be to contact organizations with night shifts (e.g., hospitals, police and fire departments, postal service, 24-hour restaurants and stores, etc.), and try to obtain lists of the employees who work at night. But it would be very difficult to construct an exhaustive list of such organizations and time-consuming to compile a sampling frame from all of the separate lists that one obtains. One option here is to use a two-stage sample, in which the first sampling frame consists of a list of organizations with night-shift employees, and the second sampling frame consists of night-shift employees in organizations sampled at the first stage.

     Another strategy would be to screen the larger population for eligible respondents, in which case the sampling frame for drawing the sample is a list of the Worcester population or, if one uses random-digit dialing, a list of all telephone exchanges in the city of Worcester. In either case, each person contacted would be asked if he or she is employed during nighttime hours. With random-digit dialing, this strategy might be practicable, provided that those respondents who identify themselves as night workers comprise a large enough percentage of the larger population.

     Finally, if the aims of the study do not require the precise description of population characteristics, or if the researcher plans to study a small number of cases in depth, the most practical approach would be some form of nonprobability sampling.

     3a. (1) If we assume that respondent selection is random, this is a proportionate stratified random sample, with in-state and out-of-state status as the two strata. It is "proportionate" because the ratio or proportion of these two groups in the sample is the same as in the population. If selection is nonrandom, then this is a quota sample.

     (2) It would be more efficient to draw respondents from a list of students who have cars on campus, perhaps a list of those to whom parking permits have been issued. This said, a quota sample would not be appropriate, because this is an inferior nonprobability sampling design that should not be used when probability sampling is feasible. A proportionate stratified random sample would be appropriate if the ratio of the two strata is about the same, or close to 50-50, especially if you suspect that there is discrimination—e.g., out-of-state students are more likely to be ticketed. This would improve the accuracy of any overall estimates regarding the issuance of parking tickets. If, however, one group is much more likely to have cars on campus than the other, it would be better to use disproportionate stratified random sampling. Of course, one need not draw a stratified random sample in order to test for discrimination against out-of-state students. You just need to be sure that the sample is sufficiently large to contain an adequate number of students in each category—in-state and out-of-state.
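The logic of proportionate stratification can be sketched as follows. The frame of permit holders and the 70/30 in-state split are invented for illustration; the point is only that each stratum is sampled at the same fraction, so the sample's mix mirrors the population's.

```python
import random

random.seed(2)  # fixed seed for a reproducible illustration

# Hypothetical frame: parking-permit holders tagged with residency status.
# A real list would come from the campus parking office.
frame = [("student_%d" % i, "in_state" if i < 700 else "out_of_state")
         for i in range(1000)]

def proportionate_stratified_sample(frame, n):
    """Sample every stratum at the same fraction n/N, so the sample's
    in-state/out-of-state mix matches the frame's."""
    strata = {}
    for unit, stratum in frame:
        strata.setdefault(stratum, []).append(unit)
    fraction = n / len(frame)
    sample = []
    for units in strata.values():
        sample.extend(random.sample(units, round(len(units) * fraction)))
    return sample

s = proportionate_stratified_sample(frame, n=100)
print(len(s))  # 70 in-state + 30 out-of-state = 100
```

A disproportionate design would simply replace the common fraction with a per-stratum sample size, oversampling the smaller or more variable group.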

     b. (1) If we assume that the researcher starts with a randomly selected voter from among the first 40 on the list, this is a systematic sample.

     (2) If the list is very large and no computer file of the list exists, a systematic sample is far easier to draw than, and offers a very good approximation of, a simple random sample. Systematic sampling can be problematic when the list is ordered in some fashion, but only if the sampling interval—in this case, 40—corresponds to a cyclical pattern in the list, which rarely occurs. In fact, systematic sampling from ordered lists (e.g., a list of voters ordered by their political party affiliation or the precinct where they live) would provide implicit stratification and ensure that the sample is representative with respect to the ordered variable.
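The procedure itself is simple enough to state in a few lines. The voter list here is a hypothetical placeholder; the essential steps are the random start within the first interval and the fixed skip thereafter.

```python
import random

random.seed(3)  # fixed seed for a reproducible illustration

def systematic_sample(frame, interval):
    """Pick a random start within the first interval, then take
    every interval-th unit thereafter."""
    start = random.randrange(interval)
    return frame[start::interval]

# Hypothetical voter list of 4,000 names with a sampling interval of 40.
voters = [f"voter_{i}" for i in range(4000)]
sample = systematic_sample(voters, interval=40)
print(len(sample))  # 4000 / 40 = 100 voters
```

Because every unit skips exactly 40 positions, a cyclical pattern in the list with period 40 would make the sample systematically unrepresentative; with any other ordering, the fixed skip does no harm.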

     c. (1) This is a convenience sample. It is not a snowball sample or referral sample, as these designs call for staged selection of respondents, which is not described here.

     (2) The appropriateness of a convenience sample such as this depends on the specific objectives of the study. If you were doing an intensive study of students who use nonprescription drugs, say, to learn more about their lifestyles and how drug use affects their relationships, this sample might be justifiable as a beginning. To increase sample size, however, you might want to ask the initial contacts to identify other known drug users, who might then be asked to identify others, and so on. In other words, use referral or snowball sampling. On the other hand, if your objective is to estimate the extent of drug use on campus, perhaps by means of a questionnaire survey, then convenience sampling is clearly inappropriate. It almost certainly would be biased, and it would provide no basis for estimating sampling error. For this objective, it would be better to obtain a complete list of students and use probability sampling methods to draw the sample.
 
 

Chapter 6

Experimentation

I. Matching Key Terms

  1. l
  2. j
  3. h
  4. f
  5. a
  6. i
  7. b
  8. d
  9. k
  10. c
  11. e
  12. g


Chapter 7

Experimental Design

I. Matching Key Terms

  1. e
  2. c
  3. i
  4. m
  5. l
  6. d
  7. a
  8. k
  9. g
  10. h
  11. n
  12. b
  13. j
  14. f

II.  Problems and Exercises

     1a. This is an example of the one-group pretest-posttest design:  O1 X O2

X represents the hotline; O1 represents the average number of hospital visits before the hotline (4.3), and O2 represents the average number of hospital visits after the hotline was begun (1.6).

     b. Several threats to internal validity, described below, are confounded with the effects of the hotline. To account fully for the observed decrease in doctor visits, any one threat would have to affect a large proportion of the 50 patients. On the other hand, it is easy to imagine how several effects (or threats), each affecting a small number of patients, collectively could account for the observed difference.

     history - Perhaps the cost of doctor visits increased or free medical care was curtailed during the study, thereby inhibiting patients' visits. Also, perhaps several patients were encouraged to seek psychiatric counseling, which reduced their hypochondria. 

     maturation - Perhaps hypochondria is something people outgrow or a particular "stage" that people go through at some point in their lives. Thus, some people in the study may have outgrown their hypochondria. 

     instrumentation - This could result if a change occurred in the personnel who record visits or, more plausibly, if the hospital began using more stringent screening procedures that made it more difficult for patients with minor symptoms to see a doctor. 

     attrition - Perhaps some of the worst hypochondriacs were out of town for part of the study period. Or, perhaps some patients began visiting doctors at other hospitals, hence dropping out of the hospital where the test was being conducted.

     statistical regression - Given that the patients selected were "extreme examples of hypochondria," their visits may have declined in the second year as a result of the tendency for extreme cases to regress toward the mean. In other words, as a consequence of measurement error in recording doctor visits (i.e., random error), the label "extreme" was erroneously applied to some patients. 
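Regression toward the mean is easy to demonstrate with a small simulation. All parameter values below are invented: each simulated patient has a stable "true" visit rate, each year's observed count adds random error, and the "extreme" cases are selected on year-one observations alone.

```python
import random

random.seed(4)  # fixed seed for a reproducible illustration

# Hypothetical stable visit rates for 1,000 patients.
patients = [random.gauss(4, 1.5) for _ in range(1000)]

def observe(true_rate):
    """Observed visits = true rate + random measurement error."""
    return true_rate + random.gauss(0, 2)

year1 = [observe(t) for t in patients]
year2 = [observe(t) for t in patients]

# Label as "extreme" anyone whose year-1 count exceeds 7 visits.
extreme = [i for i, v in enumerate(year1) if v > 7]
mean1 = sum(year1[i] for i in extreme) / len(extreme)
mean2 = sum(year2[i] for i in extreme) / len(extreme)
print(round(mean1, 1), round(mean2, 1))
```

The year-two mean for the "extreme" group falls even though no treatment occurred: selecting on high year-one scores selects, in part, on positive random error, which does not repeat the next year.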

     Note that neither selection nor testing threatens the internal validity of this study. The absence of a comparison group makes selection irrelevant; and without a self-report pretest or without subjects' awareness of being specially selected for the hotline program, a testing effect cannot occur. On the other hand, selection-X interaction is possible. That is, the effects of the hotline may pertain only to extreme hypochondriacs. Such an interaction does not affect the internal validity of the study, because the independent variable still would affect the dependent variable; however, it does reduce external validity because it limits the groups to whom the results would apply.

     2a. Selection. This study is a static group comparison. Even though non-meditators are selected randomly, the researcher has no control over who chose or chose not to practice TM. Therefore, there is no random assignment to the two groups (TM and not TM), and any differences between them in cognitive abilities may be due to selection bias.

     b. History and maturation. Since six months elapsed between the beginning of the pretesting and the end of the posttesting, many other events besides the dietary change could have taken place during this period that reduced antisocial (e.g., violent and aggressive) behavior. For example, since all the juveniles who participated in the study were institutionalized for the entire period, it is possible that institutionalization itself, over time, reduced the incidence of antisocial behavior. Although somewhat less plausible, it also is possible that the juveniles might outgrow their problematic behavior as a result of normal maturation processes.

     c. Selection.  Even though a random sample of students was selected, this study lacks random assignment to the two groups: those who used the computer lab and those who did not. Therefore, any differences between these groups in GPA at the end of the year may have existed at the beginning—that is, before the computer lab was established.

     d. (1) Differential attrition. The differential loss of subjects undermines the effectiveness of random assignment. In fact, given the motivation of subjects to participate in the study (so they might be able legally to try marijuana), it is quite possible that subjects in the placebo condition dropped out because they realized they were not going to get to smoke marijuana. Whatever their motivation, this differential loss destroys the integrity of the experimental design because the two groups are no longer equivalent. (2) Even if all subjects remained in the study, its external validity is limited by the possibility of selection-X (or selection-treatment) interaction. That is, the effects may be restricted to volunteers who want legally to smoke marijuana.

     e. Testing-X (or testing-treatment) interaction. Giving subjects a measure of political awareness and then asking them to watch the debate may sensitize them to the object of the study. As a result, they may attempt, on their own, to become more knowledgeable about current issues before the debate is televised, or they simply may be more attuned to such issues when they watch it, which may in turn enhance the impact of the debate on their "political awareness." In other words, the effects of the debate may be limited to subjects whose sensitivity to political issues has been aroused by testing their political awareness.

     3a. This is a 2 (low vs. high self-esteem) x 2 (low vs. high physical attractiveness) factorial design.

     Note that this is neither the Solomon Four-Group Design nor one of the three pre-experimental designs described in chapter 8. Unlike the Solomon Four-Group Design, the Kiesler and Baral study does not involve a pretest (more precisely, the measurement of romantic interest prior to the manipulation of the independent variables). And if we assume that subjects were assigned randomly to each condition (low self-esteem, low physical attractiveness; low self-esteem, high physical attractiveness; high self-esteem, low physical attractiveness; and high self-esteem, high physical attractiveness), then this has all the features of a true experiment. We would have to expand the notational system somewhat in order to represent this design, because there is more than one independent variable. If we let X = self-esteem and Z = physical attractiveness, and let different subscripts stand for each level of the independent variable, then this design could be depicted as follows, where O represents measurement of romantic interest:

          X1   Z1   O1
          X1   Z2   O2
          X2   Z1   O3
          X2   Z2   O4

     b. The outcome of this experiment is an interaction effect. The graph should show two nonparallel lines that run in opposite directions, one from low to high and the other from high to low. The dependent variable is always placed along the vertical or Y-axis; one of the two independent variables should be placed along the horizontal or X-axis; and the two lines in the graph should be labeled appropriately to represent the levels of the other independent variable. If you place self-esteem along the X-axis, then the lines in the graph would represent low and high attractiveness. If you place attractiveness along the X-axis, then the lines in the graph would represent low and high self-esteem.

     Whether the lines actually cross in the graph as well as the exact positioning of the lines in the graph is unimportant. An example of a figure that would represent the outcome reported here is Figure 8.2.C in the text. But note that not all interaction effects look like Figure 8.2.C. For example, Figures 8.1, 8.2.A, and 8.2.B in the text, while showing interaction effects, do not correspond to the interaction effect found in the Kiesler and Baral study.
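The crossover pattern can also be shown numerically. The cell means below are invented purely to illustrate the form of the interaction: the effect of attractiveness runs in opposite directions at the two levels of self-esteem, so the plotted lines are nonparallel and cross.

```python
# Hypothetical mean romantic-interest scores for the four cells of the
# 2 x 2 design; the values are invented to show a crossover interaction.
cells = {
    ("low self-esteem",  "low attractiveness"):  6.0,
    ("low self-esteem",  "high attractiveness"): 3.0,
    ("high self-esteem", "low attractiveness"):  3.0,
    ("high self-esteem", "high attractiveness"): 6.0,
}

# An interaction is present when the effect of attractiveness differs
# across levels of self-esteem (the two "lines" are not parallel).
effect_low = (cells[("low self-esteem", "high attractiveness")]
              - cells[("low self-esteem", "low attractiveness")])
effect_high = (cells[("high self-esteem", "high attractiveness")]
               - cells[("high self-esteem", "low attractiveness")])
print(effect_low, effect_high)  # -3.0 vs. 3.0: opposite directions, so the lines cross
```

If the two effects were equal, the lines would be parallel and there would be no interaction, only main effects.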

     c. (1) If the variable of self-esteem is placed on the X-axis, then a main effect for self-esteem only would be represented by a single line (the two attractiveness conditions superimposed on one another) running from low to high. (2) If we assume no interaction, then Figure 8.2.D in the text shows the outcome in which both self-esteem and attractiveness had main effects. 
 
 

Chapter 8

Survey Research

I. Matching Key Terms

  1. i
  2. c
  3. k
  4. n
  5. f
  6. e
  7. h
  8. d
  9. m
  10. j
  11. l
  12. g
  13. a
  14. b


Chapter 9

Survey Instrumentation

I. Matching Key Terms

  1. b
  2. j
  3. g
  4. i
  5. l
  6. e
  7. h
  8. f
  9. k
  10. m
  11. d
  12. c
  13. a
II.  Problems and Exercises

     1. (1) a. Inappropriate technical vocabulary. Many respondents will not know the meaning of "psychotropic."

     b. Double-barreled question. This question asks about both (1) family approval of wine and (2) family opposition to other alcoholic beverages. (One might also argue that the word "family" is ambiguous. To whom does this refer? Parents, grandparents, brothers and sisters, all of these? Finally, the question may be leading, suggesting to some respondents that wine is ok, but other alcoholic beverages are not.) To rectify the double-barreled problem one should ask two or more questions: Do your parents approve or disapprove of your drinking wine? Do your parents approve or disapprove of your drinking beer? Do your parents approve or disapprove of your drinking alcoholic beverages other than wine or beer? 

     c. Leading question. Drunk drivers are described negatively as "thoughtless" and respondents are asked "what should be done," which may lead respondents to suggest more severe sanctions than they might recommend with more neutral wording. (Another, lesser problem is that the meaning of "too much to drink" is ambiguous.)

     d. Imprecise measurement. The age intervals in the response categories will not capture much of the variation in the target population (Holy Cross students). Unless respondents are unwilling to give precise answers (as in questions about income), questions should allow for the maximum possible precision. In this case, simply ask, "What is your age?" or "What is your date of birth?"

     e. Imprecise measurement. The indefinite words "sometimes," "usually," and "frequently" may have different meanings for different respondents. 

     (2) c. Prior to this question, the respondent should be asked or informed about the penalties for drunk driving, so as to place the question in an appropriate context and make it clearer. Given this frame of reference, the respondent might then be asked, "Do you think the current penalties for drunk driving in this state are too lenient, about right, or too severe?" 

     d. The response categories for this question should be precise, mutually exclusive, exhaustive, and properly ordered either from low to high or high to low. Here is one possibility: 

     _____ never
     _____ less than once a year 
     _____ about once or twice a year 
     _____ several times a year but less than once a month 
     _____ about once a month 
     _____ 2-3 times a month 
     _____ about once a week 
     _____ more than once a week but not every day 
     _____ every day

     (3) c. This question should be placed in the middle of the questionnaire for several reasons: this could be a sensitive topic for many respondents; if it remains open-ended, then it will require some effort for respondents to answer; and, most importantly, if placed near the beginning, it could affect how respondents answer subsequent questions since it might give the impression that the researcher is anti-drinking. 

     d. As a routine background item, this question should be placed at the end. 

     e. Insofar as this is part of a survey of alcohol use, this should not be a sensitive or threatening question. It is consistent with respondent expectations and therefore might be placed toward the beginning. (By the way, to decrease interviewer effects, which might produce a bias toward socially desirable responses, it probably would be best to use student interviewers. As you can imagine, adult interviewers may elicit very different responses. For example, students are less apt to admit under-age drinking to an adult than to a student interviewer.) 

     2 (1)a.  Giving blood is one of many possible measures of altruism.  When considered in conjunction with several other indicators, it can help reveal the range of altruistic behavior.  However, this item is poorly worded because the response categories are imprecise and fail to make a critical distinction between those who have not given blood and those who have.  Because few people give blood more than once a year, there will be very little variation in responses to this question even among those who donate blood; therefore, the question will be of little or no use in measuring variation in altruistic behavior.  Finally, one is likely to get more reliable responses by asking about a shorter and more recent time period, such as the last year.

     b.  Contributing to charitable organizations, like giving blood, is another possible indicator of altruistic behavior.  But this item suffers from two major problems.  First, it is double-barreled; it asks if "you" and "your family" have contributed.  Second, the indefinite word "regularly" will mean different things to different respondents.

     c.  Subjective assessments are sometimes appropriate; however, this question is not likely to yield useful information for two reasons.  First, the vocabulary is too technical; many people will  not know the meaning of "altruistic."  Second, it is likely to elicit socially desirable answers because it is socially acceptable to be altruistic.

     d.  This is a leading question.  Respondents are likely to assume that "deserving" individuals warrant benefits.  Moreover, attitude toward welfare benefits for the homeless does not have face validity as a measure of altruistic behavior.  One may care about and even work on behalf of the homeless, but not support "welfare" payments as a solution to the problem.

     (2)a.  This question would work best if preceded by a filter question that asks if the respondent has donated blood.  For those who have donated, then simply ask:  "How many times have you given blood in the past year?"

     b.  Make this one question, directed to the respondent:  "Do you contribute to United Way?"
 
 

Chapter 10

Field Research

I. Matching Key Terms

  1. j
  2. l
  3. k
  4. f
  5. m
  6. e
  7. b
  8. i
  9. c
  10. d
  11. h
  12. a
  13. g


Chapter 11

Research Using Available Data

I. Matching Key Terms

  1. l
  2. d
  3. f
  4. c
  5. e
  6. i
  7. a
  8. k
  9. j
  10. h
  11. b
  12. g


Chapter 16

Research Ethics

I. Matching Key Terms

  1. i
  2. c
  3. g
  4. f
  5. e
  6. a
  7. b
  8. h
  9. j
  10. d

