National Council of Teachers of Mathematics 2012 Research Presession


Tuesday, April 24, 2012: 11:00 AM
Franklin Hall 13 (Philadelphia Marriott Downtown)
Charles Munter, University of Pittsburgh, Pittsburgh, PA
Paul Cobb, Vanderbilt University, Nashville, TN
One purpose of education research is to develop and rigorously evaluate the effectiveness of programs for supporting students’ learning and achievement. The Institute of Education Sciences recently amplified that purpose (Shadish & Cook, 2009) and attempted to improve the methodological standards for conducting such work, primarily through the What Works Clearinghouse (WWC), which, since 2002, has supported an ongoing effort to synthesize research on the effectiveness of educational interventions, programs, and policies. According to its stringent methodological standards, only “well-designed and well-implemented” randomized controlled trials and studies employing quasi-experimental designs with equating or matching are included in the WWC’s 11 topical syntheses, three of which are devoted to mathematics programs: elementary, middle, and high school.

Providing practitioners and policy makers access to data from rigorous, scientific research is undoubtedly an important endeavor, and education evaluation research has benefited from efforts to define and raise methodological standards. However, it is not clear whether shared expectations have been established for conceptualizing evaluation studies in domain-specific ways. For example, how might an evaluation of a reform-oriented textbook series differ from that of a more traditional series?

In this paper we offer a critique both of recent mathematics program evaluations and of the WWC’s criteria and means of reporting its findings. Specifically, focusing on evaluation reports of elementary, middle, and high school mathematics programs that met the WWC’s inclusion criteria, we assessed the extent to which evaluators identified, and incorporated into their evaluation designs, aspects of the theories of learning and teaching mathematics that underlie the programs of interest.

Theoretical Framework

Our “program theory” approach is premised on the notion that, within the experimental research paradigm, the validity of assessments of effectiveness and attributions of causality is strengthened by investigating the mechanisms by which programs achieve (or fail to achieve) their intended outcomes (Bickman, 1987; Cook & Shadish, 1986; Lipsey, 1985).

We draw on two sources to develop our framework for analyzing evaluation reports. The first is the body of arguments concerning principles of theory-based evaluation research, which hold that the theories underpinning a program should play a role in each step of its evaluation, from initial design to the interpretation of findings (Bickman, 1987; Coryn et al., 2010; Lipsey, 1993; Weiss, 1997). The second is the work of the committee convened by the National Research Council to review the quality of existing evaluation studies of mathematics curriculum materials (Confrey & Stohl, 2004). The committee complemented the broader principles of theory-based evaluation research with mathematics-specific criteria by combining perspectives from “method-oriented” evaluation (as emphasized by the WWC) and “theory-driven” evaluation (as described in this paper).

Data Sources

We intentionally limited our investigation to evaluations of K-12 mathematics programs that the WWC determined to have met its standards for inclusion in its ongoing syntheses of the effectiveness of educational programs. As of July 1, 2011, 35 studies satisfied these criteria: 13 studies of 7 elementary programs, 16 studies of 7 middle school programs, and 6 studies of 3 high school programs. Taking the methodological rigor of these studies as given, we assessed whether they also satisfied the principles of experimental, theory-based evaluation research referenced above. For example, we assessed whether outcome measures were not merely valid and reliable, but also aligned with the outcomes the program was designed to produce. In addition, we determined whether evaluators not only demonstrated that a program was “implemented as designed,” but also articulated and assessed the mediating steps in a causal chain.

Methods

Broadly stated, we investigated the extent to which evaluators of mathematics programs attended to program theory. Specifically, of each evaluation report we asked:

  1. To what extent did the research questions concern the relationships (e.g., mediating, moderating) between components of the program theory?
  2. To what extent did measures employed in the evaluation assess outcome and process constructs articulated in program theory?
  3. To what extent did the answers to the above questions vary by the nature of the program (i.e., the type of mathematics learning goals and instruction) and the background of the evaluator(s) (e.g., developer vs. outside evaluator; academic vs. private)?

Each of the 35 evaluation reports was analyzed by at least one of the authors. A subset of 3 reports was analyzed collaboratively, and a further 7 (20%) were analyzed independently, with the results compared to ensure sufficient reliability in our coding. We employed a coding scheme adapted from the work referenced above (Confrey & Stohl, 2004; Coryn et al., 2010; Weiss, 1997). Each evaluation report was classified as theoretical, subtheoretical, or atheoretical (Lipsey et al., 1985) with respect to four dimensions: (1) the type and quality of the program theory articulated, and the extent to which program theory influenced the (2) research questions, (3) construct measurement (including implementation and assessment components specific to mathematics teaching and learning), and (4) analysis.
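
To make the coding scheme concrete, the minimal sketch below (in Python) models one way the classifications could be recorded and checked. It is purely illustrative: the names, the weakest-rating aggregation rule, and the percent-agreement measure are assumptions on our part, not the actual instrument used in the study.

    from dataclasses import dataclass
    from enum import Enum

    # Three classification levels, following Lipsey et al. (1985).
    class TheoryLevel(Enum):
        ATHEORETICAL = 0
        SUBTHEORETICAL = 1
        THEORETICAL = 2

    # One record per evaluation report; the four fields mirror the four
    # dimensions of the coding scheme. All names here are hypothetical.
    @dataclass
    class ReportCoding:
        report_id: str
        program_theory: TheoryLevel         # (1) type/quality of program theory
        research_questions: TheoryLevel     # (2) influence on research questions
        construct_measurement: TheoryLevel  # (3) influence on measurement
        analysis: TheoryLevel               # (4) influence on analysis

    def overall_classification(c: ReportCoding) -> TheoryLevel:
        # An assumed aggregation rule: a report is only as "theoretical"
        # as its weakest dimension.
        dims = (c.program_theory, c.research_questions,
                c.construct_measurement, c.analysis)
        return min(dims, key=lambda level: level.value)

    def percent_agreement(coder_a, coder_b):
        # Simple percent agreement between two coders' ratings, as might
        # be computed for the double-coded 20% subset of reports.
        matches = sum(a == b for a, b in zip(coder_a, coder_b))
        return matches / len(coder_a)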

Results

Preliminary findings suggest that the majority of evaluations of mathematics programs have moved beyond “black box” studies: many specified how a program must be implemented, and that specification typically, though not always, guided the study. However, evaluators rarely articulated a testable causal chain for how a program was theorized to achieve its goals. Consequently, most of the evaluations did not investigate questions about mechanisms, and their measures were often insensitive to the full array of a program’s intended outcomes. We therefore classified most studies as “subtheoretical.”

Significance

The findings of evaluation studies guide the decisions of policy makers at every level, including the adoption of both curriculum materials and intervention programs. These decisions are consequential for students’ mathematics learning and academic futures. It is therefore crucial that evaluators “get it right” when assessing the effectiveness of such programs. Our analysis indicates that the WWC’s methodological specifications are inadequate because they overlook the understanding and use of program theory in evaluation design and implementation. More generally, evaluation research on mathematics programs needs to make better use of program theory. Both method and theory are necessary to produce valid evidence on which policy and local curriculum adoption decisions can be based with confidence.
