This is an important feature of the methods currently employed by NAEP. When tests are created that are more costly to administer and score than conventional multiple-choice tests, the use of matrix sampling will be critical for keeping costs within bounds.
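As a rough illustration of the idea behind matrix sampling, the sketch below gives each student only a rotating subset of item blocks, so that every block is still answered by part of the sample. The rotation scheme, the function names, and the block labels are assumptions made for illustration; they do not describe NAEP's actual design.

```python
def assign_blocks(student_ids, blocks, blocks_per_student=2):
    """Give each student a rotating subset of item blocks (hypothetical scheme)."""
    n = len(blocks)
    assignment = {}
    for i, sid in enumerate(student_ids):
        # Student i starts at block i mod n and takes the next few blocks in order.
        assignment[sid] = [blocks[(i + k) % n] for k in range(blocks_per_student)]
    return assignment


def block_coverage(assignment):
    """Count how many students were assigned each block."""
    counts = {}
    for taken in assignment.values():
        for block in taken:
            counts[block] = counts.get(block, 0) + 1
    return counts


students = [f"s{i}" for i in range(12)]
blocks = ["A", "B", "C", "D"]
plan = assign_blocks(students, blocks)
coverage = block_coverage(plan)
```

Each student here answers half of the item pool, yet every block is covered by the same number of students, which is what keeps per-student testing time and scoring cost down.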
Another virtue to note is that current testing methodology makes possible comparisons over time. The collection of data on learning indicators is of limited value unless the measurement can be repeated, since the purpose of school evaluation is to detect change, that is, to see whether student performance is improving. Given that test-score scales are arbitrary, measures taken on a single occasion may be of limited value.
The only way in which such measures would be interpretable would be for the scores to have intrinsic meaning apart from comparative interpretations. School evaluation is concerned not only with measuring change in the same individuals over a period of time but also with comparing the performance of successive groups of students at a particular stage of instruction, such as the end of the eighth grade. The latter kind of comparison is of particular interest at state and national levels. Unfortunately, it poses a difficult problem of interpretation because of possible changes in the composition of the groups that have nothing to do with instruction.
And there are many other problems of interpretation due to the use of fallible instruments, the possibility if not likelihood that a given test does not measure the same abilities before and after a period of training, the lack of random assignment of students, the lack of equal units on a score scale, the unreliability of difference scores, and so on.
The development of item-response theory (Lord) provides workable solutions to other problems. The extensive test theory that has been developed should be retained, but it needs to be adapted as necessary for use with new testing procedures. The results could reflect such attributes of performance as speed of responding, use of inference in problem solving, pattern-recognition skills, students' internal models of problems, and use of strategies and heuristics in solving problems.
Two major kinds of assessment procedures are considered. One consists of what might be called global measures, since the performance to be elicited will be evaluated as a whole. The other set of procedures yields processing measures, since they are descriptive of the information-processing components that influence the development of conceptual knowledge and overt performance of the student.

Global Assessment

A frequently used alternative to a multiple-choice test is an essay test in which the items elicit fairly long written responses. Such tests have the virtue that students not only must think of the ideas for themselves but also must organize them in an appropriate sequence and state them clearly.
Essay tests have been justifiably criticized, however, on the basis of the subjectivity and unreliability of scoring. Reliability can be improved by pooling the grades of two or more readers; in the case of essays written to test English-language proficiency, a holistic method of grading is used in large-scale testing in which two or more judges are asked to read each essay quickly and rate it impressionistically, and the ratings are pooled. The result is that grades are more reliable, but no one knows precisely what they mean. Another approach that has been tried involves the use of tasks that impose more structure on the response than does the typical essay question, so that one can know more precisely what skill is being measured (N. Frederiksen and Ward; Ward et al.).
In science, for example, the test problems might simulate tasks that are frequently encountered by scientists, such as formulating hypotheses that might account for a set of research findings, making critical comments on a research proposal, or suggesting solutions to a methodological problem. For example, in one exercise students were.
Development of materials to aid in scoring this kind of test requires a protocol-analysis procedure that includes the following steps: Coders are trained to match each of a student's responses to a category, and scores can be generated by a computer on the basis of quality and other values attached to the categories. Tests of this sort were found to be poorer than Graduate Record Examination (GRE) scores for predicting first-year grades in graduate school, but they were better than the GRE for predicting such student accomplishments as doing original research, designing and building laboratory equipment, and being author or coauthor of a research report.
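The category-based scoring described above can be sketched as follows. The category labels and the quality values attached to them are hypothetical; in practice they would come from the scoring materials developed for a particular test.

```python
# Hypothetical category table: each coded category carries a quality value.
CATEGORY_VALUES = {
    "plausible_hypothesis": 2,
    "partially_relevant": 1,
    "irrelevant": 0,
}


def score_protocol(coded_responses, category_values=CATEGORY_VALUES):
    """Summarize one student's coded responses.

    coded_responses: list of category labels assigned by a trained coder.
    Returns the number of responses, the total quality, and the mean quality.
    """
    values = [category_values[c] for c in coded_responses]
    total = sum(values)
    mean = total / len(values) if values else 0.0
    return {"n_responses": len(values), "total_quality": total, "mean_quality": mean}


result = score_protocol(
    ["plausible_hypothesis", "irrelevant", "partially_relevant", "plausible_hypothesis"]
)
```

Keeping the human step (assigning categories) separate from the mechanical step (turning categories into scores) means the value table can be revised without recoding the responses.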
Thus, there is at least correlational evidence that tests of the kind described above measure something related to productive thinking that is not measured by conventional tests. More sophisticated methods of analyzing free-response protocols are being developed, methods that do not require the imposition of such a high degree of structure. These methods are based on discourse analysis (C. Frederiksen; van Dijk and Kintsch).
Flexible computer environments are being developed that permit students to generate text based on their retrieval, generation, and manipulation of declarative knowledge in a knowledge-rich domain. The use of syntactic and semantic parsers makes it possible to analyze a student's responses to a task and to make their grammatical structure explicit on the screen.
Analysis of the structure is then possible in terms of the student's prior knowledge of the topic, the knowledge representations generated in performing the assigned task, and the operations performed in generating links to new information. One task, for example, required students to interpret the results of an experiment involving photosynthesis in terms of their knowledge of the chemistry of photosynthesis.
Their task involved (a) comprehending the experiment, (b) retrieving relevant information from memory, and (c) generating appropriate links between (a) and (b). Protocols from different students demonstrate differences in approaches to the problem, such as forward and backward reasoning processes. Another approach to assessing performance is to display a student's structure as an overlay on a structure that represents a consensus among experts as to what constitutes an "ideal" answer.
To measure national and state progress toward the third National Education Goal and provide timely, fair, and accurate data about student achievement at the national level, among states, and in comparison to other nations. NAEP's measures of student achievement should be reconceptualized to help policy makers, educators, and the public better understand strengths and weaknesses in student knowledge and skills. The simple goal is to design a contraption that knocks a star off various platforms. Such a process would appear to require little creative thinking.
Subjects at different grade levels or different levels of competency have been shown by such methods to differ with regard to patterns of performance in comprehending texts of different kinds. Several states are experimenting with analogous methods for analyzing samples of student writing in state assessment programs, even without using computers to analyze individual protocols. The procedures require human judgment and are not intrinsically dependent on the computer, but computerized assistance may make the method feasible for widespread use.
There are many other possible formats, including not only tests that require written responses but also tasks requiring hands-on operation of laboratory equipment. For example, students can be given the necessary materials and equipment and asked to design and carry out models of scientific investigations that demonstrate understanding of such scientific concepts as density, conductivity, and capillarity.
The availability of microprocessor-based computers in the classroom is growing at such a rate that it is not unreasonable to assume that in the near future every classroom, from kindergarten upward, will have access to computers. According to Becker, a national survey found that the number of computers in use for school instruction quadrupled to over 1 million.
Furthermore, while costs are decreasing, processing power, ability to produce graphics displays, and mass storage capabilities are at very high levels. While the committee's chief interest is. Ideally, such simulations should reflect hands-on science done inside or outside of the classroom. The computer can be used to provide simulated experiments that reinforce, review, and extend the hands-on studies. Simulations also make it possible to speed up or slow down the progress of time, enlarge or shrink distances, and modify or eliminate such factors as friction and gravitation.
If such simulations are integrated into appropriate host software systems, they can be powerful tools for assessment. The host software could remember the performance of each individual student on a mass storage device, such as a floppy disk; could provide the classroom teacher with appropriate summary information on the class as a whole; and could provide the option to examine in as much detail
A simulation might be structured with regard to levels of achievement and could grant scoring points for good performance, just as good game software does. In this way, the simulations could give students valuable feedback as they use them, as well as storing information for the use of teachers and for the assessment of schools or school districts. Thus, the same information can be used for instructional or student evaluation purposes by the teacher, for local monitoring purposes by the principal or school superintendent, and as part of a state or national data base on student learning.
As possible instruments for national assessment, simulations would provide a solution to the problem of testing for real skills in doing science. They can be the kind of tests that should be taught to, and which by their use will generate higher-quality science instruction. It appears entirely practical to use simulations for classroom learning and to draw on a subset of the same group of simulations for local, state, and national assessment.
Since high-quality simulations are difficult and costly to create, it is important to maximize their use once they are in place. It is also more likely that better testing methods will be developed if at the same time they can be used to improve instruction. Cognitive scientists view students as information processors who possess a variety of capabilities that enable them to learn and function intelligently.
For example, in the sciences, the way and the extent to which scientific principles are used to organize perception, problem solving, and reasoning distinguishes the novice from the expert. Procedural knowledge includes not only knowledge of algorithms but also the ability to plan and use various heuristics and strategies.
All these capacities function interactively in contributing to learning and intelligent behavior. An understanding of how they function should facilitate instruction (N. Frederiksen), and an ability to assess these capabilities should be valuable not only to teachers and curriculum designers but also to educators at state and national levels. This information-processing conception of learning and intellectual performance is too complex to describe here. What follows are brief descriptions of a number of possible assessment procedures aimed at certain cognitive abilities, ordered roughly according to the complexity of the ability and the difficulties involved in assessing it.
The procedures suggested are generally based on experimental methods that have been devised by cognitive scientists for research purposes. Few of the procedures have been used for assessment, and much work will be needed before they can be used systematically in assessing proficiency in science and mathematics. For example, in learning to read, the beginner must learn how to translate letter combinations into speech sounds and to relate those sounds to words stored in memory.
These may be difficult tasks for a young child, but for a skilled reader they are performed very quickly and without attention. It has been shown that differences in response latencies in word analysis, discourse analysis e. In the case of reading, ". The need for automatic processing in elementary arithmetic is well known to teachers (although probably not by that term), and they try to increase automaticity by such means as drill with flash cards. Use of a computer would facilitate such training and would also make it possible to measure response latencies and, thus, identify those instances of finger counting or some other "short-cut" method that actually increases response time.
In algebra, automatic processing could be assessed by having the student carry out simple transformations of equations and measuring the response latencies. Moreover, patterns of latencies have been used to distinguish what kinds of procedures children use for addition and subtraction, for example, and how students and experts break algebraic equations into meaningful units. Thus, speed measures are useful not only for assessing automaticity but also for monitoring procedural skills.

Pattern Recognition

Pattern recognition is a skill related to speed of processing. With much practice one can learn to recognize very quickly a complex stimulus that may be embedded in a still more complex background.
This phenomenon was first observed by de Groot in comparing chess grand masters with ordinary chess players. He found that grand masters were able to reproduce correctly the positions on a board of 20 to 25 chess pieces in a midgame position after seeing them for a few seconds, while ordinary players. Apparently grand masters had learned after years of staring at chess boards to quickly perceive and use patterns in processing data. Simon and Chase and Simon later timed the placement of the pieces and found that the intervals between placements were relatively short for the pieces in a cluster and that longer intervals defined the boundaries between clusters.
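The use of timing to find cluster boundaries can be sketched in a few lines. This is a minimal illustration, assuming that each placement has a recorded time and that any pause longer than a chosen threshold marks a boundary; the 2-second threshold, the function name, and the sample data are assumptions, not values from the studies cited.

```python
def split_into_clusters(items, timestamps, pause_threshold=2.0):
    """Group items produced in sequence into clusters separated by long pauses.

    items: items in the order they were produced.
    timestamps: time in seconds at which each item was produced.
    pause_threshold: assumed cutoff; a longer gap starts a new cluster.
    """
    if len(items) != len(timestamps):
        raise ValueError("items and timestamps must have the same length")
    clusters = []
    current = []
    previous_time = None
    for item, t in zip(items, timestamps):
        # A long pause since the previous item closes the current cluster.
        if previous_time is not None and t - previous_time > pause_threshold:
            clusters.append(current)
            current = []
        current.append(item)
        previous_time = t
    if current:
        clusters.append(current)
    return clusters


pieces = ["Kg1", "Rf1", "Pg2", "Qd8", "Bc8", "Nf6"]
times = [0.5, 1.1, 1.8, 5.0, 5.6, 9.4]
clusters = split_into_clusters(pieces, times)
```

The same segmentation applies to any timed sequence of responses, such as items recalled from memory, provided the threshold is chosen with the task in mind.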
Pattern recognition is important in many activities, and measures of this skill might be an indicator of proficiency because, like automaticity, such skill reduces the load on working memory and makes its resources available for other, more complex activities. Measuring latencies in responding to relevant tasks would be an appropriate method for assessing a pattern-recognition skill.

Organization of Knowledge

How knowledge is organized in long-term memory may be another useful indicator of an aspect of information processing.
The elements in long-term memory are items of information and clusters of such items, which are interrelated in complex ways to form an extremely large system. The organization may involve temporal, spatial, hierarchical, causal, and other kinds of relationships. Highly organized cognitive structures are formed as one acquires expertise in an area such as mechanics or forestry.
Since accessibility of stored information depends on how it is organized, it would undoubtedly be useful to know how information is organized in the minds of students and how that organization changes with practice. One cannot hope to discover how all the information in memory is organized, but methods are available for assessing the structure of knowledge in particular domains. One method is to ask students to recall items of information and to time the responses, a method analogous to that used to investigate the size and nature of clusters of chess pieces as perceived by grand masters.
Sets of closely related items tend to occur with short latencies, while longer intervals tend. Another method is merely to have students sort the elements into clusters. A more sophisticated method makes use of judgments of similarity between pairs of words that represent the key concepts in a domain e. A student's ratings of all the possible pairs are analyzed by multivariate scaling, which produces a multidimensional representation of a structure.
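As a rough illustration of working with such pairwise ratings, the sketch below compares one student's similarity ratings with a reference set (for instance, ratings pooled from experts) using a plain Pearson correlation. This is a simpler stand-in for the scaling analysis, not an implementation of it, and the concepts, rating values, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all ratings are identical")
    return cov / sqrt(var_x * var_y)


def rating_agreement(concepts, student, reference):
    """Correlate two sets of similarity ratings over all concept pairs.

    student, reference: dicts keyed by alphabetically ordered concept pairs.
    """
    pairs = list(combinations(sorted(concepts), 2))
    return pearson([student[p] for p in pairs], [reference[p] for p in pairs])


concepts = ["force", "mass", "acceleration"]
reference = {("acceleration", "force"): 6, ("acceleration", "mass"): 4, ("force", "mass"): 5}
student_a = dict(reference)
student_b = {("acceleration", "force"): 4, ("acceleration", "mass"): 6, ("force", "mass"): 5}
agreement_a = rating_agreement(concepts, student_a, reference)
agreement_b = rating_agreement(concepts, student_b, reference)
```

A single coefficient hides where the two structures differ, which is why a scaling solution that can be inspected dimension by dimension is the more informative analysis.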
This structure then can be compared with that obtained from the judgments of experts (Shavelson; Meyer and Schvaneveldt; Preece; Diekhoff; Schvaneveldt et al.). Thus, it seems feasible to develop for a variety of subject-matter areas assessment methods that provide some information about the organization of information in memory for individuals or for groups of students.

Skill in Retrieving Information

The accessibility of information stored in memory has for many years been assessed by means of aptitude tests presumed to measure the fluency with which associations and ideas are generated.
The ability is very general and is thought to be related to creativity. It is possible that analogous tests would be useful in certain specific domains of expertise to elicit responses related to particular topics in that domain. Students of botany, for example, might be asked such questions as "What might be the cause of the fruit dropping from an apple tree before the apples are ripe?"

Internal Representations of Problems

How students conceive of a problem has much to do with their success in solving it. A given student's representation or mental model
If a crucial element is omitted or if the representation is inaccurate, solving the problem will be difficult or impossible. It would be useful to know what problem representations are used by students when they attempt to solve a certain type of problem. Once a protocol is obtained, it may be interpreted in terms of the cognitive processes that are involved. Methods using protocol analysis would be useful in investigating how a problem is represented internally and how that representation changes with training and practice.
Another method of studying problem representations involves asking experts and novices to sort a set of problems into categories. The results in physics, where the method has been applied, indicate that novices tend to sort the problems on the basis of superficial characteristics of the problems, such as the use of inclined planes or pulleys, while the experts categorized the problems in terms of the physical principles that were involved (Chi et al.). Asking students to sort problems is a possible way of discovering something important about the internal representations of problems that they use.
Research on the misconceptions that many students have regarding physical phenomena shows the importance of discovering student conceptions of problems (Stevens et al.). And it is reported that an appreciable number of students, even those who have had a course in physics, believe that when an object is released from the rim of a spinning wheel it will follow a spiral trajectory in space. Such misconceptions have been shown to be so enduring that some students reinterpret statements of physical laws to make them consistent with the misconception. Misconceptions about physical phenomena often can be discovered by asking a student to draw or otherwise indicate what he or she thought was happening or would happen under certain conditions.
Computers have been used to assess students' understanding of physical laws. One simulation depicts a Newtonian world without friction or gravitation in which objects obey the laws of motion. Such a simulation could be used both for assessment and for instruction.

Procedural Knowledge

The term procedural knowledge includes not only knowledge of such routine procedures as the algorithms used in computation but also more complex skills.
Complex skills may involve, for example, planning the steps to be taken in solving a problem and the use of strategies or such heuristics as means-end analysis, reformulating a problem, or thinking of analogies to a problem situation. One well-known bug, for example, involves subtracting the smaller number from the larger regardless of which one is on top.
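The smaller-from-larger bug can be simulated directly, which also gives a simple way to check whether a student's answers are consistent with it. The sketch below assumes non-negative whole numbers written in columns; the function names and sample problems are hypothetical.

```python
def buggy_subtract(top, bottom):
    """Column subtraction with the smaller-from-larger bug.

    In every column the smaller digit is subtracted from the larger one,
    whichever number it belongs to, so borrowing never happens.
    """
    width = max(len(str(top)), len(str(bottom)))
    top_digits = str(top).zfill(width)
    bottom_digits = str(bottom).zfill(width)
    result = "".join(
        str(abs(int(t) - int(b))) for t, b in zip(top_digits, bottom_digits)
    )
    return int(result)


def consistent_with_bug(problems, answers):
    """True if every answer matches what the buggy procedure would produce."""
    return all(buggy_subtract(t, b) == a for (t, b), a in zip(problems, answers))


problems = [(52, 17), (83, 46)]
buggy_answers = [45, 43]      # what a student with the bug would write
correct_answers = [35, 37]    # the true differences
```

Problems that need no borrowing, such as 64 minus 21, give the correct answer under the buggy procedure too, so a diagnostic set has to include problems that require borrowing.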
Many other bugs have been found to exist that are unknown to most teachers. New computer programs provide detailed information about the sequence of steps (the solution path) taken by a student, and, from that information, the strategic errors committed because of inadequate mathematical understanding may be inferred. Other programs are intended to discover and assess the depth of a student's understanding of an area of expertise. For example, computerized algebra tools now being developed permit students to see and manipulate the array of possible steps that they could take as they attempt to solve an algebra problem.
Knowing the path students take through this "search tree" reveals much more about their skills in algebra than does the number of correct answers to the problems, including such metacognitive skills as choosing an appropriate strategy, profiting from errors, and the ability to monitor one's own performance.
Computerized coaching systems are being developed that monitor a student's problem-solving performance. Based on diagnostic models that are integral parts of the system, computer programs can be designed that offer advice to the student and at the same time provide detailed assessments of his or her capabilities. Such systems are now capable of assessing performance in very complex domains. Another feature of the computer is that it can keep track of the collection of strategies that a student tries in solving a problem and then generate a summary of what he or she has tried and has neglected to try.
Thus, the computer opens up several new possibilities for assessment. The interactive nature of the student-computer relationship allows the student's capabilities to be progressively disclosed; if the student is unable to deal with a problem, more information or hints can be given (Reiser et al.). In this manner, a single problem can be used for both assessment and instructional purposes. Not all the computerized assessment procedures described above can be administered with a microcomputer; some may require the use of a sophisticated work station. The costs of such work stations have been decreasing at a rapid pace and are likely to continue to do so.
Within five years, such equipment will not be out of reach, at least for assessments on a four-year cycle. In the meantime, much can be done with small computers. As the cost of computers continues to decline, more assessments will become affordable. A note of caution is in order. Too much reliance on computerized testing and teaching may result in a tendency to substitute computer simulation for real-world experience, or to tilt testing methodology toward those exercises that are most easily computerized.
Users and creators must be alert to minimize such tendencies, and innovative assessment devices that do not require a computer should also be developed and made available.

The Development and Use of New Methods

None of the assessment methods described in this section can compete with multiple-choice tests from the standpoint of economy and efficiency, although matrix sampling makes their use more feasible. However, investment in the development of the recommended new methods and the cost of using them is, in the committee's view, justifiable not only because these methods would provide information for a far more accurate and complete assessment of instruction and student learning, but also because they are likely to be useful in the instructional process itself see, e.
These illustrations come from the popular press, but they exemplify uses made of NAEP data in the varied publications we examined. Although the source data for some of the statements in Table are unclear, what is clear is that the data were used to support descriptive, evaluative, and interpretive statements about student achievement in mathematics and science.
The first column of the table identifies the sources of the reports. The second column includes statements that describe how well American students, subgroups of students, or states performed on NAEP. In describing the results, column 2 shows that users often drew comparisons—to the past, across states, across population groups—to bring more meaning to the data. The descriptive statements in these and the other reports we reviewed were generally consistent with NAEP's design. The third column gives examples of evaluative.
California eighth-graders performed somewhat better but still ranged behind students in 32 states in the NAEP. California fourth-graders tied with Louisiana for worst state but did outperform Guam.
The bad news is we stopped it in the basement. More than 40 percent of high school seniors across the country who took the science exam, and more than one-third of fourth- and eighth-graders, could not meet the minimum academic expectations set. Too many schools, they contend, still emphasize rote memorization of facts instead of creative exercises that would arouse more curiosity in science and make the subject more relevant to students. These accounts speak to the adequacy of students' performance in We discuss the validity and utility of the achievement levels in Chapter 5.
The final column of Table shows users' attempts to provide an interpretive context for NAEP data. These statements illustrate the need for clearer explication of the data and for possible explanations of the results. The statements in column 4 generally reach beyond the data and the design used to generate them to identify sources of good or poor performance.
In our view, the excerpts in Table demonstrate how users hope and try to use NAEP results to inform thinking about the performance of the education system, schools, and student groups. As was observed for earlier administrations, some NAEP users accorded more meaning to the data than was warranted in laying out reasons for strong and weak performance.
Others sought to better understand strengths and weaknesses in students' knowledge and skills. To be sure, it is difficult to gauge and document the impact of statistical data on political and public discussion of education issues. We believe they can play an important role in stimulating and informing debate. Boruch and Boe argue that estimates of reliance on social science data in political discussion and decision making are biased downward.
They say that normal filtering systems contribute to underestimates of the value and impact of statistical data in the policy arena. National Longitudinal Studies and High School and Beyond data have been used in academic reports by manpower experts. The results of the Congressional Budget Office reports, in turn, are filtered and given serious attention that leads to decisions and perhaps recommendations by the National Academy of Sciences Committee on Youth Employment Programs (National Research Council). These recommendations may then lead to changes in law, agency regulations, or policy.
During the May meeting of the National Assessment Governing Board, board members discussed the inadequacy of some of the current data presentations. At the meeting, members noted that current presentations of NAEP results by grade, demographic group, state, and a small number of additional variables do not point policy makers and educators to possible sources of disappointing or promising performance or to their possible policy implications.
To illustrate their concern, board members brought up the oft-cited finding that fourth graders who received more hours of direct reading instruction per week did less well on the NAEP reading assessment than students who. An analogous relationship was reported for fourth and eighth graders on the U. It probably is not the case that reading instruction depresses reading performance or that technology use depresses history knowledge. NAEP test takers with more hours of reading instruction may have received extra remedial instructional services, and students with more computers in the classroom may have attended schools in economically depressed areas where funding for technology may be easier to secure and where, on average, students score less well on standardized tests.
Board members pointed out that current data presentations may prompt faulty interpretations of results, in that the associations suggested by the paired-variable tables e. Discussion of the reading and history results, for example, may have been informed by data on types of instructional services provided or the uses made of computers in the classroom. Although, on their own, survey data of the type NAEP collects cannot be used to test hypotheses and offer definitive statements about the relationships among teaching, learning, and achievement, they can fuel intelligent discussion of possible relationships, particularly in combination with corroborating evidence from other datasets.
Most important, they can suggest hypotheses to be tested by research models that help reveal cause and effect relationships. To reiterate, our analysis of press reports, NAEP publications, and other published and unpublished documents suggests that NAEP's constituents want the program to: NAEP has served and continues to serve as an important and useful monitor of American students' academic performance and progress.
NAEP is a useful barometer of student achievement. Serve an evaluative function. The establishment of performance levels for NAEP potentially allows policy makers and others to judge whether results are satisfactory or cause for alarm. They are meant to support inferences about the relationships among observed performance and externally defined performance goals. Provide interpretive information to help them better understand student achievement results and begin to investigate their policy implications. Policy makers and educators need an interpretive context for NAEP to support in-depth understanding of student achievement and to intelligently investigate the policy implications of NAEP results—particularly if performance is disappointing.
In fact, as shown in Table and elsewhere, in the absence of contextual data,. Examination of the current NAEP program indicates that the program does a good job of meeting the descriptive needs of its users; NAEP performs the "barometer" function well. Currently, however, the evaluative and interpretive purposes are not well achieved by NAEP. From their introduction, however, NAEP's standard-setting methods and results were roundly criticized (Stufflebeam et al.).
For a variety of reasons, evaluators have characterized NAEP standards as seriously flawed. Despite this, we note that the popularity of performance levels, and of the evaluative judgments they support, is undeniable. Many policy makers and educators remain hopeful that NAEP standards will provide a useful external referent for observed student performance and signal the need to celebrate or revamp educational efforts.
As others have, we encourage NAGB to continue improving its recently assumed evaluative activities, so that NAEP can make reasonable and useful statements about the adequacy of U. In Chapter 5, we discuss the evaluative function in detail and make recommendations for improving the way that NAEP performance standards are set.
Also not well met are the interpretive functions users ascribe to NAEP. Interpretive information about strengths and weaknesses in the knowledge and skills tested by NAEP can be obtained from more in-depth analyses of student responses within and across NAEP items and tasks than presently occurs.
There appears to be considerable room for improvement in NAEP in supporting this level of interpretive activity. In Chapter 4 we discuss ways that framework and assessment development and reporting can evolve to provide interpretive information that supports better understanding of student achievement. Interpretive information about the system-, school-, and student-level factors that relate to student achievement can be provided by including NAEP in a broader, well-integrated system of education data collections.
Within the context of NCES' data collections, there appears to be considerable need for improvement in data coordination to support this level of interpretive activity. We devote the remainder of this chapter to a proposal for building and using a broader system of indicators to address many of the interpretive needs of NAEP's users. We argue for the availability of contextual data to help users better understand NAEP results and focus their thinking about potentially useful or informative next steps.
Historically, NAEP has attempted to fulfill this need for contextual information by collecting data using student, teacher, and school background questionnaires on factors thought to be related to student achievement. However, as we have already discussed, these data generally are presented in paired-variable tables. In recent years, the length of the background questionnaires gradually has been reduced, in part because they have failed to capture policy makers' and educators' attention.
We contend that the current NAEP student, teacher, and school background questionnaire results should not be the principal source of data to meet NAEP users' interpretive needs. We seek, therefore, to accomplish this second type of interpretive function without further burdening NAEP. To this end, we next develop a conceptual and structural basis for a coordinated system of indicators for assessing educational progress, housed within NCES and including NAEP and other currently discrete, large-scale data collections.
We argue for a system that (1) expands the conception of educational progress to include educational outcomes that go beyond academic achievement, (2) informs educational debate by raising awareness of the complexity of the educational system, and (3) provides a basis for hypothesis generation about the relationships among academic achievement and school, demographic, and family variables that can be tested by appropriate research models.
For ease of reference in this chapter and throughout the report, we call the proposed system CSEI. We are not recommending this nomenclature for operational use by NCES but adopt it here for clarity and to streamline the text. We foreshadow our discussion of the system by noting that much of the data we seek on student characteristics, teaching, learning, and assessment already reside at the U. The feasibility of the effort we propose relies on the department's ability to capitalize on potentially powerful synergies among current efforts in ways that enhance the usefulness of NAEP results and contribute to the knowledge base about American educational progress.
Several of the current data collections could serve as important sources of contextual information about student achievement and signify educational progress in their own right. Collectively, these surveys gather a wide range of data on students and schools, including demographic characteristics, enrollments, staffing levels, school revenues and expenditures, school organization and management, teacher preparation and qualifications, working conditions,.
Each year NCES compiles recent data on many of these variables from across several separate surveys and publishes them in a compendium, The Condition of Education e. This volume serves as a valued source of information on education indicators with approximately 60 indicators selected for inclusion in each volume.
Our recommendation for creating a coordinated system of indicators was prompted in part by imagining the enhanced value of these indicators if the data collections from which they are drawn were coordinated so that cross-connections between datasets could be realized. For example, if in CSEI the collection of data on public elementary and secondary expenditures (currently collected in the Common Core of Data) and on high school course-taking patterns (currently collected in High School Transcript Studies) were coordinated with collections of data on student achievement (such as NAEP), relationships among these and other variables could be explored and presented in future reports.
A more comprehensive view of the inputs, processes, and outputs of American education would be the result. Table shows many of the current NCES data collections and notes their elements. The table suggests important commonality among the datasets; these correspondences among data elements, units of observation, and populations of inference should facilitate CSEI's development. We return to the discussion of these specific data collections later in the chapter. It is difficult to conceive of a system of education indicators that does not assign a key role to measures of student achievement in informing the public about how well schools are fulfilling their role in a democratic society.
NAEP must serve as a key indicator in the coordinated system of education indicators that we are recommending. In fact, much of this report is devoted to commentary on aspects of NAEP that are important to its becoming integral to a larger system of indicators of progress in American education. Bryk and Hermanson The panel called for the development of a system of education indicators that "respect[s] the complexity of the educational process and the internal operations.
During the course of its work, the panel proposed such a system; it included academic achievement and other learner outcomes, the quality of educational opportunity, and support for learning variables. This report documents the panel's thinking about the development of an education indicator system and makes recommendations for improved federal collection of education data.
In the report, the panel provided a conceptual framework for a system that includes, but goes beyond, student achievement data; identified relevant extant data sources; and cited gaps in currently available data and information. Their work provides important grounding for the efforts we propose here. In an essay called "Historical and Political Considerations in Developing a National Indicator System," Shavelson explains that, in their typical conceptions, indicator systems chart the degree to which a system is meeting its goals; they are generally structured to be policy relevant and problem oriented.
Shavelson observes that indicator systems historically have been heralded as a cure for many ills. Social indicator systems have been variously proposed as vehicles for setting goals and priorities (Council of Chief State School Officers), for evaluating educational initiatives (Porter), for developing balance sheets to gauge the cost-effectiveness of educational programs (Rivlin), for managing and holding schools accountable (Richards), and for suggesting policy levers that decision makers "can pull in order to improve student performance" (Odden). Shavelson goes on to describe how enthusiasm for indicator systems has waxed and waned over time.
Linn and Baker recently wrote about renewed attention to educational indicators and described how interest in their uses is rising. They explain that current proposals describe indicator systems as vehicles for communicating to parents, students, teachers, policy makers, and the public about the course of educational progress in hopes that the educational community can "work together to improve the impact of educational services for our students" p.
Shavelson and others remind us to be cautious, however. Shavelson asserts that social indicator systems are properly used to (1) provide a broad picture of the health of a system, (2) improve public policy making by giving social problems visibility and by making informed judgments possible, and (3) provide insight into changes in outcomes over time and possibly suggest policy options. Sheldon and Parke argue that social indicators can best be used to improve "our ability to state problems in a productive fashion, obtain clues as to promising lines of endeavor, and ask good questions."
We seek a system that suggests relationships among student, school, and achievement variables and that stimulates democratic discussion and debate about American education. We believe that NAEP is currently too "decoupled from important research and policy issues" (Bohrnstedt). It is our hope that the system's products would be used to pose hypotheses about student achievement and test them, moving beyond observational to experimental research methods and using longitudinal designs. We began this chapter by discussing interpretations of NAEP results by policy makers and others that exceed NAEP's data and the design used to generate them.
We recognize that the system and products we propose here are likely to meet with similar treatment. We predict that some policy makers will use associative data from CSEI to tout their own initiatives, to argue for new educational practice, and to develop education policy. Our CSEI proposal is not intended as an argument for weak social science research or unwarranted inference.
However, we start from the position that education policy based on imperfect empirical data is better than education policy with no empirical base. We believe that the benefits of documenting interrelationships among achievement and educational variables in ways that respect the complexity of the educational enterprise will outweigh its disadvantages.
It is beyond our purview to recommend a conceptual model for CSEI, but Figure shows a set of possible indicators that might be included within such a system. The indicators are motivated by previous and current research and draw on the work of the Third International Mathematics and Science Study (Peak), the Reform Up Close project (Porter et al.
For example, in his ongoing examination of schooling, Porter is studying learning and its correlates by focusing on achievement measures, teacher background variables, student background variables, instructional practice indicators, and school climate variables. Porter and his colleagues are finding positive relationships between reform-relevant instructional practice and student achievement. Porter's earlier work with the Reform Up Close project (Porter et al.
They described a system that includes learner outcomes, including academic achievement measurable by traditional and alternative measures, attitudes, and dispositions; the quality of educational opportunity, including learning opportunities, teacher preparedness, school organization and governance, and other school resources; and support for learning variables, including family support, community support, and financial investments.
The volume title serves as a mantra for NCES's long-range planning. Conference organizers sought to p. Participants provided suggestions for tracking educational reform to the year ; measuring opportunity to learn, teacher education, and staff development; enhancing survey and experimental designs to include video and other qualitative designs; and effecting linkages to administrative records for research. Currently, within NCES itself as part of the Schools and Staffing Survey Program, researchers propose to track what is happening in the nation's schools around issues of school reform by collecting information on teacher capacity, school capacity, and system supports (National Center for Education Statistics). They will examine teacher capacity by documenting teacher quality, teacher career paths, teacher professional development, and teacher instructional practices.
They will address school capacity by examining school organization and management, curriculum, and instruction—to include data on course offerings, instructional support, instructional organization and practices, school resources, parental involvement, and school safety and discipline. At this writing, NCES staff are considering the inclusion of student achievement data in the system, thus creating an initial version of a coordinated system of indicators, one function of which would be to better understand factors that influence patterns of student achievement.
With these efforts as a guide and to illustrate, but not prescribe, a conceptual model for CSEI, we refer to Figure, which shows possible elements of the system and the role of student achievement measures within it. Figure suggests the types and range of indicators that might be included in a coordinated system. Two studies illustrate the value of embedding measures of student achievement within a broader range of educational measures. TIMSS is one example of a data system that provides information on student achievement and educational variables (Peak). It was designed to describe student performance in mathematics and science and to promote understanding of the educational context in which learning and achievement take place.
The TIMSS dataset for grade 8 includes a wide variety of data about student achievement, curriculum and instruction, education policy, and teachers' and students' lives.
The grade 8 study examined multiple levels of the education system using mixed methods of data collection. TIMSS researchers collected and analyzed data from student tests, student and teacher questionnaires, curriculum and textbook analyses, videotapes of classroom instruction, and case studies on policy topics. TIMSS researchers posed a series of questions and then designed data collections to obtain the information needed to help answer those questions. TIMSS researchers designed the study to learn:. At the eighth-grade level, the U. Furthermore, there appeared to be little improvement in U.
Classroom-, school-, and system-level data collections enabled TIMSS researchers to suggest a number of factors potentially associated with American students' lackluster performance that bear further investigation. The content of U. Topic coverage was found to be less focused in U.
The TIMSS research examined education policy and practice broadly and used this information to describe American education and students' achievement and to frame hypotheses about strong and weak academic performance. This work has important implications for NAEP and CSEI, since it illustrates how different and complementary research methods and data can be brought to bear on important education policy questions.
Recent heated debate about the meaning of the disappointing performance of high school students on TIMSS stands in contrast to earlier discussion of the eighth-grade data. Many explanations have been offered for the poor showing of American twelfth graders:
At the ground level, the space is anchored by an immense vertical installation that allows children to feel a tornado spinning.
The perimeter of the installation is lined with smaller lab spaces for group learning. It is ironic that we create such amazing interactive science exhibitions and take our children to them on special occasions rather than simply building them at the schools. For designers invested in educational spaces, the challenge is obvious: We cannot simply hammer the round peg of this STEM initiative into the square hole that is the 19th-century school model. Educators, scientists, architects, engineers, artists, technologists, designers, and kids can collaborate to re-envision the pedagogy and the learning environment needed to support STEM.
Through this design journey we will rediscover the spirit of playfulness and fun in learning science and meet the challenges of the Race to the Top. Trung Le is a principal education designer at Cannon Design. Over the past two years he has helped lead an interdisciplinary group of designers and educators from the U.