Letter from College Board's Vice President of Research
September 7, 2006
Because of a recent mischaracterization of the College Board's communications about the College Bound Seniors SAT data for 2006, we feel compelled to restate information we previously provided at our press conference on August 29 and in interviews with numerous reporters from around the country. Misuse and distortion of information about SAT scores serve no purpose but to create anxiety among students and families, and therefore must be corrected. The College Board's College Bound Seniors calculations are correct, and our explanations of year-over-year changes have been thorough, consistent, and valid.
With regard to this year's SAT scores, we said, "when a new test is introduced, students usually vary their test-taking behavior in a variety of ways and this affects scores" and we went on to point out that the "most significant factor in the overall change in this year's scores is mainly attributable to a change in student test-taking patterns."
We discussed a broad array of behaviors, including students not taking the test a second time. In combination, we explained, these factors contributed to a mean score decline of 7 points across the Critical Reading and Mathematics sections of the SAT combined: approximately half a question on Critical Reading and approximately a fifth of a question on Math.
One of the changes in testing patterns that affected the score decline was the fact that more than six percent of students took the old SAT in January of their junior year, or earlier, and never took the new SAT. The average scores for these students were significantly lower than those of juniors who tested early in past years, despite the fact that they took the old test.
In addition, many more students than usual chose not to take the March and May 2005 administrations, and tested for the first time much later than usual. Again, this type of student behavior reduces the opportunity to retest and limits the opportunity to demonstrate educational growth in subsequent retesting.
Although we discussed this factor during the press conference question-and-answer period and in subsequent interviews with the media, we did not describe this factor as "notable," because it reflects a one-time phenomenon related to student behavior in anticipation of the introduction of a new test. Because students will not have the same "old versus new" choice going forward, it is an interesting isolated occurrence, but not especially instructive when we look at the behaviors of college bound seniors year over year.
What we did find notable in our study of student scores and behaviors, which is described in much greater detail below, was the decrease in the number of students who took the test twice or more, as well as the characteristics of those students. Taken together, these factors may reflect a broader trend in testing patterns, and they combined to affect the score decrease that we saw this year.
There is no one "magic bullet" that will explain complex patterns and small changes in test scores or other educational indicators, but these changes in student behavior, combined with changes in when students took the test, are highly related to the 7-point decline. As you will see below, these matters become highly technical and lend themselves to meticulous study and lengthy description. One cannot calculate the possible impact of retesting with simple mathematical computations.
A More In-Depth View
First, it's important to view the score change we have observed this year in a larger context. The mean SAT Critical Reading and Math scores for College Bound Seniors in 2005 were 508 and 520, respectively. The mean SAT Critical Reading score decreased 5 points in 2006 to 503; the mean SAT Math score decreased 2 points to 518. A change of 7 points in SAT means over one year is unusual, but not unprecedented. Going back to 1973, a change of 7 points or greater in mean scores has occurred in 5 of those years, or approximately 15 percent of the time. We have had a change of 5 points or greater in the mean Critical Reading score in 5 of those years as well, and a change of 2 points or greater in mean Math scores in 14 of the years and 9 of the last 15 years.
In 1995, the year following the last major revision to the SAT (in 1994, we eliminated antonyms, introduced longer reading passages, permitted calculators, and added constructed-response math items), there was a similar 7-point change, but it was an increase. In 2003 there was a 6-point increase in mean SAT scores. The score changes in 1995 and 2003 are similar in size to the score change in 2006, and each constitutes a change of less than 1 percent of the score scale¹.
Another way to put this in context is to compare it with the change in high school GPA we observed in 2006. The mean high school GPA for this year's College Bound Seniors increased from 3.30 to 3.33, a change of .0069 (or .69 percent), which is actually larger than the change in SAT scores. The important point is that neither of these changes is greater than 1 percent, and neither is truly significant. Forty-three percent of this year's College Bound Seniors had an A average in high school, which further illustrates the importance of the additional information that admissions tests contribute to decisions about college readiness and success. Our 2003 research report on grade inflation demonstrated that high school grades have consistently increased across all subjects over the past two decades, while scores on national tests like the SAT have generally remained flat.
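The percentage comparison in the two paragraphs above can be reproduced in a few lines of Python. This is a sketch: the SAT denominator follows the computation in footnote 1, while the GPA denominator of 4.33 is our assumption, chosen only because 0.03 / 4.33 reproduces the letter's .0069. The letter does not state which GPA scale it used.

```python
def percent_of_scale(change, scale_points):
    """Express a change in a mean score as a percentage of the scale's width."""
    return 100.0 * change / scale_points

# SAT: each section runs 200-800 (601 possible values); footnote 1 counts the
# combined Critical Reading + Math scale as 2 * 601 = 1202 points.
sat_scale_points = 2 * (800 - 200 + 1)
sat_decline = (508 + 520) - (503 + 518)   # 2005 total minus 2006 total

# GPA: ASSUMPTION. The letter does not say what denominator it used; a 4.33-point
# scale (A+ = 4.33) is used here only because 0.03 / 4.33 reproduces ".0069".
gpa_scale_points = 4.33
gpa_increase = 3.33 - 3.30

sat_pct = percent_of_scale(sat_decline, sat_scale_points)
gpa_pct = percent_of_scale(gpa_increase, gpa_scale_points)

print(f"SAT decline: {sat_decline} points = {sat_pct:.2f}% of the scale")
print(f"GPA increase: {gpa_increase:.2f} points = {gpa_pct:.2f}% of the assumed scale")
```

Running it prints 0.58% for the SAT and 0.69% for GPA, matching the figures quoted in the letter.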
Impact of Retesting Behavior on SAT Scores
- The average student who takes the SAT a second time increases his or her scores by approximately 30 points; however, that figure describes only the average student.
- Score changes are also affected by a well-established statistical phenomenon called "regression to the mean." The lower a student's first score, the larger the average score increase will be upon retesting. The opposite is also true: students with high initial scores, on average, have much smaller increases with retesting. Each year we publish a detailed table that illustrates the typical score change for students based on their initial score. A copy of the table can be found at http://www.collegeboard.com/prod_downloads/about/news_info/cbsenior/yr2005/09_effects_of_repeating_sat_0506.pdf
- Among students who scored in the lower half of the SAT scale, we saw a significant decrease in the percent of students taking the test more than once. Students who scored at the higher end of the SAT scale do not appear to have changed their SAT retesting behavior. That is, the students who chose not to retest were among those who had initial scores in the bottom half of the national population, and who, on average, would have increased their scores by more than 30 points.
- The decrease in retesting was not just among students who would have taken the test twice, but was actually greatest among students who would have taken the test three times. We saw:
  - a 3 percent increase in students testing once;
  - a 1 percent decrease in students testing twice; and
  - a 2 percent decrease in students testing three times.
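Regression to the mean can be illustrated with a small simulation. This is a hypothetical sketch, not a model of actual SAT data: each simulated student has a fixed "true" score, each administration adds independent random error, and every retester receives the same assumed practice gain. Because a low first score is partly bad luck, students in the lower half of first scores gain more than the assumed practice effect on average, while those in the upper half gain less.

```python
import random
import statistics

# All parameters are hypothetical and chosen only for illustration;
# they are not College Board estimates.
N_STUDENTS = 100_000
TRUE_MEAN, TRUE_SD = 500.0, 100.0   # spread of "true" ability on one section
ERROR_SD = 30.0                     # measurement error on each administration
PRACTICE_GAIN = 20.0                # hypothetical uniform gain from retesting

rng = random.Random(2006)           # fixed seed so results are repeatable

first, second = [], []
for _ in range(N_STUDENTS):
    ability = rng.gauss(TRUE_MEAN, TRUE_SD)
    # Each administration adds its own independent measurement error.
    first.append(ability + rng.gauss(0.0, ERROR_SD))
    second.append(ability + PRACTICE_GAIN + rng.gauss(0.0, ERROR_SD))

# Split students at the median of their FIRST score, then average the gains.
cutoff = statistics.median(first)
low_gains = [s - f for f, s in zip(first, second) if f < cutoff]
high_gains = [s - f for f, s in zip(first, second) if f >= cutoff]

low_gain = statistics.fmean(low_gains)
high_gain = statistics.fmean(high_gains)
print(f"Mean gain, lower half of first scores: {low_gain:.1f}")
print(f"Mean gain, upper half of first scores: {high_gain:.1f}")
```

Raising `ERROR_SD` (that is, making scores less reliable) widens the difference between the two halves, since more of a low first score is then attributable to chance.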
On average, students who take the test three times see a difference of 53 points between their first and third tests². However, as illustrated above, using 53 points as the average would be just as misleading as using 30 points to calculate the differential for testing twice, because the decline in retesting came from students with below-average scores.
In computing mean scores, we use the last test score for each student, whether a student tests once, twice, three times, or more. As explained above, the last score is normally the highest score for students who test more than once. A good way to illustrate the impact of retesting is to compare score changes using each student's first SAT score instead of the last. Because every student in College Bound Seniors takes the SAT at least once, using the first score removes any score increase attributable to retesting. When we examine changes in first scores from 2005 to 2006, the mean score decline shrinks from 7 points to 2 points (see table below).
Change in Mean SAT Scores from 2005-06 Based on Last or First Score
| Section | CB Seniors Report (Last Score) | First SAT Score |
|---|---|---|
| Critical Reading | -5 | -3 |
| Math | -2 | +1 |
| Total | -7 | -2 |
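The difference between the two columns comes down to which score represents each student. The sketch below uses invented score histories for three hypothetical students (`student_a` and so on are illustrative names, not real records) to show how the same cohort yields different means under the last-score and first-score conventions, and why a drop in retesting lowers only the last-score mean.

```python
from statistics import fmean

# Invented score histories (one section, in test-date order) for three
# hypothetical students; these are not real College Board records.
cohort = {
    "student_a": [450, 490, 520],  # tested three times
    "student_b": [610, 620],       # tested twice
    "student_c": [540],            # tested once
}

def cohort_mean(histories, pick):
    """Average, over students, of the one score `pick` selects from each history."""
    return fmean(pick(scores) for scores in histories.values())

def last_score(scores):
    return scores[-1]   # convention the letter describes for College Bound Seniors

def first_score(scores):
    return scores[0]    # ignores any gains from retesting

print(cohort_mean(cohort, last_score))    # 560.0
print(cohort_mean(cohort, first_score))   # about 533.3

# Same students, but student_a stops after one test: the last-score mean
# drops even though nobody's first score changed.
fewer_retests = {**cohort, "student_a": [450]}
print(cohort_mean(fewer_retests, last_score))    # about 536.7
print(cohort_mean(fewer_retests, first_score))   # about 533.3 (unchanged)
```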
Fatigue Study
There has been speculation that the 45-minute increase in testing time has reduced student performance. We conducted a study of approximately 700,000 students who completed the new SAT in March or October of 2005 and compared the results with those of an additional 437,000 students who took the old SAT. We examined the number and percent of items answered correctly, answered incorrectly, or omitted on each of these tests. Specifically, we investigated whether items in the last sections of the test were completed less frequently, or answered correctly less frequently, than items in the first and middle sections. We found no difference in the rate of omitted or incorrect items toward the end of the testing time compared with the beginning or middle of the test. The study is under external peer review and is expected to be published in 30 to 60 days.
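A minimal sketch of the kind of tabulation described above, on invented data: it groups item responses by where the item's section falls in the test and computes the share answered correctly, answered incorrectly, or omitted. Fatigue would show up as higher omitted or incorrect shares in the late group. The record layout and the "early"/"middle"/"late" labels are hypothetical; the letter does not describe the study's data format.

```python
from collections import Counter

OUTCOMES = ("correct", "incorrect", "omitted")

# Invented item-level records: (position of the item's section in the test, outcome).
# The "early"/"middle"/"late" labels are hypothetical groupings for illustration.
responses = [
    ("early", "correct"), ("early", "correct"), ("early", "incorrect"), ("early", "omitted"),
    ("middle", "correct"), ("middle", "incorrect"), ("middle", "correct"), ("middle", "omitted"),
    ("late", "omitted"), ("late", "correct"), ("late", "incorrect"), ("late", "correct"),
]

def outcome_rates(records):
    """Share of responses in each outcome category, computed separately per position."""
    per_position = Counter(position for position, _ in records)
    per_cell = Counter(records)  # counts each (position, outcome) pair
    return {
        position: {outcome: per_cell[(position, outcome)] / total for outcome in OUTCOMES}
        for position, total in per_position.items()
    }

rates = outcome_rates(responses)
for position in ("early", "middle", "late"):
    print(position, rates[position])
```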
There has been similar speculation that the changes to the Critical Reading and Math tests have changed the construct being measured. Researchers at the College Board and ETS have conducted a number of different analyses, including factor-analytic studies comparing the old and new tests, which demonstrate there are no significant changes in the construct. One study is in press and the second is in preparation for publication³.
One significant finding about the score change has not received the level of attention it may deserve. We have compared the SAT scores of students who completed a core curriculum or more⁴ with those of students who completed less than a core curriculum. The table below illustrates that there was a very modest score decline (-2) for the 77 percent of students who satisfied a core curriculum in high school, but a much larger decline (-11) among the 23 percent of students who completed a less rigorous curriculum. Most troubling, in 2006 there was a 97-point gap between students who had taken a rigorous high school curriculum and those who had not. Please see table below:
| Curriculum | 2005 N (%) | 2005 CR | 2005 M | 2006 N (%) | 2006 CR | 2006 M | N Change 06-05 (%) | CR (06-05) | M (06-05) |
|---|---|---|---|---|---|---|---|---|---|
| Core + | 909,049 (77.3) | 522 | 530 | 903,452 (77.0) | 519 | 531 | -5,597 (-0.3) | -3 | +1 |
| Core - | 267,278 (22.7) | 476 | 488 | 270,728 (23.0) | 470 | 483 | +3,150 (+0.3) | -6 | -5 |
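The combined changes (-2 and -11) and the 97-point gap quoted above follow directly from the section means in the table. A short Python check, whose only inputs are the table's CR and M columns:

```python
# Section means (Critical Reading, Math) copied from the curriculum table.
means = {
    "Core +": {2005: (522, 530), 2006: (519, 531)},
    "Core -": {2005: (476, 488), 2006: (470, 483)},
}

def total(group, year):
    """Combined Critical Reading + Math mean for one group and year."""
    cr, m = means[group][year]
    return cr + m

# Year-over-year change in the combined mean for each curriculum group.
change = {group: total(group, 2006) - total(group, 2005) for group in means}

# Gap between the two groups in each year.
gap_2005 = total("Core +", 2005) - total("Core -", 2005)
gap_2006 = total("Core +", 2006) - total("Core -", 2006)

print(change)
print(f"gap 2005: {gap_2005}, gap 2006: {gap_2006}")
```

The same figures put the 2005 gap at 88 points, so the gap between the two groups widened by 9 points from 2005 to 2006.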
A significant amount of research, conducted both externally and by the College Board, consistently demonstrates that students who take a rigorous and challenging curriculum in high school not only perform significantly better on the SAT but also are more likely to succeed in college. Of course, this finding, like many others, is correlational rather than causal.
In short, a variety of potential hypotheses may partially contribute to understanding aggregate changes in test scores. Changes in the population of students (e.g., their academic preparation, demographic background), test-taking behavior (e.g., when they first and last take the SAT, how often they retest), and changes to the test are all typically associated with aggregate score changes.
As described above, the College Board has conducted a substantial amount of research that shows no evidence that changes to the test contribute to the small score differential between this year's and last year's mean scores. However, we have detected changes in student behavior that are related to the change in scores. There are other plausible hypotheses that could partially explain score changes as well, but without data this is informed speculation at best. Changes in student performance on educational measures result from a complex interaction among a number of educationally relevant factors. The College Board issues a full-page caution on the use of aggregate scores in our annual press materials on SAT scores.
As always, we urge all those concerned about education to continue to focus attention and research on examining the gap in the quality and rigor of instruction and curriculum throughout the nation's schools. These two factors are related to performance on tests like the SAT, ACT, NAEP and others, as well as to the academic success, readiness, retention, and graduation rates of students who attend college.
Wayne J. Camara
Vice President of Research and Analysis
¹ We do not typically characterize changes in mean SAT scores in percentage terms, but doing so is an easy and transparent way to put the size of a 7-point change in context. The computations are as follows: (1) the Critical Reading and Math scores are each on a scale of 200 to 800 points; (2) there are 601 possible mean score points (800 - 199 = 601) for each; (3) therefore, across both the Critical Reading and Math tests there are 1202 possible points (601 x 2); and (4) a 7-point change on the combined scale, which ranges from 400 to 1600, is computed as a percentage as follows: 7/1202 = .0058236, or .58 percent. (Counting the 1201 distinct values from 400 to 1600 instead gives 7/1201, which also rounds to .58 percent.)
² See http://www.collegeboard.com/prod_downloads/about/news_info/cbsenior/yr2002/sixB.pdf
³ A variety of analyses were performed to examine this issue. First, prototypes of the new SAT were included in the March 2003 field trial, in which over 49,000 students from across the country participated. In the field trial, some students took both the new and the old SAT, and structural equation modeling techniques were used to assess the degree to which the new test had the same underlying structure as the old. The results indicated that the tests do indeed have the same underlying structure, lending support to the contention that the old and new SAT measure essentially the same constructs. Factor analyses conducted on the old and new SAT also demonstrate the same factorial structure.
SAT forms are always linked back to previous forms to ensure that scores are comparable from administration to administration. This linking is accomplished by the use of items in the variable section on the test; that is, the section that students take but that does not count toward their score. For some students, items in the variable section represent items that have been administered many times in past forms, and data from these items are used in equating. These links are referred to as "external anchors" because performance on the items does not count toward the total score (it's external to the total score). For the first few forms of the SAT, internal anchors were used as well. That is, items that appeared in the scored sections of the new SAT had also appeared many times in past SAT forms, which allowed establishing very rigorous equating linkages that ensured comparability of scores from the old to new SAT.
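To make the idea of linking through anchor items concrete, here is a sketch of one textbook anchor-test method, chained linear equating. The letter does not say which equating method the College Board uses; this technique is a stand-in for illustration, and all summary statistics below are invented. A new form X and anchor V are taken by group P; the old form Y and the same anchor V are taken by group Q. Scores on X are linked to V in group P, and V is then linked to Y in group Q.

```python
from dataclasses import dataclass

@dataclass
class Moments:
    mean: float
    sd: float

def linear_link(x, source: Moments, target: Moments):
    """Map x so that the source distribution's mean and SD line up with the target's."""
    return target.mean + (target.sd / source.sd) * (x - source.mean)

def chained_linear_equate(x, new_form_p, anchor_p, anchor_q, old_form_q):
    """Chained linear equating of new-form score x onto the old-form scale.

    new_form_p, anchor_p: moments of new form X and anchor V in group P
    anchor_q, old_form_q: moments of anchor V and old form Y in group Q
    """
    v = linear_link(x, new_form_p, anchor_p)      # X -> V, estimated in group P
    return linear_link(v, anchor_q, old_form_q)   # V -> Y, estimated in group Q

# Invented raw-score summaries, for illustration only.
x_p = Moments(mean=29.0, sd=8.0)
v_p = Moments(mean=12.0, sd=4.0)
v_q = Moments(mean=13.0, sd=4.0)
y_q = Moments(mean=32.0, sd=8.0)

# A raw 30 on the (hypothetically harder) new form corresponds to 31 on the old form.
print(chained_linear_equate(30.0, x_p, v_p, v_q, y_q))
```

Because group P scored lower than group Q on the common anchor, the method attributes part of P's lower raw scores to ability rather than form difficulty; the anchor is what lets the two groups be compared at all.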
In addition, researchers at ETS examined all of the equatings that have been conducted on the new SAT to determine how invariant the linkages have been for males and females. Results of the score equity analyses across the first 10 forms of the new SAT indicated that the linkages were indeed invariant by gender subgroup: the overall conversions were the same as if we had developed them separately by gender subgroup. If the construct had shifted and scores were no longer comparable, we would have found different results in these statistical analyses. Finally, research on the construct, the impact of fatigue on SAT performance, and other analyses that are related to the score decline this year have been reviewed by external researchers and psychometricians.
⁴ A core curriculum in high school is defined as four years or more of English and three years or more of Math (including Algebra), Science, and Social Sciences/History.