GRE Going Back To Paper and Pencil

One of the lessons that you would think we would have learned by now is that just because something can be done by computer does not mean that it should be. The GRE examiners seem to have finally learned this lesson. Three years ago, the Educational Testing Service (ETS) transitioned the Graduate Record Examination (GRE) to a computer-based system. My wife was one of the first to take the new computer-based exam, and I personally spent quite a few hours studying for it before determining that my first-choice graduate school did not require the GRE.

Both of us felt strongly that the computer-based test was badly done. In particular, something with that much significance for people’s future should have at least brought in a legitimate user-interface expert. It was clear that ETS had hired a small group of mediocre software developers to build the system. This might’ve been a success if they had done real research into interface design. There are companies out there that do this well (Apple being the one that comes immediately to mind) and it is because they are willing to spend money on it.

There were three big problems with the system, as my wife and I saw it:

First, the test touted an “adaptive question selection” system that was completely opaque. What this meant was that if you got some of the first questions wrong, it would ask you easier ones later. If you got them right, the questions would get harder. Conceptually, this is interesting and potentially useful. However, it was not at all obvious how it worked, and in such a system rigour is important. The mathematical backing for such an algorithm is weak. The goal of a standardized test is to create a metric: individuals are spread along a spectrum according to their scores. While it is certainly not clear what the GRE measures, it is obvious that the spectrum is more useful if it is wider rather than narrower. It is doubtful that adaptive testing achieves this, or that it does so fairly.

Consider a hypothetical test of ten questions, weighted equally. If a student gets the majority of the first five correct, then the next five get harder. Otherwise, they get easier. Student A is better at whatever the GRE is attempting to measure than Student B. Student A gets 4/5 right in the first five, the questions get harder, and she only gets 2/5 in the remaining half. Student B, however, gets 2/5 in the first half, and so the questions get easier and she gets 4/5 in the second half. They both got 6/10 on the whole test, but Student A’s test was significantly harder! Does it make sense to give them the same score?
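To make the example concrete, here is a minimal Python sketch of that hypothetical ten-question test. It is only an illustration of the thought experiment above, not a description of how ETS actually selects or scores questions; the function names and the "easy"/"hard" labels are my own assumptions.

```python
# Hypothetical two-stage adaptive test from the example above:
# ten equally weighted questions, split into two halves of five.
# Nothing here reflects the real (undisclosed) ETS algorithm.

HALF_SIZE = 5


def second_half_level(first_half_correct):
    """Pick the difficulty of the second half.

    A majority correct in the first half leads to harder questions,
    anything else leads to easier ones.
    """
    return "hard" if first_half_correct > HALF_SIZE / 2 else "easy"


def raw_score(first_half_correct, second_half_correct):
    """Equal weighting: every correct answer counts as one point."""
    return first_half_correct + second_half_correct


# (first-half correct, second-half correct) for each student
students = {"A": (4, 2), "B": (2, 4)}

for name, (first, second) in students.items():
    level = second_half_level(first)
    print(f"Student {name}: second half was {level}, "
          f"raw score {raw_score(first, second)}/10")
```

Both students come out with the same raw score even though they were routed to second halves of different difficulty, which is exactly the problem.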

So, what if we weight the questions according to their difficulty? This creates an artificial bifurcation in possible test outcomes. If you miss one question early, that might limit the total score you could possibly achieve, even if you answered all the rest of them right.
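The bifurcation is easy to see with a little arithmetic. The sketch below uses the same hypothetical ten-question test and assigns made-up weights (2 points for each first-half question, 1 for an easy second-half question, 3 for a hard one). These numbers are assumptions chosen purely for illustration; the real weighting was never made clear.

```python
# Hypothetical difficulty weights, invented for illustration only.
FIRST_HALF_WEIGHT = 2
SECOND_HALF_WEIGHTS = {"easy": 1, "hard": 3}
HALF_SIZE = 5


def best_possible_score(first_half_correct):
    """Highest total a student can still reach after the first half,
    assuming every second-half question is then answered correctly."""
    # Majority correct in the first half routes to the hard branch.
    branch = "hard" if first_half_correct > HALF_SIZE / 2 else "easy"
    first_points = first_half_correct * FIRST_HALF_WEIGHT
    second_points = HALF_SIZE * SECOND_HALF_WEIGHTS[branch]
    return first_points + second_points


# Score ceiling for each possible first-half result (0 to 5 correct)
ceilings = {n: best_possible_score(n) for n in range(HALF_SIZE + 1)}

for n, ceiling in ceilings.items():
    print(f"{n}/5 correct in first half -> best possible total: {ceiling}")
```

Under these assumed weights, going from 2/5 to 3/5 in the first half moves the ceiling from 9 to 21, so a single early question decides which half of the score range is reachable at all.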

The second problem is a side-effect of the first. Because of the adaptive question selection, you could not go back and check your work after you finished the test. This is a common test-taking strategy and it is completely nullified. The reason is simple. Taking the previous hypothetical example, let’s imagine that Student B (who got 2/5 on the first half) went on and did the rest of the test, then went back and fixed an error on one of the first five questions. Suddenly, she has 3/5 on the first half and the questions on the second half should be the harder ones! It is totally unclear what the adaptive system should do in this case, so ETS “solved” this by preventing you from fixing mistakes.

This also makes it almost impossible to manage your time effectively during the test. Typically, if you get to a time-consuming problem, you might skip it, work on other problems, and come back to it at the end. In the computer test, since you can’t skip forward or backward, you’re stuck on the current problem no matter what. If it turns out to be time-consuming but soluble, you might be better off answering it incorrectly simply so you have a chance to answer the rest of the test. Of course, that might make the rest of the test adapt and prevent you from getting a higher score. This is just bad design.

The final complaint is that the computer test is inherently unmarkable. Many, many people move along through a test and mark questions to later revisit, or cross out answers to focus on the most likely options. The computer test makes this impossible.

Interestingly, though, the ETS decided to get rid of the computer test, not because they didn’t hire any user-interface experts, but because they didn’t hire any security experts. It turns out that cheating has been rampant in Asia:

Prompted by a sudden rise in GRE verbal scores from China, the Educational Testing Service launched an investigation on behalf of the GRE board that uncovered Asian-language websites offering questions from live versions of the computer-based GRE general test.

The GRE board instructed the ETS to temporarily suspend the computer-based GRE general test in China, Taiwan and Korea until security can be guaranteed. ETS will reintroduce paper-based versions of the exam in these regions that will be administered just twice a year, in November and March.

“(The GRE board) found that the only secure way (to administer the GRE in these countries) is to return to pencil and paper,” said John Yopp, vice president for graduate and professional education for ETS.

This is a particularly damning quote, and it serves to remind everyone that ETS is not affiliated with any government agency. It is a private organization.

“This is just the latest snafu in a string of problems since the ETS began the introduction of the computer-based GRE in the early 1990s,” said Bob Schaeffer, public education director of FairTest. “This is a technology that is not ready for prime time that has been forced on test takers because of corporate greed.”
