The Macrosite for News, Analysis and Opinion about the Future of the Internet
Automated Systems 'Grade' Written Work

Written by Ariella Brown
5/2/2012 41 comments

Can a software program effectively replace human essay readers?

Answers to that question vary.

A competition sponsored by the William and Flora Hewlett Foundation recently put up $100,000 in three prizes to discover a program that performs as well as human scorers in evaluating written essays. The competition, which was posted on Kaggle, a global crowdsourcing and collaboration Website for predictive modeling experts, drew 258 players in 159 teams.

The three-person team of “SirGuessalot & PlanetThanet & Stefan” (aka Momchil Georgiev, Jason Tigge, and Stefan Henß) arrived at a system the judges found closest to human reader results. They won the foundation’s first-place award of $60,000. Second-place winners will be awarded $30,000 and third-place winners, $10,000.

Separately, data scientist Ben Hammer partnered with Mark Shermis, the dean of the University of Akron's College of Education, to research and write “Contrasting State-of-the-Art Automated Scoring of Essays: Analysis.” Their research, also funded by the Hewlett Foundation, leads them to conclude that “the automated essay scoring engines performed quite well.”

Both examples showcase the quest for so-called automated assessment systems to “grade” or judge written work. Some regard this effort as a sign of real progress. In the view of Steve Graham, a professor at Vanderbilt University, humans are not very good at objective assessment.

That is also what Leonard Mlodinow suggests in his book The Drunkard’s Walk: How Randomness Rules Our Lives (Pantheon Books, 2008). Mlodinow recounts his dismay at a 93, the score his son’s high school teacher put on the paper that he -- a published writer -- had rewritten. He attributes the missing points to teacher fallibility, contending that “a teacher’s assessment, like any measurement, is susceptible to random variance and error” (p. 126).

Other people are appalled at the prospect of a machine assessing human writing. Les Perelman, the director of writing at MIT, falls into that camp. He finds the e-Rater automated scoring system from nonprofit group ETS seriously flawed because it “can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction.” Perelman shows that essays that include patently false statements can still earn perfect scores on e-Rater.

It is only fair to point out that it is possible to “game” a human monitor, as well. For instance, students have observed that the key to getting an A from a certain teacher is to include a PowerPoint presentation.

The problem of incorrect statements is not unique to e-Rater either. As a regular scorer for the SAT essay, I long ago internalized that I am not supposed to hold statements like “Albert Einstein invented the lightbulb” against the student. The rationale is that we are assessing the students’ ability to develop and support a point of view -- not how well they know the history, literature, or science they refer to. If that’s a flaw, it exists in tests scored by humans as well as by machines.

Imperfect though they may be, automated assessment systems are not only on the way, they are already here. As Hammer and Shermis’s analysis quoted above points out, automated systems currently take the place of a human as second reader “for high stakes assessment in several general tests (e.g., TOEFL, GMAT) and... for some licensing exams (e.g., AICPA).”

As a result of the Hewlett competition, it is possible that even more exams will be scored by automated assessment systems. Some may regard that as a blessing, but others as a curse. It certainly has kicked up quite a bit of debate. What do you think?

— Ariella Brown is a freelance writer, editor, and social media consultant.

Ariella
Thinkernetter
Thursday May 3, 2012 5:07:52 PM
no ratings

@Kim Davis, as would James Joyce for breaking multiple writing rules in Ulysses.

Kim Davis
Thinkernetter
Thursday May 3, 2012 4:58:39 PM
no ratings

This reminds me that I have a very smart and tech-savvy friend who thinks having machines write successful pop tunes is just around the corner.  He thinks the algorithms are readily discoverable.  Does that sound plausible?

If so, I can believe that the machines could at least be successful in screening written texts for quality. Nicole is right, of course, that they are going to miss things which are deeply interesting, but not in a predictable or mainstream way. For better or worse, a Gertrude Stein would surely get an F.

Ariella
Thinkernetter
Thursday May 3, 2012 4:37:14 PM
no ratings

@trevorh You've had some interesting experiences. Do you know the name of the program your daughter was using?

trevorh
Rank: Scrivener
Thursday May 3, 2012 3:40:54 PM
no ratings

I remember having to completely reverse my thesis in an essay for an AP English class in order to get a good grade. I didn't really agree with what I had written, but my teacher did, so I got the 'A' I wanted.

 

Some of my kids are now being assigned to log in to a particular website to do their writing, and it grades them as they write. My daughter was trying to get a better grade, and enlisted my help as an editor. The more correct her writing became, the lower the score got. We finally just gave up.

Ariella
Thinkernetter
Thursday May 3, 2012 10:18:51 AM
no ratings

@Joe When I worked as a writing center tutor in colleges, students would boast of how they identified what they considered key to getting a better grade. I recall one student showing off his story (this was a writing exercise as a fable with a moral based on Beat Not the Poor Desk) in which a lioness argues for the right to join the male lions on the hunt. He thought he was being very clever in appealing to the teacher's feminist streak. He had no clue, of course, that, in fact, the lionesses are the ones who hunt, and the male lions usually stay back. Now, if he were being graded based on knowledge, he would lose points on that.

Joe Stanganelli
Thinkernetter
Thursday May 3, 2012 1:18:52 AM
no ratings

I had a law professor whose exam I attempted to "game" by writing answers I didn't really believe in but that would conform to her political opinions.

Not sure how well that worked; I wound up riding the curve.

DukeW
IQ Crew
Wednesday May 2, 2012 6:35:30 PM
no ratings

Joe, I think we're in complete agreement.  All programs reflect the views and biases of their builders (hello, Tron).  Always have, always will.  But the amazing benefits of letting students build their writing skill with automated assistance are to be highly encouraged (there are never enough hours in the day for even the most talented teachers to give their students all the attention each requires).  I just hope the tools are used in the right mixture, as an aid rather than a crutch.  Oh, and just so you understand: I do believe my "exhibits" were just part of my education.  For every example like these, I can cite others, like the math TA who didn't laugh when I "re-discovered" Pascal's Triangle, or the Journalism prof who had me read my news copy from the college radio station in class so his students could see the difference between print and broadcast journalism.  I'll try to do a better job in the future of making my sense of irony more obvious.  And one final note: can't recall who had pointed out that students will learn to 'game' the device to boost their scores.  What, that hasn't already been done with professors?  Ariella's example of a prof giving a better grade because he supported the student's position is proof positive that this kind of 'gaming' has always been part of the process.  The more things change....

Joe Stanganelli
Thinkernetter
Wednesday May 2, 2012 5:10:54 PM
no ratings

Compelling point about human frailties and biases, Duke, but that would seem to support the opposite point if you think about it -- because the computers are only as constant and unbiased as the humans who build and program them.

Indeed, linguistic sentiment analysis, while a VERY promising field, still isn't quite there yet in terms of picking up subtleties of human language (esp. things like irony and sarcasm).  For this reason, a creative, unique, poignant, extremely well-written, and otherwise spot-on student paper could be given a sub-par grade because it does not neatly fit the rubric that the machine is looking for.

Similarly, it seems to me that students, once they catch wise, would learn how to game the machines, writing merely what the machine is scanning for without actually putting much of substance or literary merit down on the paper.

The incidents of corruption you suffered academically are inexcusable, and I have my own tales I could tell, but I'm not sure our collective anecdotal evidence merits discarding the baby with the bathwater.

Ariella
Thinkernetter
Wednesday May 2, 2012 4:46:29 PM
no ratings

@mhhfive As an instructor, I've sometimes shared a student essay with the class, though due to the defensive reaction one encounters, that is usually done for very good essays -- catching what they do right rather than pointing out what they do wrong. For the latter, you usually have to work with something not written by a student in the class because some people are so sensitive to criticism that they will not view it as constructive and may even feel publicly shamed. For a while it was popular to have students write in groups. Aside from the problem that Susan Cain points out with such a setup in Quiet -- that groups stifle introverts altogether -- there could be the problem of the blind leading the blind because many may not be at the point where they can recognize what is correct or what makes a better constructed sentence or paragraph. Consequently, the teacher really has to direct things and point students in the right direction. That is difficult to do for each and every piece of writing in a class size that can easily exceed 20 students; in fact, it may be impossible given time constraints for the class. So I can see the appeal of letting a computer take over the individualized response for the writing stages.

Ariella
Thinkernetter
Wednesday May 2, 2012 4:08:29 PM
no ratings

@Nicole, But what of the 9 publishers who passed on Harry Potter? Certainly, to err is human, and that includes errors of assessment. That is not to say that a robo-reader would pick up on the fact that this story would spawn the biggest hit ever for children's books and films. In truth, these kinds of things are not altogether predictable because success does not depend on quality alone but on the convergence of favorable circumstances to bring the work to the public's attention at just the time when it has a taste for it. It is possible that Harry Potter would have had only modest sales if it had come out 20 years earlier; it would not have been the mega-hit it was. Steve Hely's How I Became a Famous Novelist, which did not become that kind of hit, highlights the vagaries of the publishing industry and posits a writer gaming the system, as it were.


 

The ThinkerNet does not reflect the views of TechWeb. The ThinkerNet is an informal means of communication to members and visitors of the Internet Evolution site. Individual authors are chosen by Internet Evolution to blog. Neither Internet Evolution nor TechWeb assume responsibility for comments, claims, or opinions made by authors and ThinkerNet bloggers. They are no substitute for your own research and should not be relied upon for trading or any other purpose.