The Macrosite for News, Analysis and Opinion about the Future of the Internet

Automated Systems 'Grade' Written Work

Written by Ariella Brown
5/2/2012 41 comments

Can a software program effectively replace human essay readers?

Answers to that question vary.

A competition sponsored by the William and Flora Hewlett Foundation recently put up $100,000 in three prizes to discover a program that performs as well as human scorers in evaluating written essays. The competition, which was posted on Kaggle, a global crowdsourcing and collaboration Website for predictive modeling experts, drew 258 players in 159 teams.

The three-person team of “SirGuessalot & PlanetThanet & Stefan” (aka Momchil Georgiev, Jason Tigge, and Stefan Henß) arrived at a system the judges found closest to human reader results. They won the foundation’s first-place award of $60,000. Second-place winners will be awarded $30,000 and third-place winners, $10,000.

Separately, data scientist Ben Hamner partnered with Mark Shermis, the dean of the University of Akron's College of Education, to research and write “Contrasting State-of-the-Art Automated Scoring of Essays: Analysis.” Their research, also funded by the Hewlett Foundation, leads them to conclude that “the automated essay scoring engines performed quite well.”

Both examples showcase the quest for so-called automated assessment systems to “grade” or judge written work. Some regard this effort as a sign of real progress. In the view of Steve Graham, a professor at Vanderbilt University, humans are not very good at objective assessment.

That is also what Leonard Mlodinow suggests in his book The Drunkard’s Walk: How Randomness Rules Our Lives (Pantheon Books, 2008). Mlodinow recounts his dismay at a 93, the score his son’s high school teacher put on the paper that he -- a published writer -- had rewritten. He attributes the missing points to teacher fallibility, contending that “a teacher’s assessment, like any measurement, is susceptible to random variance and error” (p. 126).

Other people are appalled at the prospect of a machine assessing human writing. Les Perelman, the director of writing at MIT, falls into that camp. He finds the e-Rater automated scoring system from nonprofit group ETS seriously flawed because it “can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction.” Perelman shows that essays that include patently false statements can still earn perfect scores on e-Rater.

It is only fair to point out that it is possible to “game” a human monitor, as well. For instance, students have observed that the key to getting an A from a certain teacher is to include a PowerPoint presentation.

The problem of incorrect statements is not unique to e-Rater either. As a regular scorer for the SAT essay, I long ago internalized that I am not supposed to hold statements like “Albert Einstein invented the lightbulb” against the student. The rationale is that we are assessing the students’ ability to develop and support a point of view -- not how well they know the history, literature, or science they refer to. If that’s a flaw, it exists in tests scored by humans as well as by machines.

Imperfect though they may be, automated assessment systems are not only on the way, they are already here. As Hamner and Shermis’s analysis quoted above points out, automated systems currently take the place of a human as second reader “for high stakes assessment in several general tests (e.g., TOEFL, GMAT) and... for some licensing exams (e.g., AICPA).”

As a result of the Hewlett competition, it is possible that even more exams will be scored by automated assessment systems. Some may regard that as a blessing, but others as a curse. It certainly has kicked up quite a bit of debate. What do you think?

— Ariella Brown is a freelance writer, editor, and social media consultant.

Ariella
Thinkernetter
Monday June 25, 2012 1:47:36 PM

Kaggle just posted a contest for automated scoring of short answers: https://www.kaggle.com/c/asap-sas This one offers cash prizes for up to 5 place winners, splitting the pot of $100K from $50K down to $2,500. You have until September 5th to enter.

Kim Davis
Thinkernetter
Tuesday June 12, 2012 4:28:54 PM

Thanks for the links, Ariella.  Yes, I can accept that very formulaic texts like news stories might well be straightforward to automate.

Ariella
Thinkernetter
Tuesday June 12, 2012 4:02:43 PM

@Kim As you touched on the question of computers composing, I thought I'd share Gini Dietrich's articles, "Can an Algorithm Write a Better News Story than Humans?" While she is skeptical in part 1, she crosses over to the other side in part 2.

Ariella
Thinkernetter
Friday May 4, 2012 2:39:24 PM

@jabailo Van Gogh's paintings sell for millions today, but in his own day, he only sold a single painting out of the 900 plus he painted: "Red Vineyard at Arles." Timing is everything for recognition and success. 

jabailo
IQ Crew
Friday May 4, 2012 2:13:02 PM

Also, there are plenty of humans who are musically gifted, and who train as classical composers and who produce work -- and yet they are never listened to.  Likewise, some form bands, produce great music...and sell a few hundred mp3s, if that.

Just "being good" at something is no guarantee of success...there's a whole complete ethos that has to be timed just right to make a Hit! For example, go to any art school and you may find people who can paint just as well as Van Gogh. How much are their paintings worth?

Ariella
Thinkernetter
Friday May 4, 2012 9:42:43 AM

@mhhfive, @jabailo One of the contests currently on Kaggle is: Predict which songs a user will listen to. It says: "Any type of algorithm can be used: collaborative filtering, content-based methods, web crawling, even human oracles!*" That bit is explained as: "* This contest is for computer models, but if you manage to get recommendations from humans for 110K listeners, we'd like to know how!"

As for actually composing music, I think it is possible because the musical notes can be translated into mathematical representation. It may be possible to do the same with colors to arrive at computer generated art. I can even envision a museum devoted to that -- or at least a website. However, I am highly doubtful that any computer generated masterpiece will ever fetch anywhere near what human masterpieces do because we value art as human expression, and without that component you may just have a pretty picture.


jabailo
IQ Crew
Thursday May 3, 2012 11:55:09 PM

Netflix has algorithms that are supposed to predict films that I might like.


They occasionally present a choice film, but all too often they seem to basically pander to the last few of my own choices. In the end, it's sort of like the Christmas present you get from your grandparents, because they "thought you would like it".

mhhfive
IQ Crew
Thursday May 3, 2012 8:02:33 PM

There are already algorithms that can predict which songs will become pop hits. So software that could compose pop hit music doesn't seem implausible. There are also some algorithms that have created some really nice paintings... so it's going to be harder for people to come up with really unique art someday..?

Ariella
Thinkernetter
Thursday May 3, 2012 5:22:27 PM

@Kim Then what do you think of the works of Hemingway? Stein had quite an influence on him.

In any case, no one seems to advance the automated readers for creative writing. But I suppose they could be used in proofreading -- to catch missing words and such mistakes that sometimes creep into published books.

Kim Davis
Thinkernetter
Thursday May 3, 2012 5:14:32 PM

Just to be pedantic, I prefer my example.  Gertrude Stein's writing seems to exemplify the qualities of the second-rate and the amateurish.  Critics would say there's a reason for that!  I think it's hard for humans - and certainly would be for machines - to distinguish quality in her prose.

Americans are very friendly and very suspicious, that is what Americans are and that is what always upsets the foreigner, who deals with them, they are so friendly how can they be so suspicious they are so suspicious how can they be so friendly but they just are.

Joyce, like Beckett, has a distinct air of aesthetic superiority about him.
