Can a software program effectively replace human essay readers?
Answers to that question vary.
A competition sponsored by the William and Flora Hewlett Foundation recently put up $100,000 in three prizes to discover a program that performs as well as human scorers in evaluating written essays. The competition, which was posted on Kaggle, a global crowdsourcing and collaboration Website for predictive modeling experts, drew 258 players in 159 teams.
The three-person team of “SirGuessalot & PlanetThanet & Stefan” (aka Momchil Georgiev, Jason Tigge, and Stefan Henß) arrived at a system the judges found closest to human reader results. They won the foundation’s first-place award of $60,000. Second-place winners will be awarded $30,000 and third-place winners, $10,000.
Separately, data scientist Ben Hamner partnered with Mark Shermis, the dean of the University of Akron's College of Education, to research and write “Contrasting State-of-the-Art Automated Scoring of Essays: Analysis.” Their research, also funded by the Hewlett Foundation, leads them to conclude that “the automated essay scoring engines performed quite well.”
Both examples showcase the quest for so-called automated assessment systems to “grade” or judge written work. Some regard this effort as a sign of real progress. In the view of Steve Graham, a professor at Vanderbilt University, humans are not very good at objective assessment.
That is also what Leonard Mlodinow suggests in his book The Drunkard’s Walk: How Randomness Rules Our Lives (Pantheon Books, 2008). Mlodinow recounts his dismay at a 93, the score his son’s high school teacher put on the paper that he -- a published writer -- had rewritten. He attributes the missing points to teacher fallibility, contending that “a teacher’s assessment, like any measurement, is susceptible to random variance and error” (p. 126).
Other people are appalled at the prospect of a machine assessing human writing. Les Perelman, the director of writing at MIT, falls into that camp. He finds the e-Rater automated scoring system from nonprofit group ETS seriously flawed because it “can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction.” Perelman shows that essays that include patently false statements can still earn perfect scores on e-Rater.
It is only fair to point out that it is possible to “game” a human grader, as well. For instance, students have observed that the key to getting an A from a certain teacher is to include a PowerPoint presentation.
The problem of incorrect statements is not unique to e-Rater either. As a regular scorer for the SAT essay, I long ago internalized that I am not supposed to hold statements like “Albert Einstein invented the lightbulb” against the student. The rationale is that we are assessing the students’ ability to develop and support a point of view -- not how well they know the history, literature, or science they refer to. If that’s a flaw, it exists in tests scored by humans as well as by machines.
Imperfect though they may be, automated assessment systems are not only on the way, they are already here. As Hamner and Shermis’s analysis quoted above points out, automated systems currently take the place of a human as second reader “for high stakes assessment in several general tests (e.g., TOEFL, GMAT) and... for some licensing exams (e.g., AICPA).”
As a result of the Hewlett competition, it is possible that even more exams will be scored by automated assessment systems. Some may regard that as a blessing, but others as a curse. It certainly has kicked up quite a bit of debate. What do you think?
Kaggle just posted a contest for automated scoring of short answers: https://www.kaggle.com/c/asap-sas This one offers cash prizes for up to five place winners, splitting the pot of $100K from $50K down to $2,500. You have until September 5th to enter.
@jabailo Van Gogh's paintings sell for millions today, but in his own day, he only sold a single painting out of the 900 plus he painted: "Red Vineyard at Arles." Timing is everything for recognition and success.
Also, there are plenty of humans who are musically gifted, and who train as classical composers and who produce work -- and yet they are never listened to. Likewise, some form bands, produce great music...and sell a few hundred mp3s, if that.
Just "being good" at something is no guarantee of success...there's a whole ethos that has to be timed just right to make a Hit! For example, go to any art school and you may find people who can paint just as well as Van Gogh. How much are their paintings worth?
@mhhfive, @jabailo One of the contests currently on Kaggle is: Predict which songs a user will listen to. It says: "Any type of algorithm can be used: collaborative filtering, content-based methods, web crawling, even human oracles!* " That bit is explained as: "* This contest is for computer models, but if you manage to get recommendations from humans for 110K listeners, we'd like to know how!"
As for actually composing music, I think it is possible because the musical notes can be translated into a mathematical representation. It may be possible to do the same with colors to arrive at computer-generated art. I can even envision a museum devoted to that -- or at least a website. However, I am highly doubtful that any computer-generated masterpiece will ever fetch anywhere near what human masterpieces do because we value art as human expression, and without that component you may just have a pretty picture.
Netflix has algorithms that are supposed to predict films that I might like.
They occasionally present a choice film, but all too often they seem to basically pander to the last few of my own choices. In the end, it's sort of like the Christmas present you get from your grandparents, because they "thought you would like it".
There are already algorithms that can predict which songs will become pop hits. So software that could compose pop hit music doesn't seem implausible. There are also some algorithms that have created some really nice paintings... so it's going to be harder for people to come up with really unique art someday..?
@Kim Then what do you think of the works of Hemingway? Stein had quite an influence on him.
In any case, no one seems to be advocating automated readers for creative writing. But I suppose they could be used in proofreading -- to catch missing words and similar mistakes that sometimes creep into published books.
Just to be pedantic, I prefer my example. Gertrude Stein's writing seems to exemplify the qualities of the second-rate and the amateurish. Critics would say there's a reason for that! I think it's hard for humans - and certainly would be for machines - to distinguish quality in her prose.
Americans are very friendly and very suspicious, that is what Americans are and that is what always upsets the foreigner, who deals with them, they are so friendly how can they be so suspicious they are so suspicious how can they be so friendly but they just are.