Cornell Researchers Work to Spot Fake Reviews

September 23, 2011
Emma Court

Four Cornell researchers developed a method for detecting fake online hotel reviews, using an algorithm to identify language features specific to fake and truthful reviews. The algorithm correctly identified fake reviews nearly 90 percent of the time, according to their report.
The researchers trained a machine-learning algorithm on a database of 400 truthful and 400 deceptive reviews, from which it learned to distinguish between the two types when applied to new reviews.
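For readers curious what that training step looks like in practice, below is a minimal sketch of the general supervised-learning setup, assuming Python with scikit-learn and a linear support vector machine over word n-grams; the toy corpus and settings are illustrative stand-ins, not the researchers' actual data or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for the 400 truthful and
# 400 deceptive labeled reviews.
texts = [
    "The room was small and the bathroom cramped, but the rate was fair.",
    "Check-in took ten minutes; the lobby bar was overpriced.",
    "My husband and I had the most amazing, wonderful stay of our lives!",
    "I will absolutely return to this incredible, perfect hotel!",
]
labels = ["truthful", "truthful", "deceptive", "deceptive"]

# TF-IDF over word unigrams and bigrams feeds a linear SVM classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

# Once fitted, the model can label reviews it has never seen.
print(model.predict(["Best hotel ever, my wife and I loved everything!"]))
```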
The researchers also tested three human judges’ ability to distinguish real reviews from fakes, finding that the judges were more likely to classify an opinion as truthful than deceptive.
The research revealed that humans often focus on unreliable cues to deception. Of the three undergraduate students acting as judges, the most accurate was correct 61.9 percent of the time. The other two judges did not score statistically above chance. In contrast, the algorithm was nearly 90 percent accurate in identifying fake reviews.
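“Statistically above chance” means a judge’s hit rate would be unlikely if he or she were simply guessing. One standard check is a binomial test; the sketch below assumes a hypothetical number of judged reviews, since that figure is not given here.

```python
from scipy.stats import binomtest

n_reviews = 160                   # hypothetical number of reviews judged
hits = round(0.619 * n_reviews)   # the best judge's reported accuracy

# Null hypothesis: the judge is guessing, with a 50% chance per review.
result = binomtest(hits, n_reviews, p=0.5, alternative="greater")
print(f"{hits}/{n_reviews} correct, p = {result.pvalue:.4f}")
```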
“[This] shows two things. One, our fake reviews are really convincing, and two, humans are really bad at detecting deception,” said Myle Ott grad, lead author of the research.
The algorithm analyzed the language of the reviews, looking at their composition and identifying patterns among the deceptive reviews.
“The cool thing about these algorithms is that we can actually examine the learned model and see what features it’s relying on when making predictions,” Ott said.
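With a linear model like the one sketched above, examining what the classifier relies on can be as simple as reading off its largest weights. The snippet below assumes the fitted `model` pipeline from the earlier sketch.

```python
import numpy as np

# Inspect which n-grams the linear SVM weights most heavily.
vectorizer = model.named_steps["tfidfvectorizer"]
svm = model.named_steps["linearsvc"]

features = vectorizer.get_feature_names_out()
weights = svm.coef_[0]  # positive leans "truthful", negative "deceptive"

order = np.argsort(weights)
print("Most deceptive-leaning n-grams:", features[order[:5]])
print("Most truthful-leaning n-grams:", features[order[-5:]])
```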
Truthful reviews emphasized spatial details, such as whether the bathroom was big or small, and discussed price, according to Ott. These reviews also used more punctuation and contained more nouns, prepositions and adjectives, he said.
The authors of deceptive reviews focused more on whom they were with on the trip and why they were traveling. Deceptive reviews also made more use of the first-person singular and contained more verbs, pronouns, adverbs and superlative adjectives, he said.
“Just knowing the composition of parts of speech in the reviews themselves, the system was 73 percent accurate in detecting deception,” Ott said.
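Part-of-speech composition can be computed with an off-the-shelf tagger. The sketch below uses NLTK to turn a review into the relative frequencies of its part-of-speech tags, which could then feed a classifier like the one above; it is a general illustration, not the researchers’ exact feature pipeline.

```python
from collections import Counter
import nltk

# Tokenizer and tagger models (package names vary across NLTK versions).
for pkg in ("punkt", "punkt_tab",
            "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(pkg, quiet=True)

def pos_profile(review: str) -> dict:
    """Return the relative frequency of each part-of-speech tag."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(review))]
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Shares of pronouns (PRP), adverbs (RB), adjectives (JJ), etc.
print(pos_profile("My wife and I absolutely loved this amazing hotel!"))
```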
Prof. Jeff Hancock, communications, a co-author of the report, added that it is difficult for humans to identify fake reviews because the aspects researchers look at “are things that people typically ignore: things like pronouns, spatial information like prepositions, those little words we just tend to ignore.”
He said that there is “a long history in psychology that shows that people are really bad at detecting deception — about 40 years of psychological and communication research. It’s shown that when people are given a choice if this is deception or not, they perform at chance.”
To train the algorithm, the researchers created a database consisting of 20 truthful and 20 deceptive reviews for each of 20 hotels. The 400 total truthful reviews were taken from the 20 most popular Chicago hotels on TripAdvisor, because the researchers believed those hotels would be less likely to solicit fake reviews. The 400 total fake reviews were created through a crowdsourcing site called Amazon Mechanical Turk.
The researchers are looking at ways the algorithm can be used to identify other kinds of fake online reviews besides hotel reviews. Ott said the research has so far only raised more questions about new ways to apply the algorithm.
“Does it work for other kinds of reviews? For hotels in different areas? Does it work for restaurants, books, products?” he asked.
According to Hancock, the research is already finding practical application. Many companies have approached the researchers for access to the algorithm in order to make their online reviews more trustworthy, he said.