
Software can detect text inconsistencies

By Charles Choi, UPI Science News

NEW YORK, Aug. 27 (UPI) -- Scottish researchers have developed software, modeled on Internet search engines, that could help investigators rifle quickly through complex documents on such controversial issues as nuclear power, genetically modified organisms, airplane disasters, health care policy and weapons of mass destruction.

"These documents are often so long and complex that my techniques are intended to help identify or sketch the arguments that they contain, in a way that makes it easy to identify omissions or contradictions," lead researcher Chris Johnson, a computer scientist at the University of Glasgow, told United Press International.

For example, Johnson said his group has been contacted about the inquiry now under way in the United Kingdom into the death of government weapons adviser David Kelly, who had been consulted on the government's dossier on Iraq's weapons of mass destruction last year.

"The first set of evidence released on the case is 9,000 pages long," he explained. "So if you want to work out how those 9,000 pages contribute to arguments about weapons of mass destruction and the government's policy over Iraq, you manually can't do it. Computer-based tools will help you spot critical topics of importance in the document."

Johnson originally designed the software to inspect accident reports, such as those from the probe into the loss of the Concorde, the supersonic airliner that crashed near Paris in July 2000, killing 113 people. It has since found use among legal teams investigating accidents in confidential cases.

Johnson said the software's ability to distill the arguments in complex records and root out bias has wider implications, for instance in analyzing news coverage for spin. "It is often argued that the press speculate too much in the aftermath of an accident," he said. Such speculation can misinform the public and oversimplify or distort complex issues. So Johnson extended his work to examine coverage of the Concorde crash in The Times of London, the British tabloid The Sun, and BBC Online.

"I was able to show that the press did not speculate very much at all," he said. Overall, most of the conjecture in each news source took the form of direct quotations from experts rather than speculation by journalists.

The software looks for the conclusions presented in such reports and scans the documents for any related details. In this manner, the software works much like Internet search engines such as Google, which comb the Web for keywords.

"For instance, if somebody were to say an accident was caused by human error, (the software) would go back and look for references to human error," Johnson said.
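The article does not publish Johnson's code. Purely as an illustration, the Python sketch below shows the simplest form of this look-back step: given a phrase taken from a report's conclusion, list every paragraph that mentions it. The function name find_references and the sample paragraphs are hypothetical, and case-insensitive whole-phrase matching is an assumption, not a description of Johnson's method.

```python
import re


def find_references(paragraphs, phrase):
    """Return (index, paragraph) pairs whose text mentions `phrase`.

    Hypothetical helper: case-insensitive, whole-phrase matching is an
    assumption made for this sketch.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    # Keep the paragraph index so a reviewer can jump back to the source text.
    return [(i, p) for i, p in enumerate(paragraphs) if pattern.search(p)]


# Invented sample paragraphs, not taken from any real accident report.
report = [
    "The crew had completed a long shift before the incident.",
    "Human error during the handover contributed to the delay.",
    "Maintenance records were complete.",
    "Conclusion: the accident was caused by human error.",
]

# Look back through the report for the phrase used in the conclusion.
for index, text in find_references(report, "human error"):
    print(index, text)  # prints paragraphs 1 and 3
```

A real tool would also need synonyms and related wording, which plain phrase matching misses; that is one reason Johnson stresses that a human must still check the results.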

In the end, the software charts its findings: it displays the conclusions in one box, the related arguments in an adjoining box, and the evidence in a third. Using this method, Johnson recently teased out inconsistencies in an 80-page report on the failure of London's computerized ambulance dispatch system. Although the report's conclusion blamed the breakdown on inadequate testing of the system, the software revealed references throughout the report to extensive testing.
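A similarly hedged sketch of that kind of inconsistency check follows. The hypothetical function flag_conflicts returns the paragraphs that mention a conclusion's topic alongside words that cut against it. The sample sentences are invented and are not quotations from the ambulance report, and the list of contrary terms is an assumption that a human analyst would have to supply.

```python
def flag_conflicts(paragraphs, topic, contrary_terms):
    """Return indices of paragraphs that mention `topic` together with any
    term suggesting the opposite of the report's conclusion.

    Hypothetical rule using plain substring matching; an illustration only.
    """
    flagged = []
    for i, paragraph in enumerate(paragraphs):
        text = paragraph.lower()
        if topic in text and any(term in text for term in contrary_terms):
            flagged.append(i)  # candidate contradiction for a human to review
    return flagged


# Invented sentences standing in for sections of a long report.
report = [
    "Section 3: the system underwent extensive testing in the autumn.",
    "Section 5: staff were trained on the new terminals.",
    "Section 7: load testing was extensive and repeated twice.",
    "Conclusion: the failure resulted from inadequate testing.",
]

print(flag_conflicts(report, "testing", ["extensive", "thorough"]))  # [0, 2]
```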

"A lot of my work is done with lawyers. If you're involved in a large case, one of the techniques of the opposing side is to provide so much supporting evidence at the last minute that you haven't got time or money to go through it all. What my techniques will do for lawyers is to ... show how it all fits together," Johnson said.

Up to now, most of the cases Johnson has studied using the software have involved accidents. "The important thing is to find out whether conclusions or recommendations are made without (supporting) evidence," he explained. "Many times the way people make arguments is determined by rhetoric rather than by logic alone."

For instance, he continued, "people will say fact 'A' and 'A' and 'A' and 'A' and slip in fact 'B,' which is inconvenient to their argument," in the hope that fact "B" will be overlooked. Johnson recalled one such accident report that did not explain why the accident happened until its final page, which revealed that the victims had been drinking.

"That's something suitable for crime thrillers, but not really suitable if you're writing engineering reports," Johnson said.

In rooting through accident reports, the software looks at the frequencies of key terms such as "fatigue" or "workload." Where a word's meaning could be ambiguous, as with "worker fatigue" versus "metal fatigue," the software assigns code numbers to the different meanings of each word and looks at the frequencies of related phrases that could give clues to context, such as "operator error."
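The frequency counting and sense coding described above might look something like the Python sketch below. The sense codes (1 for human fatigue, 2 for metal fatigue), the cue words, and the three-word context window are all assumptions made for illustration; the article does not say how Johnson's software decides which meaning applies.

```python
import re
from collections import Counter

# Hypothetical sense codes for an ambiguous term; the cue words are assumptions.
SENSES = {
    "fatigue": {
        1: {"worker", "crew", "operator", "shift", "pilot"},  # human fatigue
        2: {"metal", "crack", "structural", "alloy"},          # metal fatigue
    }
}


def sense_counts(text, term, window=3):
    """Count occurrences of `term`, tagging each with the sense code whose cue
    words overlap most with the `window` words on either side.

    Occurrences with no matching cue words are counted under code 0;
    ties go to the lower-numbered sense.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word != term:
            continue
        context = set(words[max(0, i - window): i + window + 1])
        scores = {code: len(cues & context) for code, cues in SENSES[term].items()}
        best = max(scores, key=scores.get)
        counts[best if scores[best] > 0 else 0] += 1
    return counts


def phrase_count(text, phrase):
    """Frequency of a related context phrase such as "operator error"."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


# Invented sample text for illustration.
sample = ("The crew reported fatigue after a long shift. "
          "Inspectors found metal fatigue near a crack in the wing. "
          "Operator fatigue was also noted.")

print(sense_counts(sample, "fatigue"))  # Counter({1: 2, 2: 1})
```

Counting each sense separately lets a reviewer see, for example, that a report discusses crew fatigue more often than structural fatigue, which is the kind of frequency signal the article describes.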

"It's not rocket science, not very advanced software," Johnson said. "There is nothing innovative about these search techniques. Much of my work ought really to be done inside federal or international agencies. Unfortunately, good software engineers are a scarce resource. [The agencies] therefore often lack the people required to try some of the more novel techniques on their data sources."

John McDermid, a software and safety engineer at the University of York in Britain, said that although the economic impact of Johnson's technique might be difficult to predict, its potential appears significant.

"Imagine an accident where you lost an aircraft," McDermid told UPI. "You don't get the right cause for what happened, so you fix the wrong thing." But if the software flags inconsistencies in an accident report, it might help investigators uncover the real problem. "So in avoiding accidents, you're talking hundreds of millions of dollars," he said. "The potential is for a very large payback."

Johnson said he plans to develop a series of software tools to find patterns of failure in accident reports collected by several different countries "to make sure that similar failures are not going undetected. For instance, I think there are many common features in recent health care incidents in both the United States and the United Kingdom involving telemedical applications and end-user medication systems -- for example, for helping control diabetes."

Still, Johnson noted his software requires human intervention to verify computational findings. "It is possible that the system will miss details, and so it is only a tool -- not a substitute -- for manual inspection," he said.

All site contents copyright © 2003 News World Communications, Inc.