Software can detect text
NEW YORK, Aug. 27 (UPI) -- Scottish researchers have developed
software based on Internet search engines that could help
investigators rifle quickly through complex documents involving such
controversial issues as nuclear power, genetically modified
organisms, airplane disasters, health care policy and weapons of
mass destruction.
"These documents are often so long and complex that my techniques
are intended to help identify or sketch the arguments that they
contain, in a way that makes it easy to identify omissions or
contradictions," lead researcher Chris Johnson, a computer scientist
at the University of Glasgow, told United Press International.
For example, Johnson said his group has been contacted about the
current inquiry erupting in the United Kingdom over the death of
government weapons adviser David Kelly, who wrote the initial report
on weapons of mass destruction in Iraq last year.
"The first set of evidence released on the case is 9,000 pages
long," he explained. "So if you want to work out how those 9,000
pages contribute to arguments about weapons of mass destruction and
the government's policy over Iraq, you manually can't do it.
Computer-based tools will help you spot critical topics of
importance in the document."
The software Johnson designed was originally intended to inspect
accident reports, such as those from the probe into the loss of the
Concorde, the supersonic airliner that crashed in Paris in July
2000, killing 113. But it also has found use among legal teams
investigating accidents in confidential cases. Johnson said the
software's capability to distill arguments in complex records and
root out bias has further implications -- for instance, to analyze
the news for spin.
"It is often argued that the press speculate too much in the
aftermath of an accident," he said. Such speculation can misinform the public
and oversimplify or distort complex issues. So Johnson extended his
work to look at the coverage of the Concorde crash as presented by
the London Times, the British tabloid newspaper The Sun, and the BBC.
"I was able to show that the press did not speculate very much at
all," he said. Overall, most of the conjecture in each news source
came as direct quotations from experts rather than as speculation by
the journalists themselves.
The software looks for the conclusions presented in such reports
and scans the documents for any related details. In this manner, the
software works much like Internet search engines such as Google,
which comb the Web for keywords.
"For instance, if somebody were to say an accident was caused by
human error, (the software) would go back and look for references to
human error," Johnson said.
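The lookup Johnson describes can be sketched in a few lines of Python. This is only an illustration of keyword matching, not Johnson's software: the function name find_references, the naive sentence splitter and the sample report text are all assumptions made for the example.

```python
import re

def find_references(text, phrase):
    """Return (sentence_number, sentence) pairs that mention the phrase.

    Hypothetical helper: splits on sentence-ending punctuation and
    matches the phrase case-insensitively on word boundaries.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [(i, s) for i, s in enumerate(sentences, start=1) if pattern.search(s)]

# Invented sample text, not taken from any real accident report.
report = (
    "The crew had been on duty for 14 hours. "
    "Investigators found no sign of human error in the cockpit. "
    "Conclusion: the accident was caused by human error."
)

hits = find_references(report, "human error")
for number, sentence in hits:
    print(number, sentence)
```

Listing every passage that mentions the term next to the conclusion lets a reader check whether the body of the report actually supports it.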
In the end, the software charts out its findings. It displays its
own conclusions in a box, the related arguments in an adjoining box,
and evidence in yet another box. Using this method, Johnson recently
teased out inconsistencies in an 80-page document discussing the
failure of London's computerized ambulance dispatch system. Though
the report's conclusion blamed the breakdown on inadequate testing
of the system, the software revealed references throughout the
report to extensive testing.
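A rough sketch of that kind of chart, assuming a simple substring match between a conclusion's key term and the report's passages, might look like the following. The ChartEntry class, the build_chart and render functions and the sample passages are hypothetical, and the sketch leaves out the separate box for arguments.

```python
from dataclasses import dataclass, field

@dataclass
class ChartEntry:
    conclusion: str                 # text of the stated conclusion
    key_term: str                   # term used to look up evidence
    evidence: list = field(default_factory=list)

def build_chart(conclusions, passages):
    """Pair each (conclusion, key_term) with passages mentioning the term."""
    chart = []
    for text, term in conclusions:
        matches = [p for p in passages if term.lower() in p.lower()]
        chart.append(ChartEntry(text, term, matches))
    return chart

def render(chart):
    """Lay out each conclusion with its evidence, flagging empty boxes."""
    lines = []
    for entry in chart:
        lines.append("CONCLUSION: " + entry.conclusion)
        if entry.evidence:
            lines.extend("  EVIDENCE: " + p for p in entry.evidence)
        else:
            lines.append("  EVIDENCE: (none found)")
    return "\n".join(lines)

# Invented passages for illustration only.
passages = [
    "Extensive testing was carried out before the system went live.",
    "Operators received two days of training.",
    "A further round of testing covered peak call volumes.",
]
conclusions = [
    ("The failure was caused by inadequate testing.", "testing"),
    ("Poor funding contributed to the failure.", "funding"),
]

chart = build_chart(conclusions, passages)
print(render(chart))
```

The code does not decide whether a passage contradicts a conclusion. It only places the two side by side, or shows that nothing was found, and leaves the judgment to the reader.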
"A lot of my work is done with lawyers. If you're involved in a
large case, one of the techniques of the opposing side is to provide
so much supporting evidence at the last minute that you haven't got
time or money to go through it all. What my techniques will do for
lawyers is to ... show how it all fits together," Johnson said.
Up to now, most of the cases Johnson has studied using the
software have involved accidents. "The important thing is to find
out whether conclusions or recommendations are made without
(supporting) evidence," he explained. "Many times the way people
make arguments is determined by rhetoric rather than by logic."
For instance, he continued, "people will say fact 'A' and 'A' and
'A' and 'A' and slip in fact 'B,' which is inconvenient to their
argument," in the hopes fact "B" will be overlooked. Johnson
recalled such an example in an accident report that did not explain
why the accident happened until the final page, where it revealed
the victims had been drinking.
"That's something suitable for crime thrillers, but not really
suitable if you're writing engineering reports," Johnson said.
In rooting through accident reports, the software looks at
frequencies of key terms such as "fatigue" or "workload." In cases
where a word's meaning could be ambiguous -- such as "worker
fatigue" vs. "metal fatigue" -- the software assigns code numbers to
different meanings of each word and looks at the frequencies of
related phrases that could give clues to context, such as "operator
"It's not rocket science, not very advanced software," Johnson
said. "There is nothing innovative about these search techniques.
Much of my work ought really be done inside federal or international
agencies. Unfortunately, good software engineers are a scarce
resource. They therefore often lack the people required to try some
of the more novel techniques on their data sources."
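The frequency counting and context-based disambiguation described above can be illustrated with a short Python sketch. The sense codes, the cue words and the three-word context window are assumptions chosen for the example. The article does not say how Johnson's software assigns its codes.

```python
import re
from collections import Counter

# Hypothetical sense codes and context cues for one ambiguous term.
SENSES = {
    "fatigue": {
        1: {"worker", "operator", "crew", "shift"},    # human fatigue
        2: {"metal", "crack", "stress", "fracture"},   # material fatigue
    }
}

def sense_frequencies(text, term, window=3):
    """Count occurrences of term per sense code; code 0 means unresolved."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word != term:
            continue
        # Words within `window` positions on either side of the term.
        context = set(words[max(0, i - window): i + window + 1])
        scores = {code: len(context & cues)
                  for code, cues in SENSES[term].items()}
        best = max(scores, key=scores.get)
        counts[best if scores[best] > 0 else 0] += 1
    return counts

# Invented sample text, not taken from any real accident report.
sample = (
    "Worker fatigue was reported on the night shift. "
    "Inspectors later found metal fatigue near a crack in the wing. "
    "Fatigue was also mentioned."
)

counts = sense_frequencies(sample, "fatigue")
print(counts)
```

Occurrences with no cue word nearby are counted under code 0 rather than guessed at, in keeping with Johnson's point that the output still needs manual inspection.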
John McDermid, a software and safety engineer at the University
of York in Britain, said although the economic impact of Johnson's
technique might be difficult to predict, its potential seems
"Imagine an accident where you lost an aircraft," McDermid told
UPI. "You don't get the right cause for what happened, so you fix
the wrong thing." But if the software flags inconsistencies in an
accident report, it might help investigators uncover the real
problem. "So in avoiding accidents, you're talking hundreds of
millions of dollars," he said. "The potential is for a very large
Johnson said he plans to develop a series of software tools to
find patterns of failure in accident reports collected by several
different countries "to make sure that similar failures are not
going undetected. For instance, I think there are many common
features in recent health care incidents in both the United States
and the United Kingdom involving telemedical applications and
end-user medication systems -- for example, for helping control
Still, Johnson noted his software requires human intervention to
verify computational findings. "It is possible that the system will
miss details, and so it is only a tool -- not a substitute -- for
manual inspection," he said.