Wednesday, October 04, 2006

DHS Data Mining Efforts Under Scrutiny

Two news pieces in one day about the Department of Homeland Security's data mining programs. Where there is smoke, there usually is fire.

From GovExec:

Congressional appropriators have directed the Homeland Security Department's inspector general to investigate one of the department's data-mining projects, saying it appears to lack clear guidelines and oversight.

In the fiscal 2007 Homeland Security spending bill -- expected to be signed by President Bush Wednesday -- lawmakers cite concerns over the department's Analysis, Dissemination, Visualization, Insight and Semantic Enhancement (ADVISE) program.

"A prototype is currently available to analysts in [the Homeland Security] Intelligence and Analysis [unit] using departmental and other data, including some on U.S. citizens," lawmakers wrote. "The ADVISE program plan, total costs and privacy impacts are unclear and therefore the conferees direct the inspector general to conduct a comprehensive program review and report within nine months of enactment of this act."

The department has spent about $40 million on the project, lawmakers added.

Critics fear that ADVISE might intrude on the privacy rights of U.S. citizens, especially by trolling their e-mails and blogs. House Appropriations ranking member David Obey, D-Wis., and Homeland Security Appropriations Subcommittee ranking member Martin Olav Sabo, D-Minn., asked the Government Accountability Office earlier this year to investigate the program.

"We've been long concerned about how the department treats Americans' privacy and due process rights," Sabo said during a May markup of the Homeland Security spending bill. "ADVISE appears to be a new variation on the highly controversial Defense Department Total Information Awareness program that was supposed to be terminated in 2003."

A Homeland Security Department spokesman said ADVISE is not yet an active program. When complete, he added, ADVISE will "deliver technology or a set of technologies to provide the capability to connect the dots" of intelligence, a need cited by the 9/11 Commission.

"It extracts important relationships and correlations from a wealth of data and produces actionable intelligence," he said. "What it does perform is data integration at a large scale."

But he would not describe the specific type of data collected through the program.

Today's New York Times addresses another one of the prospective DHS data mining programs:

A consortium of major universities, using Homeland Security Department money, is developing software that would let the government monitor negative opinions of the United States or its leaders in newspapers and other publications overseas.

Such a "sentiment analysis" is intended to identify potential threats to the nation, security officials said.

Researchers at institutions including Cornell, the University of Pittsburgh and the University of Utah intend to test the system on hundreds of articles published in 2001 and 2002 on topics like President Bush's use of the term "axis of evil," the handling of detainees at Guantanamo Bay, the debate over global warming and the coup attempt against President Hugo Chavez of Venezuela.

A $2.4 million grant will finance the research over three years.

American officials have long relied on newspapers and other news sources to track events and opinions here and abroad, a goal that has included the routine translation of articles from many foreign publications and news services.

The new software would allow much more rapid and comprehensive monitoring of the global news media, as the Homeland Security Department and, perhaps, intelligence agencies look "to identify common patterns from numerous sources of information which might be indicative of potential threats to the nation," a statement by the department said. ...

The researchers, using a grant provided by a research group once affiliated with the Central Intelligence Agency, have compiled a database of hundreds of articles that is being used to train a computer to recognize, rank and interpret statements.

The software would need to be able to distinguish between statements like "this spaghetti is good" and "this spaghetti is not very good -- it's excellent," said Claire T. Cardie, a professor of computer science at Cornell. ...

One article discusses how a rabid fox bit a grazing cow in Romania, hardly a threat to the United States. Another item, an editorial in response to Mr. Bush's use in 2002 of "axis of evil" to describe Iraq, Iran and North Korea, said: "The U.S. is the first nation to have developed nuclear weapons. Moreover, the U.S. is the first and only nation ever to deploy such weapons."

The approach, called natural language processing, has been under development for decades. It is widely used to summarize basic facts in a text or to create abridged versions of articles.

But interpreting and rating expressions of opinion, without making too many errors, has been much more challenging, said Professor Cardie and Janyce M. Wiebe, an associate professor of computer science at the University of Pittsburgh. Their system would include a confidence rating for each "opinion" that it evaluates and would allow an official to refer quickly to the actual text that the computer indicates contains an intense anti-American statement.

Ultimately, the government could in a semiautomated way track a statement by specific individuals abroad or track reports by particular foreign news outlets or journalists, rating comments about American policies or officials.

