Telephone Records are just the Tip of NSA's Iceberg
William M. Arkin on National and Homeland Security
http://blog.washingtonpost.com/early...the.html#20161
The National Security Agency and other U.S. government organizations
have developed hundreds of software programs and analytic tools to
"harvest" intelligence, and they've created dozens of gigantic
databases designed to discover potential terrorist activity both inside
the United States and overseas.
These cutting-edge tools -- some highly classified because of their
functions and capabilities -- continually process hundreds of billions
of what are called "structured" data records, including telephone call
records and e-mail headers contained in information "feeds" that have
been established to flow into the intelligence agencies.
The multi-billion dollar program began before 9/11 but has been
accelerated since then. Well over 100 government contractors have
participated, including both small boutique companies whose products
include commercial off-the-shelf software and some of the largest
defense contractors, who have developed specialized software and tools
exclusively for government use.
USA Today provided a small window into this massive intelligence
community program by reporting yesterday that the NSA was collecting
and analyzing millions of telephone call records.
The call records are "structured data," that is, information maintained
in a standardized format that can be easily analyzed by machine
programs without human intervention. They're different from intercepts
of actual communication between people in that they don't contain the
"content" of the communications -- content that the Supreme Court has
ruled is protected under the Fourth Amendment. You can think of call
records as what's outside the envelope, as opposed to what's on the
inside.
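To make the "outside the envelope" idea concrete, here is a minimal sketch of what a structured call record might look like and how a program could analyze it without a person ever looking at it. The field names, the sample phone numbers, and the `calls_per_pair` helper are all hypothetical, invented for illustration; nothing here is drawn from an actual NSA or phone company format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One hypothetical call record: who called whom, when, and for how long.

    Note what is absent: there is no field for what was said on the call.
    """
    caller: str
    callee: str
    started_at: datetime
    duration_seconds: int


def calls_per_pair(records):
    """Count calls for each (caller, callee) pair, with no human review."""
    return Counter((r.caller, r.callee) for r in records)


# Made-up sample records, for illustration only.
records = [
    CallRecord("555-0100", "555-0199", datetime(2006, 5, 1, 9, 30), 120),
    CallRecord("555-0100", "555-0199", datetime(2006, 5, 2, 14, 5), 45),
    CallRecord("555-0123", "555-0100", datetime(2006, 5, 2, 18, 40), 300),
]

counts = calls_per_pair(records)
print(counts.most_common(1))
```

Because every record has the same fields, the same few lines of code work whether the input holds three records or billions, which is what makes this kind of data so easy to process by machine.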
Once collected, the call records and other non-content communication
are being churned through a mind-boggling network of software and data
mining tools to extract intelligence. And this NSA-dominated program of
ingestion, digestion, and distribution of potential intelligence raises
profound questions about the privacy and civil liberties of all
Americans.
Although there is no evidence that the harvesting programs have been
involved in illegal activity or have been abused to reach into the
lives of innocent Americans, their sheer scope, the number of
"transactions" being tracked, raises questions as to whether an
all-seeing domestic surveillance system isn't slowly being established,
one that in just a few years' time will be able to reveal the
interactions of any targeted individual in near real time.
In late November 1998, the intelligence community and the Department of
Defense established the Advanced Research and Development Activity in
Information Technology (ARDA), a government consortium charged with
incubating and developing "revolutionary" research and development in
the field of intelligence processing.
The Director of the National Security Agency (NSA) agreed to establish,
as a component of the NSA, an organizational unit to carry out the
functions of ARDA, overseeing the research program of the CIA, DIA,
National Reconnaissance Office, and other defense and civilian
intelligence agencies.
Beginning before 9/11, ARDA established an "information exploitation"
program to fund and focus private research on operationally-relevant
problems of exploiting the increasing torrents of digital data
available to the intelligence community. Even with thousands of
analysts, NSA and other agencies were falling behind in their ability
to handle the volume of incoming material. Existing mainframe-based,
machine-aided processes were also falling behind advances in information
processing, particularly as the cost of computing power dramatically
declined in the 1990s.
The information exploitation research program has funded hundreds of
projects to find better ways to "pull" information, "push" information,
and "navigate" and visualize information once assembled.
Pulling information refers to giving analysts question-and-answer
capabilities. Starting with a known requirement,
an analyst could submit questions to a Q&A system which in turn would
"pull" the relevant information out of multiple data sources and
repositories. NSA is seeking a Q&A system that can operate
autonomously to interpret "pulled" information and provide automatic
responses back to the analysts with little additional human
intervention.
Pushing information refers to the software tools that would "blindly"
and without supervision push intelligence to analysts even if they had
not asked for the information. Research has sought to go beyond
current data mining of "structured" records to deeper profiling of
massive unstructured data collections. Under the pushing information
research thrust, companies have been involved in efforts to uncover previously
undetected patterns of activity from massive data sets. Software and
tools are also being developed that will provide alerts to analysts
when changes occur in newly arrived but unanalyzed massive data
collections, such as telephone records.
The effort to navigate and visualize information seeks to develop
analytic tools that will allow agency analysts to take hundreds or even
thousands of small pieces of information and automatically create a
tailored and logical "picture" of that information. Using
visualization tools and techniques, intelligence analysts are
constantly seeking out previously unknown links and connections between
individual pieces of information.
The "structured" data that the intelligence community seeks to process includes
data-tagged signals intelligence (SIGINT) monitoring of telephone and
radio communications, imagery, human intelligence reporting, and
"open-source" commercial data, including news media reporting.
"Unstructured" data includes news and Internet video and audio and
document exploitation.
I could write volumes about the research efforts and the software
programs and tools used to process the mountains of information the NSA
and other agencies ingest. No doubt over the coming days and weeks,
[...]. [What follows is what I] have been able to identify in government documents relating to data
mining, link analysis, and ingestion, digestion, and distribution of
intelligence. My hope would be that other journalists and researchers
will follow the leads.
The following is a list of some 500 software tools, databases, data
mining and processing efforts contracted for, under development or in
use at the NSA and other intelligence agencies today: