
Web Monitoring Data Archive Left Open to the Public

Discovered by UpGuard security researcher Chris Vickery, the databases were titled “CENTCOM-backup,” “CENTCOM-archive,” and “pacom-archive.”

Based on their titles, the databases clearly related to US Central Command (CENTCOM) and US Pacific Command (PACOM), two of the DOD’s combatant commands.

According to the researcher, the data retrieved from the databases did not include any sensitive details. Instead, the databases were built by scraping the Internet for publicly available social media posts, forum posts, blogs, news comments, and similar postings.

The scraped data contained the post itself and data identifying the poster. The content Vickery found was written in various languages, mostly Arabic, Farsi, and English, and was collected between 2009 and August 2017.

Based on how the data was structured inside these databases, they appeared to be part of a hybrid Lucene-Elasticsearch search engine.

According to Vickery’s assessment, the databases appeared to have been put together by the US military’s intelligence unit in an attempt to mine the Internet for information it could use for operations.

A folder labeled “Outpost” found in one of the CENTCOM-labeled S3 buckets appears to be the work of a software vendor named VendorX, a former DOD contractor and a maker of big data search engine technology.

After finding the databases, Vickery contacted the DOD in September, and the databases were secured soon after.

The databases were not open to anonymous visitors; instead, accessing them required an Amazon AWS account. A free account would have been enough to access and download the data collected in the three S3 buckets.
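To make the distinction concrete: S3 access control lists can grant permissions to two broad predefined groups, one covering anyone on the Internet and one covering any authenticated AWS account. The article does not say how these three buckets were actually configured, so the following is only a minimal sketch, assuming a grant of that second kind. The function name `broad_grants` and the sample ACL are hypothetical; the dictionary is assumed to be shaped like an S3 GetBucketAcl response.

```python
# Group URIs that S3 ACLs use for the two broad predefined grantee groups.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def broad_grants(acl):
    """Return (audience, permission) pairs for grants to broad S3 groups.

    `acl` is assumed to be a dict shaped like an S3 GetBucketAcl response.
    """
    labels = {
        ALL_USERS: "anyone on the Internet",
        AUTHENTICATED_USERS: "any AWS account holder",
    }
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        # Only group grants to one of the two broad groups are flagged.
        if grantee.get("Type") == "Group" and uri in labels:
            found.append((labels[uri], grant.get("Permission")))
    return found


# Hypothetical ACL (not taken from the buckets in this story): the owner has
# full control, and every authenticated AWS account can read.
example_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ]
}

for audience, permission in broad_grants(example_acl):
    print(f"{permission} granted to {audience}")
```

A check like this could be run over the ACL of each bucket an organization owns; a grant to either group means the data is readable well beyond the owner's own accounts.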

Last week, Amazon updated the AWS backend panel and added visible warnings when S3 buckets are exposed online. The company took this step after many organizations had misconfigured S3 buckets and accidentally exposed sensitive data.

Some might criticize the Pentagon for collecting social media posts from US citizens as part of “a close surveillance program,” but scraping the Internet is not against the law, and many private companies make a good living off such applications, sometimes selling the information back to governments in need of social media and Internet monitoring.