People Data Labs (PDL), a data-scraping company that sells “enrichment” data, boasts that they have data on 1.5 billion people. That data, stored on an Elasticsearch server, was breached in November. To put this in perspective, the total population of the United States is 330,130,233; PDL claims to have data on more than four times that many people, and the compromised Elasticsearch server alone held 4 billion records belonging to at least 1.2 billion people.
Here’s what People Data Labs says about themselves:
“People Data Labs builds people data. Use our dataset of 1.5 billion unique person profiles to build products, enrich person profiles, power predictive modeling/AI, analysis, and more. We work with technical teams as their engineering focused people data partner.”
And here, by their own account, is what they have and will sell to anyone who pays them:
- Over 1.5 Billion unique people, including close to 260 million in the US.
- Over 1 billion personal email addresses. Work email for 70%+ of decision makers in the US, UK, and Canada.
- Over 420 million LinkedIn URLs.
- Over 1 billion Facebook URLs and IDs.
- 400 million+ phone numbers. 200 million+ US-based valid cell phone numbers.
This data was found on an Elasticsearch server. Elasticsearch allows you to “store, search, and analyze” data. The server, with those 4 billion records, was completely open and unsecured: no password or other authentication was needed to access any of that data.
[Image: HaveIBeenPwned notification of the PDL and Elasticsearch breach]
All of this was discovered by security researcher Vinny Troia, over at Data Viper. According to Troia, “The discovered Elasticsearch server containing all of the information was unprotected and accessible via web browser at http://126.96.36.199:9200. No password or authentication of any kind was needed to access or download all of the data.”
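What “accessible via web browser” with “no password or authentication” means in practice can be sketched in a few lines of Python. This is a minimal sketch, assuming Elasticsearch's standard REST interface on port 9200 and its `_cat/indices` endpoint, which lists the indices a cluster holds; the helper names `classify_status` and `probe` are hypothetical. If a request carrying no credentials gets back HTTP 200, anyone who finds the address can read the data; a 401 or 403 means some form of access control is in place. It is intended for checking a cluster you operate, not as a definitive audit tool.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Interpret the HTTP status returned to a request sent with no credentials."""
    if status == 200:
        # The server answered without asking who we are: the data is open.
        return "open"
    if status in (401, 403):
        # The server rejected the anonymous request: authentication is enforced.
        return "auth required"
    return "unknown"


def probe(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to the cluster's _cat/indices endpoint (hypothetical helper)."""
    url = base_url.rstrip("/") + "/_cat/indices?v"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the status is in err.code.
        return classify_status(err.code)
    except urllib.error.URLError:
        # No HTTP response at all (refused connection, DNS failure, and so on).
        return "unreachable"
```

For example, `probe("http://localhost:9200")` run against your own cluster would return `"open"` for a server configured the way Troia describes.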
To further complicate things, the Elasticsearch server was hosted on Google Cloud.
But wait, there’s more. Troia also discovered the data of a second “data enrichment” company on that same Elasticsearch server. That company is OxyData.io, which offers “In-Depth Data on People and Companies”, boasting “Finally, an easy way to get business data whenever you need it.”
Now, here is where it gets even more interesting. Both PDL and OxyData deny owning the server. As Troia deduces, that means either the data was stolen from both companies and stashed on the Elasticsearch server, or a customer of one or both companies misappropriated it and stored it there, completely unprotected.
So, where does the liability for this breach rest? PDL? They say it isn’t their server, and that they didn’t put their data there. Ditto OxyData. Is Elastic, the company behind Elasticsearch, responsible? Is Google, for hosting the server?
We’ll probably never know, and it’s likely that, at the end of the day, nobody will be held accountable. This is why lawmakers have (finally) introduced the federal Online Privacy Act, and why so many states have introduced their own online privacy laws in the meantime.