A few weeks ago, I wrote a post about the information security benefits of semantic layers, in which I briefly discussed the so-called Snowflake data breach (which was not actually a breach of Snowflake).
At the time, I didn’t know which companies were involved or what kind of data was lost. I have since learned that one of them was AT&T, one of the largest telecoms companies in the world.
AT&T’s Huguely told TechCrunch that the most recent compromise of customer records were stolen from the cloud data giant Snowflake during a recent spate of data thefts targeting Snowflake’s customers.
This is misleading and disingenuous - they are doing everything they can to suggest it was Snowflake’s fault without explicitly doing so. However, the truth is that AT&T did everything wrong from an information security point of view.
1. They stored PII in their data warehouse.
2. They didn’t require two-factor authentication to access the data warehouse.
3. They didn’t secure their credentials.
4. They waited three months to tell their customers they lost their data.
Let’s look at no 1:
I worked at a company which had its own in-house CRM system. When I arrived, they were using the DWH as a data store to hold email addresses of users and customers. The CRM system would run DWH queries to extract customer segments and emails to contact based on the segment. As soon as I found out this was happening, I removed the PII from the DWH and made a rule that no PII would be stored in the DWH. They needed to make other arrangements for a data store that was suitable for holding this kind of PII.
I kept the non-PII segment type data by user_id, allowing them to generate a segment of user_ids they wanted to contact, and then acquire the email addresses for those user_ids elsewhere.
This eventually led to the creation of a user service, which stored all PII about the user and became the only place this data was held. It exposed the data only through an API that enforced role-based access control (RBAC), so each service could access only the PII it needed to function. Where the user was referred to elsewhere in our data model, we only stored the user_id. We could still keep fields derived from PII in the DWH, such as the user’s country, operating system and email_domain.
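To make the shape of this concrete, here is a minimal sketch of the idea, not the service we actually built. The role names, field names, sample records and the functions `get_pii` and `dwh_row` are all made up for illustration, and a real service would sit behind authentication and a proper datastore rather than a dict.

```python
# Hypothetical PII store. In the setup described above, this data lived
# only inside the user service, never in the DWH.
_PII = {
    101: {"email": "ana@example.com", "phone": "+15550100", "country": "PT"},
    102: {"email": "bo@example.org", "phone": "+15550101", "country": "SE"},
}

# Hypothetical RBAC policy: which PII fields each calling service may read.
ROLE_FIELDS = {
    "crm_mailer": {"email"},
    "sms_sender": {"phone"},
    "analytics": set(),  # analytics works from user_ids only
}


class AccessDenied(Exception):
    """Raised when a role asks for a field it is not entitled to."""


def get_pii(role, user_ids, fields):
    """Return only the requested fields, and only if the role allows them."""
    allowed = ROLE_FIELDS.get(role, set())
    denied = set(fields) - allowed
    if denied:
        raise AccessDenied(f"{role} may not read {sorted(denied)}")
    return {
        uid: {field: _PII[uid][field] for field in fields}
        for uid in user_ids
        if uid in _PII
    }


def dwh_row(user_id):
    """Build the row that is safe to load into the DWH: id plus derived fields."""
    record = _PII[user_id]
    return {
        "user_id": user_id,
        "country": record["country"],
        "email_domain": record["email"].rsplit("@", 1)[1],
    }
```

The CRM would build its segment as a list of user_ids from the DWH, then call something like `get_pii("crm_mailer", segment_ids, ["email"])` at send time. The DWH itself only ever sees what `dwh_row` returns.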
What I knew was that, if we stored PII in our DWH, the security requirements would be so much higher than if we didn’t. I was also part of the GDPR committee at this company, so I was becoming very aware of these security requirements and the consequences of breaking them! You can’t have a customer data breach relating to your DWH, if you don’t store PII in your DWH.
The data that you then do store is, at worst, business sensitive, meaning that if there was unauthorised access, company performance information could be leaked. For non-listed companies this isn’t great, but it probably isn’t disastrous. As most data folks know, it’s hard enough to find the truth about performance when you are an expert in your company’s data, let alone an outsider with little to no knowledge. For listed companies, a leak like this could enable insider trading, so business-sensitive data also needs to be highly protected at these companies.
As for no 2 - I mentioned in my last post that Snowflake isn’t at fault for not enforcing two-factor usage. We’re grown adults who are supposed to be experts in our field. We don’t need Snowflake to baby us and tell us what level of security we need for a user or role. AT&T probably have an information security department bigger than most SMB companies, with every security accreditation available. They will have known PII was being stored in their Snowflake and didn’t force two-factor usage, didn’t force removal or tokenisation of the PII data… and the list goes on.
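As an illustration of what tokenisation can look like, here is a small sketch using a keyed hash (HMAC-SHA256) from the Python standard library. This is one possible approach, not a description of anything AT&T or Snowflake does. The function name `tokenise` and the example key are made up, and the whole point depends on the key being held somewhere other than the warehouse.

```python
import hashlib
import hmac


def tokenise(value: str, key: bytes) -> str:
    """Replace a PII value with a stable, keyed pseudonym.

    The same input and key always give the same token, so joins and counts
    in the warehouse still work, but the raw value is never stored there.
    """
    normalised = value.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: in practice the key comes from a secrets manager outside the
# warehouse. It is hard-coded here only to keep the sketch self-contained.
EXAMPLE_KEY = b"not-a-real-key"
token = tokenise("+1 555 0100", EXAMPLE_KEY)
```

Someone who steals the tokenised table without the key gets identifiers they cannot reverse by simply hashing every possible phone number, which is why a keyed hash is used here rather than a plain one.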
If you’ve worked at a big company before, you know that information security needs to sign off on tech usage and changes, or they don’t happen - for this very reason. This is not some unencrypted CSV on a marketing exec’s laptop, that they left unlocked and unattended at Starbucks when they went to the bathroom. This is a production system. I accept that it is nearly impossible to protect every piece of data a company has stored anywhere, but data in a production data store… that just has to get done.
This then nicely leads to nos 3 and 4.
It is common to have automatic password rotation systems, so that a lost credential stops working before it can lead to a data breach. So, not only did they keep the username and password for their Snowflake in an insecure way, but they also didn’t rotate the password, or at least not very often.
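Even without a full rotation system, a basic audit of credential age is cheap. The sketch below is a hypothetical example: the 90-day limit, the account names and the `stale_credentials` function are assumptions for illustration, and the last-rotated dates would come from whatever your platform or secrets manager reports.

```python
from datetime import date, timedelta

# Assumed policy for this sketch, not a vendor default.
MAX_AGE = timedelta(days=90)


def stale_credentials(last_rotated, today, max_age=MAX_AGE):
    """Return the names of credentials not rotated within max_age."""
    return sorted(
        name
        for name, rotated_on in last_rotated.items()
        if today - rotated_on > max_age
    )


# Made-up service accounts and rotation dates.
accounts = {
    "svc_etl": date(2024, 1, 1),
    "svc_bi": date(2024, 3, 20),
}
overdue = stale_credentials(accounts, today=date(2024, 4, 14))
```

Anything this flags is a credential that would still work for an attacker long after it was stolen.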
“The massive data breach exposed records from May 1 to October 31, 2022, as well as some data from January 2, 2023, for a small subset of customers. Compromised information included:
Phone numbers of nearly all AT&T cellular customers and mobile virtual network operators using AT&T's network
Call and text logs, including interacted numbers and interaction counts
Aggregate call durations for daily or monthly periods
Some cell site identification numbers, potentially revealing approximate call/text locations
Importantly, the breach did not include actual call/text content, personal information like Social Security numbers or birthdates, or specific timestamps of communications. AT&T discovered the unauthorized data download in April 2024 and has since taken additional cybersecurity measures to address the incident.” [1]
Snowflake makes it easy to spin up databases and clone data inside an account. I could imagine, given the limited timeframe of the data, that this particular database could be leftover from some development or migration and not a database that is well-used or looked at by anyone.
It still had the phone numbers of nearly all customers (presumably the only ones not included were accounts that were inactive during the period). It had interaction logs, counts, durations and some approximate location data from cell site IDs. You could find who was connected to whom, rank each contact by importance to a node in this graph, and know roughly where these nodes were geographically. Even if the customer name was not stored against the phone number [2], you can, in theory, find out which numbers are important or close to a number you already know. For many people, this is very sensitive information! It allows for all manner of misuse against that individual.
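To show how little work this takes, here is a toy sketch over entirely made-up records. The letters stand in for phone numbers, and the `closest_contacts` function and the record layout are my own assumptions, not the structure of the stolen data.

```python
from collections import defaultdict

# Made-up interaction records: (number_a, number_b, interaction_count).
LOGS = [
    ("A", "B", 40),
    ("A", "C", 3),
    ("B", "C", 12),
    ("D", "A", 25),
]


def closest_contacts(logs, number):
    """Rank every number that interacted with `number` by total interactions."""
    weights = defaultdict(int)
    for src, dst, count in logs:
        if src == number:
            weights[dst] += count
        elif dst == number:
            weights[src] += count
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


ranking = closest_contacts(LOGS, "A")
```

A dozen lines of code turn "just metadata" into a ranked list of someone’s closest contacts, which is why counts and logs without names are still sensitive.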
In the three months after the breach was discovered, while the AT&T corporate communications team worked on the least damaging version of the story, over a hundred million customers could have been targeted for fraud, stalked, spoofed or analysed for any nefarious purpose.
AT&T has also had assistance from the FBI and the DoJ with the incident. They have probably talked this point up because it may make customers feel less worried, even though the horse has already bolted. The truth is that they will have used millions of taxpayer dollars to help clean up their mess, and an easily avoidable mess.
[1] https://www.perplexity.ai/page/at-ts-massive-data-hack-4BHaIUpARtWEAjXcF5tGbA
[2] AT&T have warned that customer names for each phone number could potentially be inferred from the data using other publicly available tools.