How your personal and corporate data have probably already been leaked

Wouter van der Houven
Dec 15, 2020

And what you can do about it.

I recently started looking into a problem that has been around for over 10 years and has resulted in countless data leaks: the problem of so-called “public buckets”. In my research I have made some discoveries that had me at a loss for words. So far, I have found hundreds of thousands of personal, medical, and legal documents, all of which are freely available for anyone to download.

This is my first article on this topic. In it you will read what these “buckets” are, how they can contain your personal or company data, and what you can do about it. Feel free to contact me with any questions and I will see if I can include them in future articles. Let's get started!

Are the files in this bucket required to be publicly accessible?

This is a question I often ask myself. In my work I regularly come across so-called “buckets”, a term commonly used to describe organizational units for file storage at public cloud providers like Amazon Web Services, Microsoft Azure, and Google Cloud.

Buckets are used to store files so that they can be downloaded by users and applications. Like most things, public buckets have a genuine purpose. For example, the websites you visit need to store images somewhere so you can view them. These files are accessed by making a web request directly to the file’s storage location.
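
To make "a web request directly to the file’s storage location" concrete, here is a minimal sketch in Python. It assumes an Amazon S3 bucket addressed in the virtual-hosted style; the bucket name, region, and file name are made up for illustration, and other cloud providers use different URL layouts.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def public_object_url(bucket: str, key: str, region: str) -> str:
    """Build the virtual-hosted-style URL of an object in an S3 bucket."""
    # quote() percent-encodes characters such as spaces but keeps "/" intact.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def is_anonymously_readable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a request without any credentials can read the URL."""
    # HEAD asks for the response headers only, so the file is not downloaded.
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (403, 404, ...) as well as network failures.
        return False


# Hypothetical bucket, file, and region, purely for illustration.
url = public_object_url("example-website-assets", "images/hotel lobby.jpg", "eu-west-1")
print(url)
```

Anyone who knows or guesses such a URL can request it. If `is_anonymously_readable(url)` returns True for a file that was never meant to be shared, that file is exposed. Only run a check like this against storage you own or are authorized to test.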

So public buckets have a genuine purpose. What is the problem?

Unlike the images that you view on a website, there are also things you do not want everyone to be able to access. A great example is the “uploads” folder of a website. Imagine having to upload a copy of your passport when booking a hotel. You do not want this copy to be out there for just anyone to download.

Sadly enough, this is not just an example. In my research I have found many of these unintentionally public website folders, many of which contained files not intended for the public. Some examples of my findings are:

  • Over 8,000 passports of various nationalities
  • Over 1,500 birth certificates of various nationalities
  • Over 110,000 medical records ranging from patient medical history files to children’s psychological evaluations
  • Corporate investment and strategy reports
  • Personal and corporate laptop, server, and phone back-ups
  • Server access credentials and passwords
  • Countless company invoices

* All of the above has been reported to the responsible parties and authorities.

While the number of personal and confidential files I found had me at a loss for words, I have only started to uncover the tip of the iceberg. I am currently working on refining my approach and algorithms to better identify and assess public buckets. I expect to be sharing some more detailed statistics and breakdowns soon.

Back to the topic of this article, public buckets.

How does this happen? And why does this still happen so often?

By default, buckets are configured to be private. So far I have not seen any benchmarks on the causes of these data leaks, so we do not know for certain why private data ends up in public buckets or why private buckets end up being set to public.

We can imagine some situations, like the one I described in the earlier example. There, a private folder (uploads) was mistakenly included among the public folders in the bucket.

I can only speculate on other examples. It could be an engineer inexperienced with cloud technology who accidentally switches the bucket’s configuration to public. Or perhaps a script containing a “typo” gives the bucket an incorrect public configuration by accident.

So it comes down to human error, but what can we do?
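
As an illustration of how small such a slip can be, consider an Amazon S3 bucket policy. A statement with "Effect": "Allow" and "Principal": "*" grants access to everyone, and a Resource that ends in "/*" instead of "/images/*" is enough to include the uploads folder. The sketch below finds such statements in a policy document. The policy and bucket name are made up, and the check is deliberately simple: it ignores Condition blocks, ACLs, and account-level settings that can also affect access.

```python
import json


def _as_list(value):
    """Policy fields may hold a single item or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def _is_anonymous(principal) -> bool:
    """True when the principal is the wildcard, meaning 'everyone'."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return "*" in _as_list(principal.get("AWS", []))
    return False


def anonymous_allow_statements(policy_json: str) -> list:
    """Return the policy statements that allow access to any principal."""
    policy = json.loads(policy_json)
    return [
        statement
        for statement in _as_list(policy.get("Statement", []))
        if statement.get("Effect") == "Allow"
        and _is_anonymous(statement.get("Principal"))
    ]


# Hypothetical policy: the Resource was meant to end in "/images/*".
policy_json = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-hotel-site/*"
    }
  ]
}
"""

for statement in anonymous_allow_statements(policy_json):
    print("Anonymous access allowed on:", statement["Resource"])
```

A finding from this function is not automatically a problem, since a folder of website images is supposed to be public. It is a prompt to ask whether everything covered by the listed Resource really should be.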

We can split this question into two vantage points: that of a company using buckets, and that of an individual sharing private data anywhere.

As a company, you are in control of the service you deliver and the storage you expose.

Some suggestions are:

  • Keep track of what data you house. Is there private data received, processed, or stored through your applications? Or do you perhaps store documents or backups in the cloud?
  • Implement routine checks to verify that you do not expose any buckets, folders, or files that should not be public.
  • If you have a large infrastructure, conduct periodic reviews (architecture & configuration reviews, penetration tests, threat analysis, etc.) to gain insight into what you and your team might have missed.
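
A routine check like the one suggested above could start as small as the following Python sketch. It assumes you already have a listing of the file names in a bucket and know which folders are public. The keyword list, the file names, and the public patterns are all hypothetical, and a real check would need rules that fit your own data.

```python
import re

# Hypothetical name fragments that suggest a file should not be public.
SENSITIVE_HINTS = ("passport", "backup", "invoice", "password", "medical")


def wildcard_to_regex(pattern):
    """Translate a pattern in which '*' matches any run of characters."""
    escaped_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(escaped_parts), re.DOTALL)


def review_listing(keys, public_patterns):
    """Return the keys that are publicly reachable and look sensitive."""
    regexes = [wildcard_to_regex(pattern) for pattern in public_patterns]
    findings = []
    for key in keys:
        is_public = any(regex.fullmatch(key) for regex in regexes)
        looks_sensitive = any(hint in key.lower() for hint in SENSITIVE_HINTS)
        if is_public and looks_sensitive:
            findings.append(key)
    return findings


# Made-up file listing for illustration.
keys = [
    "images/logo.png",
    "uploads/passport-scan-0042.pdf",
    "internal/db-backup.sql",
]
# In this made-up setup, everything under images/ and uploads/ is public.
public_patterns = ["images/*", "uploads/*"]

print(review_listing(keys, public_patterns))
```

Here only the passport scan is reported: the logo is public but harmless, and the database backup looks sensitive but is not in a public folder. Matching on file names will miss sensitive files with innocent names, so a check like this complements the periodic reviews rather than replacing them.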

As an individual, there is not much you can do...

You have no influence over and no insight into how organizations handle data. You cannot be expected to do an extensive search for a company's public buckets before you register or upload something through their website.

What you can do, however, is ensure that you know what data you submitted where. For example, if someone requires a copy of your passport, add your own watermark. This way, if there is a data leak, you know who “lost” your files.

But what about the legal aspects?

Due to the nature of the internet and our global economy, our data is not automatically bound to stay in the country we live in. Recent privacy laws in countries around the world impose legal requirements on companies to treat your data with the respect it deserves. However, even if countries implement laws to prohibit data abuse and punish negligence, we are still far from certainty.

A solution could perhaps be sought in assurance reports, in which independent auditors examine companies and services for proper treatment of data and correct storage (bucket) configuration. The findings of this inspection and the auditor's professional opinion are put in a report. Releasing these reports to the public allows us, the public, to do some sort of due diligence before trusting a company with our data.

In summary, even after more than ten years, accidentally exposed public buckets remain a large issue for our privacy and data confidentiality. Companies can try to implement controls to identify exposed buckets in their business, but there is very little that individuals can do to help keep their files safe. I have proposed a solution: requiring companies to acquire assurance statements from independent auditors. This of course leaves the question of whether you trust both the company and the auditor.

Happy to hear your thoughts and ideas!

Thanks for reading.

Wouter

@wvanderhouven

Photo by Pedro da Silva on Unsplash
