On the Risk of Connecting and Collecting it All

The past has shown, and the present impressively demonstrates, that protecting sensitive assets is hard - if not impossible. Too many companies fail to protect their customers’ information properly, and this situation will continue in the future. Normally, after a breach, the InfoSec community preaches that additional security controls need to be implemented: users should pick strong passwords and change compromised ones, 2-factor authentication should be used, patches need to be applied, data needs to be encrypted, and much more.

A lot of companies do all of this - but obviously not well enough, and hence they still fail to secure their sensitive assets.

In this post I intend to introduce another perspective on this topic which I consider worth reviewing. I don’t expect it to be a very popular perspective, and there are plenty of considerations that I will not be able to fully cover in this post. Additionally, after reading the article you will most probably have more questions than before - and this is how it should be. The intention is to stimulate discussion of an alternative and unpopular way of addressing this issue, one that is not yet broadly covered by media and academic research.

The Risk of Collecting it All

With the advent of the big data hype, enterprises were told that collecting and storing huge amounts of data, often including customer data, would bring value to the business sooner or later, once a strategy for processing and using the data for a specific business case had been defined. So companies began to collect and store data in their business intelligence systems and data warehouse environments, and they are still trying to make sense of it. Falling storage prices contribute to this movement, and the notion of “the more the better” has been adopted by a lot of companies. The latest hype around the Internet of Things (IoT) contributes heavily to this behavior: sensors in mobile phones, laptops, sports devices, toys, household appliances such as fridges etc. represent a new set of input sources for the big data collection. The mindset is: store everything, even if it’s not needed today and does not make any sense - perhaps it could be important in the future. In most cases, though, the potential value-add to the business is not even clear yet.

In Germany, generally known for its comparatively strict data privacy law, the term Datensparsamkeit describes the principle of minimizing, or even avoiding, the collection of data unless it is necessary for a specific business case. Lately, however, German Chancellor Angela Merkel told an audience that …

“Whoever sees data as a threat, whoever thinks about every piece of data in terms of what bad can be done with it, will not be able to take advantage of the opportunity of digitization”.

This statement clearly represents a different mindset from a privacy perspective and can be considered a general “risk acceptance”: collecting data, the “new oil”, is considered more important than weighing the risks associated with “storing it all”.

Another argument for collecting information is to say “… it depends on who is using the data and how”. While this statement is theoretically right, the real world looks different. There are myriad cases where data that was collected for one specific purpose ends up being used for another, and this does not even include the case of a malicious adversary compromising systems. Moreover, the large number of data breaches over the last months and years has demonstrated that even if only the “good” people use the data for only a “good” purpose, data will leak and hence be abused. The good intentions of people or entities don’t matter once systems get breached and data falls into the wrong hands.

As soon as you digitize information and create data, you lose control over it, since you put your trust in a machine to handle and process that data securely. The fact is: you, as an ordinary company or end user, do not fully control this machine. You rely on software and hardware that has been built in the context of a supply chain over which you have, at best, influence, but no control. So if you store your data on your local machine, you lose a little control - as soon as you upload it to any (cloud) server on the Internet, you basically lose all control over it. This is independent of the purpose and intention of the action - the data can be copied, modified, falsified, published, abused etc. without your knowledge or control.

So we have basically moved from a mindset of “Collect it All!” even further to a mindset of “Connect it All in order to Collect it All!”. The fact that the more you connect and collect, the more you will struggle to understand it all, is often neglected.

Mitigating the Risk

You cannot lose what you don’t own - this is the exact opposite of storing everything. Since they will not be able to protect all of their assets, companies and individuals should get rid of the information they do not have a direct need for, in order to reduce the exposure in case of a breach. The notion of “assume breach” will become more important in the future, so I share the opinion that companies should proactively think about what kind of information they need to execute their business processes - and then get rid of the rest, while focusing heavily on protecting exactly those crown jewels they cannot delete for business reasons.

So how could this be done? A good idea is to follow the general GDPR principles, which often make sense beyond personal data as well. I have tried to generalize the approach below:

  1. Know what information you have: this sounds straightforward but represents a huge challenge for multinational companies - yes, InfoSec also fundamentally starts with asset management as one of the key requirements. You can’t control what you don’t know. But I argue that this also applies to private individuals who store their personal information in various local and cloud-based applications. Those who have applied baseline opsec principles will have fewer issues.
  2. Know where you have it: related to 1), this is a tough challenge for some companies and gets even more complicated with hybrid on-premise + cloud infrastructures. Did you consider all backups? Replications? Have a look at your DR plan. Are you sure your business is not maintaining any shadow IT? Did they just tell you, or did you technically review their statements? Inspect - don’t expect.
  3. Know who should have access: is it only you who should have access? A group of people in the context of a business application in the enterprise environment? When did you execute the last access recertification?
  4. Know how it is protected: encrypted? Key management? Stored offline without direct Internet access, or e.g. available via an externally facing web application? Encrypting data nowadays is easy - the more complicated challenge is doing the key management properly. Where do keys get stored? How?
  5. Do you still need the information?: ask yourself whether the information in the form of data is still valuable to you - if yes, how? What’s the legal / compliance / regulatory perspective on the data? Are there any data retention policies you need to consider?
  6. Dispose of information you no longer need: it will help you to better understand your information landscape with regard to what is important for you as an individual or as a business. It will also make searches for specific information easier and probably save you some money. Again, in case of a breach you will be happy to realize that you got rid of those 100’000 customer records from the past which you didn’t need anymore.
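
To make the six steps a bit more tangible, here is a minimal sketch in Python of what such an inventory could look like. Everything in it is an assumption for illustration only: the DataAsset fields, the retention periods, the legal_hold flag and the sample entries are hypothetical and not taken from any real system or regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataAsset:
    name: str              # step 1: what information you have
    location: str          # step 2: where you have it (incl. backups, replicas)
    allowed_roles: set     # step 3: who should have access
    encrypted: bool        # step 4: how it is protected
    last_needed: date      # step 5: last time the business actually used it
    retention_days: int    # step 5: hypothetical retention period
    legal_hold: bool = False  # step 5: legal / regulatory reason to keep it


def disposal_candidates(inventory, today):
    """Step 6: assets past their retention period and not under legal hold."""
    return [
        asset for asset in inventory
        if not asset.legal_hold
        and today - asset.last_needed > timedelta(days=asset.retention_days)
    ]


# Purely illustrative inventory entries.
inventory = [
    DataAsset("old_customer_records", "dwh-backup", {"crm-admin"}, True,
              date(2015, 1, 31), 365),
    DataAsset("active_orders", "erp-db", {"sales", "finance"}, True,
              date(2018, 5, 1), 365),
    DataAsset("tax_documents", "file-share", {"finance"}, False,
              date(2012, 12, 31), 365, legal_hold=True),
]

for asset in disposal_candidates(inventory, today=date(2018, 6, 1)):
    print(f"dispose: {asset.name} ({asset.location})")
```

With these sample values only old_customer_records is reported: active_orders is still within its retention period and tax_documents is kept because of the legal hold. The hard part in practice is not this loop but filling the inventory truthfully, which is exactly what steps 1 to 4 are about.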

How can you be sure that I don’t need the data in the future?

I understand that one strong argument against this proposal is: “How can you be sure that I don’t need the data in the future?”.

The short and simple answer is: You can’t - because nobody can predict the future.

The longer answer is: Stored data needs to be constantly evaluated against its value-add to the business. Why does your business intend to keep the data - is it due to a competitive advantage? Is it important data for the daily operational tasks? Does it have value for the brand of your company?

If you are sure that you need the data in the future, what is the impact on the company if the data gets lost during a breach? Are you sure that the benefit of storing a specific data set outweighs the risk of losing this data during a breach, taking into account reputational damage, regulatory fines etc.?
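
One rough way to frame this trade-off is to compare the value a data set brings per year with the expected loss per year if it is breached (breach probability times impact). The sketch below is only an illustration of that reasoning: the function names and all numbers are made up, and real estimates for probability and impact are hard to obtain.

```python
def expected_annual_loss(breach_probability, impact):
    """Expected yearly loss: chance of a breach per year times its total cost."""
    return breach_probability * impact


def worth_keeping(annual_value, breach_probability, impact):
    """True only if the yearly business value exceeds the expected yearly loss."""
    return annual_value > expected_annual_loss(breach_probability, impact)


# Purely illustrative numbers, not derived from any real data:
# the data set is worth 20,000 per year, there is a 5% chance of a breach
# per year, and a breach would cost 1,000,000 (fines, reputation, clean-up).
decision = worth_keeping(annual_value=20_000,
                         breach_probability=0.05,
                         impact=1_000_000)
print("keep" if decision else "dispose")  # prints "dispose"
```

If you “assume breach”, the probability moves towards 1 over time, which makes the case for keeping data without a clear value even weaker.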

“Assume breach!” means it is a question of when, not whether, the data gets lost. If it’s customer data, is it worth taking the risk, also with regard to new laws like GDPR?

On the Challenge to Delete Data

Unfortunately, deleting data does not mean it is actually deleted - in some circumstances, it is hard to guarantee that the data cannot be recovered, whatever deletion method you apply. Remember: whenever you digitize information, you lose control over it. This already starts on your own machine and gets worse if you want to get rid of data in the cloud.

Signal, a private messenger, has the capability to let messages disappear - does that mean that these messages get deleted, or do they just get flagged as disappear = yes? Or do you explicitly need to delete them? And what does that then mean at the file level?
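
The difference between “flagged” and “deleted” can be shown with a small generic example. The sketch below uses Python’s built-in sqlite3 module with a hypothetical messages table; it says nothing about how Signal or any other product actually implements disappearing messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO messages (body) VALUES (?)",
    [("secret one",), ("secret two",)],
)

# "Soft delete": the row is only flagged and hidden from the normal view.
conn.execute("UPDATE messages SET deleted = 1 WHERE id = 1")
visible = conn.execute(
    "SELECT body FROM messages WHERE deleted = 0 ORDER BY id"
).fetchall()
stored = conn.execute("SELECT body FROM messages ORDER BY id").fetchall()
print(visible)  # [('secret two',)]
print(stored)   # [('secret one',), ('secret two',)] - still in the database

# "Hard delete": the row is removed from the table.
conn.execute("DELETE FROM messages WHERE id = 2")
remaining = conn.execute("SELECT body FROM messages ORDER BY id").fetchall()
print(remaining)  # [('secret one',)] - the flagged row is still there

# Note: even after DELETE, the bytes may remain in the database file or on
# the storage medium until they are overwritten; SQLite offers
# "PRAGMA secure_delete" to overwrite deleted content with zeros.
```

The flagged message disappears from the user’s view but not from the storage, and even the removed row may survive in backups, replicas or unallocated disk space.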

By deleting data properly, data controllers relieve themselves of the responsibility for it. GDPR says that a data subject may request erasure / deletion of data when there is no compelling reason for it to be retained. I recommend not waiting for the data subject to request the deletion - being proactive could prevent a lot of pain and demonstrates responsible data processing practices by companies.


Nowadays it takes one tiny mistake, one misconfiguration, one unpatched system or application to expose sensitive data to people without a need to know. This game is hard to win - we need to “assume breach” - and therefore, where data has no personal or business value, we should get rid of it before we can actually lose it. This is not limited to huge multinationals: individuals, too, will be threatened more and more by new attacks that focus on exposing private data. Why? Because it’s easy to monetize.

So keeping in mind, when talking about InfoSec, that you cannot lose what you don’t possess can prevent a lot of damage to multinationals and individuals alike. However, InfoSec professionals also need to understand the strong pressure that comes from the business with regard to new business models and the wish to become part of the “new oil” era. Their job is to help the business make the right decisions and to be a trusted adviser when questions about InfoSec come up.

This post is licensed under CC BY 4.0 by the author.