Data is often called the new oil, and like oil it has to be handled safely to avoid the digital equivalent of a toxic spill.
To reduce risk, companies are starting to delete the personal information about clients that they have been storing. Some experts now compare data to uranium rather than oil: it can become a toxic asset that is difficult to get rid of, and enterprises that dispose of it negligently can even be sued.
Many enterprises keep large quantities of unstructured, manually created data, which adds to the risk. By one estimate, about one-third of the data stored in datacentres is redundant, trivial, or obsolete. Companies also store a lot of data that is not useful but may still fall under the GDPR. Although storage costs are falling, storage is still a cost, and one that is hard to justify for redundant data.
Experts warn companies that stored data only ‘might’ be useful for analysis, while there is a high chance that it will cause harm, especially if it is exposed in a breach or a hack. Companies are advised to keep only the data for which they have a clear reason and planned analytics.
It’s important to understand what is stored, who is accessing it, and how often. Only then can a company make sense of its existing data and classify it under a bespoke data retention policy. Experts suggest deleting unused files quarterly.
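A first step toward knowing what is stored and how often it is used can be as simple as scanning a file share for files nobody has opened recently. The sketch below is a minimal illustration, not a prescribed tool: the function name `find_stale_files` is hypothetical, the 90-day threshold simply mirrors the quarterly cadence, and it assumes the filesystem records last-access times (`st_atime`), which volumes mounted with `noatime` or `relatime` may not keep accurately.

```python
import os
import time
from pathlib import Path

# Roughly one quarter in seconds; the 90-day threshold is an assumption.
QUARTER_SECONDS = 90 * 24 * 60 * 60


def find_stale_files(root, max_age_seconds=QUARTER_SECONDS, now=None):
    """Return files under `root` whose last access is older than `max_age_seconds`.

    Relies on st_atime, which may be coarse or frozen on noatime/relatime mounts.
    Only reports candidates; it deletes nothing.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            last_access = path.stat().st_atime
            if now - last_access > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

Reporting rather than deleting keeps a human in the loop: the list can be reviewed against the retention policy before anything is removed, for example by running `find_stale_files` against a shared drive (path of your choosing) once a quarter.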
Data associated with production systems that are no longer in use must be prioritized for deletion. This week, Wetherspoon announced that it had deleted the data of more than 65,000 customers. The company’s 2015 data breach came from an old website, so that data should not still have been there. Adobe’s password breach also came from an older, non-production system. Experts remind enterprises that they cannot ignore out-of-date systems just because they are part of the legacy IT infrastructure. Data about ex-employees or even ex-customers is at especially high risk because it can contain personally identifiable information (PII).
Massive mountains of data are neither safe nor useful for gaining insights or training AI. Enterprises need to start treating data as a flowing resource that is kept only for current business reasons. Experts have observed that companies retain data in the vague hope that a machine learning (ML) system will discover something useful in it. This applies particularly to PII in data sets that are being considered for training ML models.
Some companies have also moved to ‘de-identifying’ data on the assumption that this keeps it safe. But de-identified data can still be used to identify individuals, even when the company never intended it to be.
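One way to see how de-identified data can still single people out is to count how many records share each combination of quasi-identifiers (attributes such as postcode, birth year, and gender that are harmless alone but revealing together). This is the idea behind k-anonymity, a technique the article itself does not mention; the sketch below is illustrative only, and the field names and sample records are hypothetical.

```python
from collections import Counter


def _key(record, quasi_identifiers):
    # The combination of attributes that, together, might point to one person.
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records, quasi_identifiers):
    """Smallest number of records sharing one quasi-identifier combination (0 if empty)."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return min(counts.values(), default=0)


def unique_records(records, quasi_identifiers):
    """Records whose quasi-identifier combination appears only once in the data set."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[_key(r, quasi_identifiers)] == 1]


# Names have been removed, yet postcode + birth year + gender may still identify someone.
deidentified = [
    {"postcode": "SW1A", "birth_year": 1985, "gender": "F", "purchase": "book"},
    {"postcode": "SW1A", "birth_year": 1985, "gender": "F", "purchase": "lamp"},
    {"postcode": "M1", "birth_year": 1990, "gender": "M", "purchase": "chair"},
]
QI = ("postcode", "birth_year", "gender")

print(k_anonymity(deidentified, QI))  # 1
print(unique_records(deidentified, QI))
```

A k of 1 means at least one record is unique on those attributes, so anyone who can link postcode, birth year, and gender from another source can recover who made that purchase.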
While data deletion is being seen as a solution, the data-centric tech industry has a flaw: it has not figured out how to dispose of data. The industry has settled on hashing PII, which is treated as the equivalent of running a black marker across it. However, experts warn that companies should still not collect everything else, since a mass of essentially useless information only makes usage data harder to analyze and increases the time spent by the people who build and test models. Enterprises now need to judge accurately the utility and value that information brings, and to test that data for any predictive value.
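Hashing is only a black marker if the original values cannot be guessed. The sketch below, a minimal illustration rather than an industry standard, contrasts an unsalted SHA-256 hash of an email address, which anyone holding a list of candidate addresses can match, with a keyed hash (HMAC-SHA-256) that cannot be matched without the secret key. The helper names and sample addresses are hypothetical; `hashlib`, `hmac`, and `secrets` are Python standard-library modules.

```python
import hashlib
import hmac
import secrets


def _normalize(value: str) -> bytes:
    # Normalize so the same address always produces the same hash.
    return value.strip().lower().encode("utf-8")


def plain_hash(value: str) -> str:
    """Unsalted SHA-256: deterministic and public, so guessable inputs can be matched."""
    return hashlib.sha256(_normalize(value)).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA-256 with a secret key: matching a guess requires the key."""
    return hmac.new(key, _normalize(value), hashlib.sha256).hexdigest()


def dictionary_attack(target_hash: str, guesses):
    """Try candidate values (e.g. a leaked email list) against an unsalted hash."""
    lookup = {plain_hash(g): g for g in guesses}
    return lookup.get(target_hash)


guesses = ["bob@example.com", "alice@example.com"]

stored_plain = plain_hash("Alice@Example.com")
print(dictionary_attack(stored_plain, guesses))  # alice@example.com

secret_key = secrets.token_bytes(32)  # kept in a key store, separate from the data
stored_keyed = keyed_hash("Alice@Example.com", secret_key)
print(dictionary_attack(stored_keyed, guesses))  # None
```

Even the keyed version is pseudonymisation rather than deletion: whoever holds the key can still link records back to people, and under the GDPR pseudonymised data generally remains personal data.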
The solution experts propose is a clear policy on how long data will be stored. Companies also need to establish ‘forcing functions’, mechanisms that compel those retention decisions to be made rather than deferred.
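A retention policy can be expressed as data, with the forcing function built into how it is applied. In the minimal sketch below, any record past its retention period is flagged for deletion, and so is any record in a category that has no declared period, which forces someone to decide how long that data may be kept. The category names, periods, and the `records_due_for_deletion` helper are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per data category; real values need legal review.
RETENTION_POLICY = {
    "marketing_contact": timedelta(days=365),
    "support_ticket": timedelta(days=2 * 365),
    "web_analytics_raw": timedelta(days=90),
}


@dataclass
class Record:
    record_id: str
    category: str
    collected_on: date


def records_due_for_deletion(records, today, policy=RETENTION_POLICY):
    """Return IDs of records past their retention period.

    Forcing function: a record whose category has no declared period is also
    flagged, so undecided data cannot be kept indefinitely by default.
    """
    due = []
    for r in records:
        period = policy.get(r.category)
        if period is None or today - r.collected_on > period:
            due.append(r.record_id)
    return due


records = [
    Record("r1", "marketing_contact", date(2022, 1, 1)),
    Record("r2", "support_ticket", date(2023, 6, 1)),
    Record("r3", "legacy_website_signup", date(2023, 12, 1)),
]
print(records_due_for_deletion(records, today=date(2024, 1, 1)))  # ['r1', 'r3']
```

Defaulting unknown categories to deletion is a deliberate design choice: the burden falls on justifying retention, not on justifying disposal.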