Data Has Value, but also Risk – The Basics of Information and ROT Records Management
By Bill Tolson | February 24, 2021
Corporate Data Overload
Organizations are keeping more data for longer periods. This is due in part to emerging regulatory data retention requirements, lengthening legal cycles, and the fear of spoliation charges from the accidental deletion of data. However, the most significant factor is that end-users don't have the time to read, comprehend, and manage the enormous amounts of data they come into contact with daily, a condition otherwise known as data overload. This ties directly to the ongoing increases in data volumes, data velocity, and data variety – the 3 Vs of information management.
The 2018 IDC whitepaper, "Data Age 2025," estimated that the corporate data environment will comprise three enterprise locations: the core, including traditional and cloud data centers; the edge, which contains external infrastructure; and endpoints, which include PCs, smartphones, and IoT devices. The whitepaper also points out that 49% of corporate data will be stored and managed in public cloud environments by 2025. This statistic is significant because the use of corporate cloud repositories such as OneDrive, Box, and Google Cloud will make it even easier for end-users to save everything to their company cloud subscriptions with little or no centralized management or even awareness that the data exists.
These public cloud repositories are similar to employee network share drives because they are rarely indexed, managed (retention/disposition), shared, or cleaned out. For example, over time, an individual Microsoft 365 OneDrive account could contain up to 1 TB of an employee's work-related data. Multiply that by the number of employees in your organization, and you begin to see the problem – corporate data overload.
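The multiplication described above can be sketched in a few lines of Python. The headcount used here is a hypothetical figure chosen only for illustration, and the result is an upper bound that assumes every account fills to the 1 TB mentioned above.

```python
def potential_onedrive_tb(employee_count: int, tb_per_account: float = 1.0) -> float:
    """Upper-bound volume in TB if every account fills to tb_per_account."""
    return employee_count * tb_per_account


# Hypothetical headcount, for illustration only.
employees = 5_000
total_tb = potential_onedrive_tb(employees)

# Report the total in TB and in (decimal) PB.
print(f"{employees:,} employees x 1 TB = up to {total_tb:,.0f} TB "
      f"({total_tb / 1000:.1f} PB)")
```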
Information management basics
At its most basic, an information management program comprises both retaining and disposing of data based on various corporate strategies, policies, and needs. However, many organizations neglect the disposal part of information management. This is because most organizations only manage “records” – specific data and documents that fall under a regulatory retention requirement. And because records only make up approximately 5% of all corporate data, upwards of 95% of corporate data is left unmanaged and therefore not subject to retention/disposition. This lack of information management, in turn, drives up corporate data risk and cost.
Ongoing information management, not just records management, is imperative for an organization's health and ability to compete in the market. Another factor is the continuing reliance on employees to categorize and file data. Because employees do not have the time due to data overload (nor the training to know what needs to be kept and how), most data is never truly managed.
Data has value, sometimes
Not all data has a lasting value, such as old versions of files, aging work files, URLs, old research, file copies, and files that have aged beyond their expiration date. On the other hand, some data does have value for some period of time.
The basic questions around data value are: how much value does a given file have compared to the cost of keeping it, and for what period of time will that value last? This is not to imply that all data which does not contribute to revenue is valueless and a target for disposal. For example, compliant information management dictates that regulated data must be kept for pre-defined periods in accordance with data retention laws.
Additionally, litigation-related data should be kept and protected under a litigation hold to avoid eDiscovery issues, spoliation, fines, and loss of the case. As was stated earlier, only approximately 5% of corporate data is subject to regulatory retention requirements, and an additional 1% is subject to legal hold. This means that 94% of corporate/employee data is not subject to regulatory or legal retention requirements.
A large percentage of corporate data (approximately 94%), especially data created/stored by individual employees, is not actively managed or indexed for later search. Most of it is never seen, shared, or used by others in the company. The reality is individual employees keep the majority of data they come into contact with because they think they may need to reference it later and because most IT departments do not limit their available storage.
In fact, employees do reuse old data. The McKinsey Global Institute has reported that the average employee spends nearly 20 percent of their workweek looking for old/internal information for reference or reuse. The challenge with searching for old data is it's rarely, if ever, managed/indexed or stored in a common repository. Because of this, employees spend large amounts of time searching in various storage devices/repositories and trying different search terms to see if they can find what they thought they saved months or years ago. If all corporate data, including all employee-controlled data, were centrally stored, indexed, and actively managed, then a single simple search would tell the employee if the data they are looking for still exists.
This fact seems to contradict the concept of defensible disposition, i.e., disposing of data means that employees will spend even more time looking for data that doesn't exist. As I stated in the previous paragraph, if all data is indexed and managed, then a single search will tell the employee if the information they are looking for exists. This highlights the biggest problem with data overload – having so much (unmanaged) data that it's impossible to find specific data in the timeframe needed. A past CEO offered an interesting fact related to information management – "It costs up to 500 times more to find and utilize a specific document once than to store it untouched for 20 years." Data you can't find, even if it still exists, is valueless and a risk.
Data has risk, sometimes
Holding data beyond its regulatory retention period or its usefulness raises risk in legal and regulatory compliance situations.
Many General Counsels (GCs) talk about their fear of retaining data for long periods because of the risk of an expired "smoking gun" being found in a future eDiscovery search. GCs also state that they don't want to delete data because they could later be accused of deleting data potentially relevant to a case that does not yet exist. This reasoning is false. A litigation hold, that is, protecting potentially responsive data, is only required when the organization is actually served with a notice of legal action or reasonably anticipates future litigation. Otherwise, non-regulated data can be deleted at any time.
In fact, unneeded/valueless/expired files which are pulled into an eDiscovery process can dramatically raise the cost of eDiscovery – see my January 26, 2021 blog to review the Dupont case study on the cost of over-retention during litigation.
Many C-level employees wrongly believe that all corporate data must be retained for a minimum period of time. To reiterate, data not subject to regulatory retention requirements or legal hold can be deleted at any time. There are no laws that stipulate that all data must be retained for a minimum time frame.
Privacy regulations and data deletion
Many new privacy regulations, such as the GDPR, CCPA/CPRA, and others, now specify maximum retention periods for files containing personal information (PI). The GDPR states that PI should only be retained for as long as the original consent was given and used for only the original stated intent. This means that consent is not universal, i.e., data-subject consent is only provided for a specific use. Once that use is complete, all PI collected for the original consent should be defensibly disposed of or anonymized so that personal attributes are unrecoverable.
And notably, a data-subject can withdraw consent and direct their PI to be deleted at any time (the right to be forgotten). The only limiting factor to the right to be forgotten is if the PI is subject to regulatory retention or legal hold.
When is it OK to Delete Data?
Today's legal best practice is to delete records when they expire and general data as soon as it no longer has value for the company, a 180 from where we were 10 years ago.
Data risk versus cost versus value
As was stated earlier, not all data retains its value over long periods. In fact, some data has zero value, ever. For example, emails suggesting going out for lunch lose most of their value to the sender/receiver as soon as the lunch occurs. Email systems are chock-full of transitory conversations that have no value for the company and could be deleted immediately without negatively affecting the business. However, for whatever reason, they are kept and then forgotten about – remaining in the user's mailbox for potentially years. But even the most critical data loses its value over time. That 2017 product forecast spreadsheet, or sales report, or draft of meeting minutes certainly is less important today than when it was first created and shared in 2017.
Chart 1 below shows the average data value loss over time (data derived from the CGOC Information Governance Maturity Model).
Chart 1: The average value of data over time
Please note: data value is subjective and can vary dramatically from industry to industry. In some industries, data value may show less of a decline over time, for example, in the pharmaceuticals industry (also heavily regulated). Your organization will need to take a hard look at data over time to understand its average value curve. But on average, no matter the industry, data value does decline over time.
In the earlier "Data has risk, sometimes" section of this blog, I indicated that holding compliance data beyond its regulatory retention period (or usefulness to the company) raises risk in legal and regulatory compliance situations.
Chart 2 below (also derived from the CGOC report) adds the data risk curve to the data value chart to highlight the apparent disparity in the two trends.
Chart 2: The risk to data value deficit
Chart 2 shows one of two variables to consider when creating a defensible disposition strategy: if the risk of holding (non-regulated/non-litigation related) data is more significant than its value, then that data should be considered for deletion.
The other variable to consider is that of the cost to store and manage the data. Retained data is stored on various enterprise-managed storage resources – hard disk, tape, cloud repositories. Additionally, floor space is taken up, cooling/electricity is consumed, the data must be backed up and replicated for disaster recovery (DR) requirements, additional personnel are needed, etc. These are all costs related to on-premises storage costs. The same holds for cloud storage, even though cloud storage is usually paid via a monthly subscription.
Generally speaking, storing 1 GB of data becomes less expensive year over year. However, the resources to manage the storage do not. Over time, the fully-loaded cost of storing 1 GB of data will dip noticeably, but that cost reduction is offset by the cost of storing more data (remember, we’re also generating more data).
Chart 3 below (also derived from the CGOC report) shows the average cost curve for storage over time overlaid with the value curve with the cost to value deficit.
Chart 3: The cost to value deficit
As with the risk to value deficit shown in chart 2, a cost to value deficit (cost exceeding value) highlights yet another reason to defensibly dispose of data – cost savings.
Overlaying all three charts into chart 4, we can clearly see that keeping (non-regulated/non-litigation related) data for long periods of time is a risky and costly practice.
Chart 4: Data cost versus data risk versus data value
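The two variables discussed above (risk versus value, and cost versus value) can be read as a simple decision rule. The Python sketch below is one possible way to express it, not a prescribed method: the `DataItem` fields, the shared 0-10 scoring scale, and the sample scores are all hypothetical, since the charts give curves rather than numbers. Data under a regulatory retention period or legal hold is never flagged.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str
    value: float   # estimated business value (hypothetical 0-10 score)
    risk: float    # estimated legal/compliance risk, same scale
    cost: float    # estimated storage/management cost, same scale
    must_retain: bool = False  # under regulatory retention or legal hold


def is_disposal_candidate(item: DataItem) -> bool:
    """Flag data whose risk or cost exceeds its value, unless it must be kept."""
    if item.must_retain:
        return False
    return item.risk > item.value or item.cost > item.value


# Sample items with made-up scores, for illustration only.
items = [
    DataItem("2017 product forecast", value=2, risk=5, cost=3),
    DataItem("current customer contract", value=9, risk=4, cost=3),
    DataItem("tax record in retention", value=1, risk=6, cost=3, must_retain=True),
]

candidates = [item.name for item in items if is_disposal_candidate(item)]
print(candidates)
```

In practice, each organization would need to derive its own value, risk, and cost estimates; the rule only shows how the comparison fits together once those estimates exist.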
As a general rule, organizations beginning a defensible disposition process targeting existing data stores can realize an average 40% reduction in redundant, obsolete, or trivial data (ROT), which translates to substantial cost savings and large decreases in the overall risk.
In 2012, CGOC Summit reported that typically 1% of corporate information was on litigation hold, 5% was classified as a record, and 25% had current business value. This means, on average, 69% of the information in most companies has no business, legal, or regulatory value and should therefore be a focus of defensible disposition.
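The percentages quoted in this post follow from simple subtraction, sketched below in Python. The function name and the 500 TB data store are hypothetical and used only to show how the 69% figure might translate into volume.

```python
def remaining_share(*accounted_percentages: float) -> float:
    """Percentage of data left after subtracting the accounted-for categories."""
    return 100 - sum(accounted_percentages)


# 5% records + 1% legal hold: share with no regulatory or legal retention need.
no_retention_obligation = remaining_share(5, 1)

# 1% legal hold + 5% records + 25% current business value:
# share with no business, legal, or regulatory value.
no_value = remaining_share(1, 5, 25)

# Hypothetical data store size, for illustration only.
store_tb = 500
disposable_tb = store_tb * no_value / 100

print(no_retention_obligation, no_value, disposable_tb)
```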
ROT records management: More is no longer better when it comes to information
Managing ROT records with defensible disposition strategies should become a standard process for all organizations. At its most basic, an effective information management program should leverage both management and ongoing disposition initiatives to reduce ROT and data overload, improve operational efficiency, boost employee productivity, and increase corporate profitability.
Digital transformation and data migration
A digital transformation and transition to the cloud is the perfect time to undertake a defensible disposition process. As organizations continue to move away from their on-premises data centers and make the jump to 100% cloud computing, the question all organizations will face is what to do with their existing on-premises data stores – leave them in place until the aging data expires or move them to the cloud.
Obviously, leaving it in place while keeping new data in the cloud is expensive and adds to overall complexity and cost – sometimes for many years.
A second strategy is to migrate all existing data to the new cloud platform for ongoing storage – again expensive and risky.
The third strategy is to move existing important data/records to the cloud while culling the ROT during the migration process. This third strategy can be accomplished in either of two ways: manually culling the on-premises ROT before the migration and moving only the important data to the new cloud, or migrating it all and utilizing a cloud information management/archiving platform to cull and delete the ROT programmatically once in the cloud.
The second alternative, culling the data programmatically, makes the most sense in that it can be automated and is therefore consistent. For example, once the cloud archiving platform is configured and ready, corporate data retention policies can be created based on date range, owner, department, keywords, etc. Once the data is migrated and retention policies are assigned to each migrated file, data not meeting the new retention policies would be automatically tagged for disposal. Optionally, some or all of the data tagged for disposal can be reviewed prior to destruction.
This automated defensible disposition process for legacy data during a data migration can be done much faster and more consistently than a manual cull, while also producing full defensible reports on the process and the specific files tagged for disposition.
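To make the policy-driven tagging described above more concrete, here is a minimal Python sketch. It does not represent any particular archiving platform's API: the `FileRecord` and `RetentionPolicy` classes, the tag names, the sample files, and the retention periods are all hypothetical. Files that no policy covers, or that have aged past every applicable policy, are tagged for disposal pending review.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FileRecord:
    path: str
    owner: str
    department: str
    created: date
    text: str = ""
    tags: set = field(default_factory=set)


@dataclass
class RetentionPolicy:
    name: str
    retain_years: int
    departments: frozenset = frozenset()  # empty means any department
    keywords: tuple = ()                  # empty means no keyword filter

    def applies_to(self, f: FileRecord) -> bool:
        dept_ok = not self.departments or f.department in self.departments
        kw_ok = not self.keywords or any(
            k.lower() in f.text.lower() for k in self.keywords)
        return dept_ok and kw_ok

    def still_retained(self, f: FileRecord, today: date) -> bool:
        # Approximate age in years; adequate for a sketch.
        return (today - f.created).days / 365.25 < self.retain_years


def tag_files(files, policies, today):
    """Tag each file 'retain' or 'dispose-pending-review'; return disposal list."""
    for_disposal = []
    for f in files:
        keep = any(p.applies_to(f) and p.still_retained(f, today) for p in policies)
        f.tags.add("retain" if keep else "dispose-pending-review")
        if not keep:
            for_disposal.append(f)
    return for_disposal


# Hypothetical migrated files and policies, for illustration only.
files = [
    FileRecord("contracts/msa.pdf", "alice", "legal", date(2019, 5, 1),
               "master services agreement contract"),
    FileRecord("tmp/lunch.txt", "bob", "sales", date(2017, 3, 1), "lunch on friday?"),
    FileRecord("finance/old_invoice.pdf", "carol", "finance", date(2010, 1, 1), "invoice"),
]
policies = [
    RetentionPolicy("contracts", retain_years=10, keywords=("contract",)),
    RetentionPolicy("finance", retain_years=7, departments=frozenset({"finance"})),
]

disposal_paths = [f.path for f in tag_files(files, policies, date(2021, 2, 24))]
print(disposal_paths)
```

The returned list doubles as a simple disposition report, and the "pending review" tag leaves room for the optional human review step before destruction.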
Archive360 is the world's leader in secure data migration and intelligent information archiving and management. The Archive2Azure™ solution is a complete cloud-based information management and archiving solution for both structured and unstructured data installed in your company's Azure Cloud tenancy for increased security and functionality, ongoing customization, and complete control. Unlike SaaS archiving platforms with limited information management capabilities and one-size-fits-all archiving application and security configuration, the Archive2Azure PaaS solution is implemented in a Zero Trust model, so that you migrate and store your company's data in your Azure tenancy with complete control over granular retention/disposition and security capabilities, including the ability to encrypt data on-premises before movement to your Azure tenancy – while keeping your encryption keys local. Archive2Azure includes intelligent unstructured/structured data migration purpose-built with legally defensible data disposition in mind.
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.