Description:
In our latest episode, Bill Tolson and special guest Jason Stearns, Past President of ARMA, discuss how to manage the growing amount of information flowing into and out of enterprises today – both paper and electronic. The episode also explores the issues resulting from a lack of corporate focus on information management within an organization - including defensible disposition.
Blog
Data Has Value, but also Risk – Get Rid of What You No Longer Need
In the past, organizations had a "keep everything" mentality when it came to their generated data. But today, that data is putting the organization at risk. Read the accompanying article to this podcast to learn when it's ok to delete data.
Speakers
Jason Stearns
Immediate Past President
ARMA International Board of Directors
Jason C. Stearns is Immediate Past President of the ARMA International Board of Directors. He has served on the board since 2017. Previously, Jason served as chair of the IGP board of directors and was president of the ARMA Metro NYC Chapter.
Jason also worked as the Information Governance Program Manager in the Legal & Compliance department at Citadel. He has held similar roles at BlackRock, UBS AG, and New York Life Insurance Company.
Mr. Stearns earned an MS in Applied Information Management from the University of Oregon and a BA from Binghamton University. He is a certified IGP. Jason is currently studying for a Master’s in Legal Studies, Cybersecurity and Information Privacy Compliance from Drexel University's Thomas R. Kline School of Law.
Bill Tolson
VP of Global Compliance & eDiscovery
Archive360
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.
Transcript:
Bill Tolson:
Welcome to the Information Management 360 Podcast from Archive360. This week's episode is titled, Data Has Value, But Also Risk; Delete what you no longer need. My name is Bill Tolson, and I'm the Vice-President of Compliance and E-Discovery at Archive360. With me today is Jason Stearns, the immediate past President of ARMA. Thanks for joining me today, Jason, and really looking forward to our discussion today.
Bill Tolson:
Today, Jason and I are going to be discussing the issues associated with keeping too much data in your corporate enterprise, both hard copy, and electronic, and how to address this challenge, otherwise known as defensible disposition. By the way, also known as defensible deletion, and defensible disposal. So I've run across all three terms, Jason. It's depending on the company and so forth, but it all basically means the same thing, and we'll get much deeper into it.
Bill Tolson:
So with that, let me kind of set the stage here for everybody, and then we'll kick it off with Jason. Organizations continue to deal with the growing tidal wave of information flowing into and out of their enterprises, every day. Industry analysts refer to this as the three Vs; the Velocity or the speed at which data is received and created, the Volume or sheer amount of data sent and received by employees daily, and then the Variety or types of data the organization must deal with on a daily basis.
Bill Tolson:
This increasing wave of data is complicated by regulatory compliance requirements, litigation hold and E-discovery response, and the need to retain data valuable to the business. Many organizations still only focus on managing corporate records for regulatory compliance requirements, which, in my mind and from data I've read, amounts to anywhere between 5% and 10% of all corporate data.
Bill Tolson:
And those same companies tend to ignore the 95% of data that's non-regulated, leaving it to the employee in many cases to manage. Now, most end-users will acknowledge their information management habits leave something to be desired, including myself. They're too busy doing their daily work to spend large amounts of time on data management, so they, based on human nature, tend to ignore it. Now that doesn't mean it's not important. In all the companies I've been in over the years, employees are rarely taught or educated on the need for information management. There's some education that goes on around records management, but it's all of this data, including the non-records, that we're going to be talking about today.
Bill Tolson:
In fact, in records management consulting, we referred to something some of you may have heard of before, it's called The Five Second Rule. If it will take the average employee more than five seconds to classify and store a document in the proper repository, then they will either classify it as 'keep forever' or delete it immediately, because it's the easiest thing to do, and they got other stuff to do. They got loads of other work.
Bill Tolson:
This lack of corporate focus on managing all data within an organization causes several issues, which Jason and I will be discussing today, around the topic of defensible disposition. So with that opening, Jason, let's dive into the discussion. In your mind, and you've been involved with records and defensible disposition and information management for a long, long, long time, what's the downside of companies retaining large amounts of data that potentially has no value?
Jason Stearns:
Well, I mean, for one, if nothing else, it's expensive. As the volumes of data increase, you need more and more storage for it. And there's the whole thing about all storage is too [inaudible 00:04:33], we can get into it later, but I would say it's expensive. It's also risky. There's a lot of this focus now on information security and data protection, and what do you do with that?
Jason Stearns:
Well, the more data you have to protect, the more complicated, more expensive that gets. So less is more in this case when it comes to that. And then there's discovery; whether you've got a small litigation profile, a big litigation profile, you're heavily regulated, moderately regulated, at some point you are going to be asked to go through old data. Not because it has value, but just because it's there. And that's when getting that data gets really expensive; getting to it, getting it processed, getting it reviewed by counsel, getting it packaged up and sent over to, whether it's a regulator or opposing counsel, whoever it is.
Jason Stearns:
All of those steps in that process add up super, super fast to the point that that super inexpensive data starts costing thousands of dollars potentially per gigabyte. So it's a risk and cost, really, more than anything else.
Bill Tolson:
Yeah, that's a great point. And you mentioned the cost of storage and I'll kind of highlight that in a minute, but also, you're talking about, and you also mentioned this, the thought of the more data that is there, there's more risk associated with it because there can be PII in that data, all kinds of other things.
Jason Stearns:
PII, trade secrets from other organizations, other proprietary information. People keep things on their network stores that they probably shouldn't or most certainly shouldn't. And yet, even though that data shouldn't have been there, if it gets exposed in a data breach or a hack, guess what? You're still responsible for it as the owner of the system or the platform.
Bill Tolson:
Well, and you mentioned the discovery as well. And the thing I keep telling companies when I talk to them, and have for years is, if data exists, it can be discoverable. And like you mentioned, it can be requested in an E-discovery request. So in discovery, there's nothing necessarily out of bounds, unless it's privileged. And you have to have a good reason for that, obviously.
Jason Stearns:
Over the years, this concept of reasonably accessible has gained some traction. And if you can demonstrate that those backup tapes from the late '90s really have no value, and the data that's really in scope of the investigation or litigation can be pulled from other systems, you can often make an argument that that data isn't worth searching. You won't be forced to search it.
Jason Stearns:
You may be forced to demonstrate that that data has no value. And again, just the cost of doing that random sampling to say, "Hey, there's really no there, there," even that gets expensive. Particularly if you're talking older formats or proprietary formats that you no longer have the ability to support directly. So now you're dependent on an outside consultant or other third party.
Bill Tolson:
Well, and that brings up the idea that if you no longer have the ability to access those formats, why do you have it? And I've seen that become a problem before. When I was consulting, one of the tests that we would do was this: you would have a GC at a company basically say, "Well, based on the 2006 FRCP or the 2015 revisions, my backup tapes are inaccessible."
Bill Tolson:
And the first question I would ask their IT guy, who was usually sitting next to them, was, "Have you ever restored a backup tape because your CEO deleted a file that he needed?" And as soon as they say, "Well, of course," then the tapes are not inaccessible. They're accessible, and they're open... obviously it's up to the judge to make that decision. But by doing that, you're proving that they're not just for DR. You're almost using it as an archive.
Bill Tolson:
So that's one of those things that we talk about, but also Jason, you talked about the myth that enterprise storage is actually cheap. And I have a feeling, I have an idea of what your position is on that, but what do you think about that statement?
Jason Stearns:
Well, so it's exactly that, it's a myth. I mean, there's two parts of that, which make that false. Yes, the cost of the raw storage itself has absolutely gone down dramatically over the last 10, 20 years. Even just within the last few years, as cloud has become more accessible and more affordable. The problem is the volume at which that data is growing, in most organizations, outstrips the reductions in costs. So you're just throwing more and more data out there, and even though the cost is going down, you're still at best, the same year over year. And at most organizations, that number continues to increase.
Jason Stearns:
Oh, and don't forget, you've got to support that data. You've got to support that storage. So if it's all in-house, you've got the whole technology team, you got to pay someone to watch that little green light blink and make sure it's blinking at the right speed, and that everyone's got the right access. You've got all the software you need to support to be able to continue to access that data. You've got all the costs of migrating it from server A to server B, when something goes end of life, or when the storage capacity on that server no longer meets your needs, and now you need something bigger. All of that adds up.
Jason Stearns:
Well, then it goes, "Oh, well, cloud. Can we go cloud? Maybe it's cheaper." But again, the volumes of data that most organizations are throwing up into the cloud are outpacing the cost reductions. And then once that investigation or litigation comes into scope, while cloud's really good at putting data up into it, it still struggles in a lot of cases with getting it down in a reasonably efficient manner. So then you're adding costs there in terms of delays, in terms of alternate methods of getting the data. And then what happens if it delays your ability to respond in a timely fashion?
Jason Stearns:
I worked most of my career in the financial services industry, and there are very specific, not so much requirements, but expectations about how quickly you can turn around a request. If you're handcuffed by a cloud storage solution that only lets you download a certain volume of data per hour or per day, and that doesn't meet what your production need actually is, that could get you in a little trouble. So yeah, all these things add up so that the cost of data being next to nothing or cheap, is absolutely [inaudible 00:10:40].
Bill Tolson:
Yeah. I've had that conversation with many companies. In fact, I've had more than once, I've had somebody respond to me that, "Well, gee, you know, I could get two terabytes of spinning disk at Best Buy for $83." I was like, "Well, good for you, but you wouldn't put your mission critical enterprise on that stuff now, would you?" You get into the concept that a lot of people don't really understand, and that's the idea of fully loaded storage, the cost per gigabyte. And that's what you just went through; the cost of backing it up every night, the cost of having up-to-date DR capabilities. The cost of the employees around it, the cost of securing it, all of that stuff.
Bill Tolson:
I mean, it used to be not too long ago, Gartner and the other market analyst firms, would basically estimate the cost of enterprise storage was around anywhere from $10 to $18 per gigabyte. That sounds ridiculously expensive. And that's over its life, usually a three-year life, but still, if like you mentioned, you're looking at cloud, you could be looking at 10 cents per gigabyte per month, or if you're using archival storage, half a cent per gigabyte per month, but then you have to make sure that, like you also mentioned, that you have the access that you require.
Bill Tolson:
And some, especially SaaS providers in the cloud, will convert your data or they won't index it fully, or they'll throttle your ability to move it out again. So knowing that kind of stuff ahead of time, when you're signing contracts, is also important. But yeah, absolutely, this idea that, "Gee, I can get four terabytes for $129," it's hard not to laugh at that. Usually, that'll come from an IT guy, usually not the storage administrator.
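Editor's note: the per-gigabyte figures quoted above are easier to compare on the same time horizon. The following is a minimal sketch, assuming the cloud rates are held for the same three-year life mentioned for enterprise storage. The function name and the 36-month horizon for cloud are illustrative assumptions, and the cloud numbers cover raw storage only, not the access, egress, or indexing costs the speakers warn about.

```python
# Back-of-the-envelope comparison of per-gigabyte storage cost over three years.
# The rates are the figures quoted in the conversation; the function name and
# the choice to apply a 36-month horizon to cloud storage are assumptions.

MONTHS = 36  # the "three-year life" mentioned for enterprise storage


def cloud_cost_per_gb(rate_per_gb_month: float, months: int = MONTHS) -> float:
    """Raw storage charge for one gigabyte held for `months` months."""
    return rate_per_gb_month * months


on_prem_low, on_prem_high = 10.00, 18.00   # fully loaded, per GB over its life
cloud_standard = cloud_cost_per_gb(0.10)   # 10 cents per GB per month
cloud_archive = cloud_cost_per_gb(0.005)   # half a cent per GB per month

print(f"On-premises (fully loaded): ${on_prem_low:.2f} to ${on_prem_high:.2f} per GB")
print(f"Cloud, standard tier:       ${cloud_standard:.2f} per GB")
print(f"Cloud, archival tier:       ${cloud_archive:.2f} per GB")
```

The gap between the tiers is large on paper, which is why the retrieval limits and contract terms discussed in this exchange matter before committing to the cheapest one.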
Jason Stearns:
Well, I mean, I can buy that storage cheap and I have. I basically have my entire music collection on it, but I also paid to back it up. I also paid to have the software and the computer that grants me the access to it. If you just think about it in your own personal day to day, it adds up even just in how you... if you manage it effectively at home; I live, eat, and breathe this stuff, so I'm probably a bit more thorough in protecting my personal data than the average John or Jane Q Public, but it adds up quickly.
Bill Tolson:
Oh yeah. The other thing that companies need to take into consideration about saving too much, is the idea of what's the cost of finding something when you need to find it? I used to have a CEO at Mimosa Systems and then Iron Mountain, and he would say that it would cost anywhere from 20 to 100 times more to find a specific file than to store it for 20 years. Because if you can't find it, then you're wasting time. Your productivity is going down the drain. And that stuff adds up very quickly.
Jason Stearns:
Yeah. There is a study out of UC Berkeley. I want to say five or six years ago now, maybe a little longer, but they did a study showing that the average knowledge worker loses the equivalent of eight hours a week looking for data they often never find. And what's so challenging about it is it's not like you sit down and waste an entire day looking for a file. It's in little drips, and draps, and drops, and bits, and pieces throughout the course of your workday that just add up. And then you either wind up giving up and starting over again, or you rely on data that's outdated, because, "Well, it's the nearest thing to the final I could find."
Bill Tolson:
That's exactly right. And you're one of the first people that I've talked to that brought that up. And I've written about this over the years, but you said eight hours per employee per week, potentially; years ago, five, 10 years ago, I created some financial models around that specific question. And I used anywhere from two to four hours per week, of an employee trying to find data that they're pretty sure they saw a year or two ago, "I need to get some forecast numbers out of it," so they spend 2, 4, 8 hours over a period of a week trying to find it.
Bill Tolson:
And then if they can't find it, they spend time recreating it. And how many hours does that take? And then you take that into account of, what's a fully loaded cost of an employee in that company, then what's the opportunity cost for that time to be spent generating additional revenue? I did a financial model around that for a very large automotive company, and they were wasting billions of dollars based on the size of their company, the amount of stuff they were trying to find over a period of a year, just because they couldn't find the data they needed.
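Editor's note: the kind of financial model described here can be sketched in a few lines. This is only an illustration, not the model Bill built. The two-to-four-hours-per-week range comes from the conversation, while the headcount, the fully loaded hourly cost, and the number of working weeks are hypothetical inputs chosen to show the arithmetic. It also ignores the re-creation time and opportunity cost he mentions.

```python
# Rough annual cost of time lost searching for data.
# Only the 2-to-4 hours per week range comes from the conversation; the
# headcount, hourly cost, and working weeks below are hypothetical.


def annual_search_cost(employees: int,
                       hours_lost_per_week: float,
                       loaded_hourly_cost: float,
                       working_weeks: int = 48) -> float:
    """Employees x weekly hours lost x fully loaded hourly cost x weeks worked."""
    return employees * hours_lost_per_week * loaded_hourly_cost * working_weeks


EMPLOYEES = 100_000          # hypothetical large enterprise
LOADED_HOURLY_COST = 75.0    # hypothetical fully loaded cost per hour, in dollars

low = annual_search_cost(EMPLOYEES, 2, LOADED_HOURLY_COST)
high = annual_search_cost(EMPLOYEES, 4, LOADED_HOURLY_COST)

print(f"Estimated annual cost: ${low:,.0f} to ${high:,.0f}")
```

With these placeholder inputs the estimate lands between roughly $720 million and $1.44 billion a year, which shows how quickly a few hours per week scales at a large company.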
Bill Tolson:
And then, Jason, I don't know if you've run across this before, but there's an old DuPont case study, and it's really centered on E-discovery. And it relates both to trying to find information, but also expired data.
Jason Stearns:
Is that the one where like nearly half of the data they entered in discovery could have been destroyed, had they followed their own retention schedule?
Bill Tolson:
Exactly. They looked at nine cases over [inaudible 00:15:48], over a period of a year. They spent $24 million doing E-discovery review of all these millions upon millions of documents, and after the fact, they studied, and they said half of those documents were expired and should not have existed. Therefore, we spent an additional $12 million reviewing data that should not have existed.
Jason Stearns:
I mean, anywhere I've tried to implement some type of data cleanup, defensible relief, whatever you want to call it, I've always run into a variation of this story of, "Well, there was that time that Susan saved the day because she found this document that if we hadn't had it, we would have lost this deal or lost this client or been subject to this sanction." What gets left out of that story all the time is, well, that document that Susan found, was something you needed to keep anyways. And it took her two weeks to find it.
Jason Stearns:
Had your stuff actually been organized, and you had cleared up the garbage that she then also had to go through, she would've probably found it in a day or two tops, if not even just a few hours, and look at all the wasted time and all the wasted effort. So when I introduce these data cleanup efforts, I'm saying, "I want to get rid of that stuff so that Susan can find that stuff that she had to keep anyways more efficiently." And what happens when Susan wins the lottery, and retires to Tahiti?
Bill Tolson:
Yeah, absolutely. I mean, I won't tell it now, but I have a story related to me on that, and it has nothing to do with Tahiti, but it's an interesting one. I'll talk about it later. So we've opened this podcast with the idea of defensible disposition or defensible deletion. In your understanding, Jason, can you define what defensible disposition is overall?
Jason Stearns:
I'm sure there's some textbook definition out there somewhere, but here's how I describe it. It is a systematic process whereby you review data in whatever form or content that it's in and develop a framework for making a decision about whether to keep it, to destroy it, to analyze it further and then destroy it, or to analyze it further and then keep it.
Jason Stearns:
It's not this haphazard, “Oh, just shove it in the shredder,” approach. It's systematic. You have to document your decision process. Who was involved. What were the factors that you considered? What were the things that changed along the way? Maybe there was a big legal hold that was preventing you from looking at a large set of data. So the decision at that point was, we're just going to ignore it, but then two, three years into the effort, now that legal hold is lifted. So now that data is all in scope.
Jason Stearns:
Do you need to change your approach? It's documenting all of that, having that systematic way so that not if, but when, you find out that you inadvertently deleted some data that maybe you should have kept for one reason or another, you can demonstrate that you acted in an informed manner, you acted in good faith and Mr. or Mrs. Regulator, or your honor, this was just clean up through the normal course of business. I was not acting in a bad way here. That's really the whole process.
Bill Tolson:
Exactly. Perfect description. And at the very end you say, referring to the court, you have to be thinking about what if during discovery, the opposing counsel says, “Well, did you get rid of data, and why?” and, “Gee, judge, I have a feeling these guys were basically destroying data that they didn't want us to see.”
Bill Tolson:
Well, if it's part of a standard process that you document, and at the time the data you deleted, there was no reason to anticipate a legal hold on it, then why not? That's the one thing that I've been told by lawyers for 30 years, but also I tell clients is, there's no hard and fast rule beyond the regulatory retention requirements for specific industries, but there's no legal requirement to keep data. Companies keep it for all kinds of reasons.
Jason Stearns:
And there are explicit court decisions that say exactly the opposite, even going back to the example everyone holds up as the bad actor, Enron, where the court has come back and said, “As long as the organization is following a systematic process and its retention schedule, they can and should be free to delete data that is not subject to some type of preservation order or regulatory requirement.” I am paraphrasing, but it's something to that effect. And there have been multiple decisions since then, whether it's specific to email, because that's always a sticking point for organizations or whether it's just their data and records generally. The key is that it's systematic and it shows that you're making reasonable decisions.
Bill Tolson:
Exactly. And you have a documented process that you are regularly following. I remember, not too long ago, it was probably five, six years ago, the Apple versus Samsung case, where under discovery, it was discovered that Samsung had a standard retention policy for email of two weeks. And after two weeks, it disappeared. And Apple obviously thought that that was a little unusual, but we've said there is no hard and fast law that says you must keep data for one year or so.
Bill Tolson:
So as long as it's standard process and it's documented and you follow it and you can apply legal holds when you need to, that's up to the company to do that. It's really interesting the kinds of myths that people have around data. And it used to be, I don't know, Jason, if this is your experience, but 10, 15 years ago, a GC's, a general counsel's, basic way of handling data was to get rid of it as soon as possible, because it may contain smoking guns, so get it the heck out of here.
Bill Tolson:
And then we've kind of gone the other way, at least in my experience now, that most general counsels and companies in general are afraid to delete anything because they might be accused of spoliation. So we keep everything. And we end up with these massive data collections that nobody can find anything in, but you're spending all of this additional overhead on, to keep it.
Jason Stearns:
That's absolutely been true. I mean, I know in one of my former roles, my chief legal officer was like, “Well, I can't authorize the deletion of anything unless we can say that it's absolutely perfect every time.” Listen, if you want to make sure that we don't ever delete anything we shouldn't delete, then we shouldn't delete anything and just hold on to everything. Well, then he'd start to turn red, because he knows that that's not going to work either, just because of some of the issues we were dealing with at that particular organization.
Jason Stearns:
I think the default position at most of the firms that I've worked at, my colleagues and my peers, where they work and just talking around through peers and colleagues in the industry, has mostly been, “Keep it just in case. Keep it forever. Well, keep it for whatever's required of us and maybe a little bit longer and a little bit longer.” That later date never comes.
Jason Stearns:
I think the changing rules around privacy in the United States, certainly GDPR has that effect, but in the US, US-based companies, you know it's all about the US, no matter what their global footprint is, I think the changing privacy landscape in the United States and the increased ransomware attacks and malware attacks and data breach just generally, I'm seeing at least more and more organizations are starting to rethink that. That's a good thing. That's a good place for us to be.
Bill Tolson:
Yeah. I absolutely agree. What's your feeling on companies in North America? Have they taken the idea of defensible deletion to heart yet, or is it still kind of a pushing-rope-uphill type of thing?
Jason Stearns:
Like everything, I think it's based a bit on what market sector or what industry you're in. I would say I've spent most of my career, like I said, in financial services. It's a mixed bag. Certainly I've seen the organizations I've worked [inaudible 00:23:30] traction around pretty much everything but email. Email and electronic communications generally is still a sensitive spot for a lot of organizations. I think in my former world in still financial services though, insurance, there's been a shift. I think prior to ‘Big Data’ and ‘AI’, and you can't see me, but I'm putting air quotes around both of those terms.
Jason Stearns:
Until those popped around, I think they were doing a pretty good job of clearing out the data. But now I think suddenly they're realizing, and they writ large, not anyone particularly on my resume. Disclaimer here. I think as an industry, they're starting to realize that there's potentially value in all this information they've collected over the years, whether it's for doing their underwriting or just their client profiles or whatever, and now they're trying to monetize that in some way, and I think they're starting to bump up against privacy regulations and are having to rethink those strategies.
Jason Stearns:
So it's a mixed bag. I have colleagues in other industries, entertainment, you have the content for them that is valuable. I mean, you would never want to apply a retention schedule to, I don't know, pick your favorite television show. Two of mine from a bygone era are, I Love Lucy and The Twilight Zone. You wouldn't want those to go away, right?
Jason Stearns:
But there's all that stuff that an organization produces in support of, not only that particular content or product, but just keeping the lights on and doing the day to day and keeping people paid and paying the rent and paying the electric bill and all those things that once those bills are paid and you've met whatever obligation you might have from an auditing or tax perspective, those things can and should go away.
Jason Stearns:
And I think that's what organizations don't fully understand yet, is we can divide your world in two and say, “Well, this is the stuff that's really heavily regulated or controlled or has real long-term value to the business.” And we can refine that and define that later, but I've got so much crap, and that's a technical term, that we can get rid of that will never impact that world, and I can reduce your risk and I can reduce your data storage and I can increase efficiency, if we actually take a systematic approach to handling it.
Bill Tolson:
Yeah. That brings up a couple of related topics that I've been writing about for years. And they're not jaw-dropping or anything like that, but the first thing is, to manage data, you have to know what you have. It's pretty common sensical. And to defensibly delete data, you have to know that you have it. And one of my biggest questions, I wouldn't say it's an issue, but I've consulted with companies that have tried to address this, but generally speaking, and you can argue about the numbers, but generally speaking, 75 to 80% of all data within a company is controlled by individual employees on their workstations or laptops.
Bill Tolson:
And the centralized governing authority, the company, has no idea what they have. They're not syncing it, they're not doing anything with it. So in that huge amount of data that only individual employees know they have, there could be PII, there could be IP know-how, there could be aging stuff that you really shouldn't have, it's taking up space, some employees will be backing it up to their shared drives, all kinds of neat stuff. But what's your opinion on this: in the future, can organizations and should organizations actually try to somehow control or manage that data?
Jason Stearns:
Yes and yes. First of all, what organizations often forget until I have come in and rewritten the policy is they actually own the data that is on their servers. And they also own the liability for the data that's on their servers. So again, using the example of the data breach, if an employee rightly or wrongly has been doing their taxes at work every year and storing the files onto their personal drive, which these days, most of these personal drives are virtual.
Jason Stearns:
So it's not actually sitting on a box under their computer, under their desk. It's on the company's network. Or if it is on a box under a desk, it's probably being backed up somewhere to the corporate network. But the company still owns and is responsible for that machine, so they still own the data. Well, if anything on that gets breached, the company's still responsible.
Bill Tolson:
Yes.
Jason Stearns:
So yeah, they should be managing what's on there. They can manage what's on there under the law. And even in jurisdictions where there's much tougher rules around privacy or even rules around what a company can or cannot do in terms of managing an employee's personal content, a lot of it just comes down to, and not exclusively, I'm not a lawyer, I don't play one on television, but having done this for years, a lot of it comes down to making sure the employee is properly informed that unless X, Y, or Z is done with the data, it will be deleted and will be deleted pursuant to this set of rules or policies. And organizations need to implement that. I've had success implementing those types of approaches, whether it's at the personal drive level or at departmental network drive level, pretty much anywhere that I've worked, we've had some level of success with that.
Jason Stearns:
A lot of it does come down to corporate culture. I mean, there are organizations where people spend their entire careers there, they meet their spouse there, their child ends up working at the same firm, and you've got all the personal data to demonstrate that whole history sitting on their C drive.
Jason Stearns:
So you don't want that situation. So you have to manage against that. Or an employee who's trying to do the right thing, but inadvertently brings in proprietary data from a former employer. “Oh, I worked on an issue very similar to that when I was working in my former job. Let me use that as a jumping off point,” and they mail it to themselves. Well, now you've got information that might, from the employee's perspective seem perfectly innocent and innocuous, but yet it's governed by some trade secret rules somewhere, or it breaches some confidentiality arrangement you've had. And now, again, as the corporation, you are responsible for that.
Bill Tolson:
Well, and you mentioned corporate culture. That's one of the things that I've tracked and recognized for 20 years. And I've actually brought this up with companies, especially high-tech companies. Their culture is, “Well, that's stuff we don't care about. That's the employee's business. We're not going to try to manage that. We only care about records.” And it's like, “Well, wait a minute. With the new PII laws, the privacy laws, with E-discovery, with all of this other kind of stuff, that is the corporation's responsibility, whether you agree with that or not.”
Bill Tolson:
But the biggest issue that I've had, like you mentioned, is the overall corporate culture of “We're not going to bother with that.” And they're not going to bother with it until it comes back to bite them. And I've seen that happen. In the hundreds or thousands of companies I've worked with as a subject matter expert over the years, of all of those, I've run across two that actually looked at that data: one was a bank and the other was an industrial company.
Bill Tolson:
And the industrial company basically set up syncing software so that any time a laptop or workstation was synced to the system, it would go into select folders and it would copy data to it centrally. So they had to index it, know it was there, respond to E-discovery, those kinds of things. It all goes to a central repository, unless you're on a laptop that you're traveling with, and then the next time you sync, everything gets moved over. And [inaudible 00:31:11] that is wildly out of the ordinary, I think it's a good practice.
Jason Stearns:
I would add to that what it should also do: when it's syncing and indexing and everything, also analyze the data for whether or not it should be considered stale, based on a definition the firm defines. And if stale, tag it for potential deletion, and then again, depending on culture, either leave that up to the end user to say, “No, no, no. I still have value in that. I want to keep it,” or, “Yeah, no. Yeah. I haven't looked at that in X.” It should go away because it's no longer the official copy or it's just something that's taking up space.
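As a rough illustration of the stale-data tagging Jason describes here, a minimal Python sketch might look like the following. The three-year threshold, the `FileRecord` fields, and the `tag_for_deletion` name are all assumptions made for illustration; the episode does not specify any of them, and each firm would define "stale" for itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical threshold: the firm decides what "stale" means.
STALE_AFTER = timedelta(days=3 * 365)


@dataclass
class FileRecord:
    path: str
    last_accessed: date
    is_official_copy: bool = False
    user_wants_to_keep: bool = False  # the end user's "I still have value in that"


def tag_for_deletion(files, today, stale_after=STALE_AFTER):
    """Return the paths of stale files that are candidates for deletion."""
    candidates = []
    for f in files:
        stale = today - f.last_accessed > stale_after
        # Official copies and files the user asked to keep are never tagged.
        if stale and not f.is_official_copy and not f.user_wants_to_keep:
            candidates.append(f.path)
    return candidates


files = [
    FileRecord("C:/Users/bill/old_draft.docx", date(2015, 1, 10)),
    FileRecord("C:/Users/bill/budget.xlsx", date(2021, 6, 1)),
    FileRecord("C:/Users/bill/keep_me.pdf", date(2014, 3, 2), user_wants_to_keep=True),
]
print(tag_for_deletion(files, today=date(2021, 9, 1)))
```

The sketch only tags files; whether tagged files are deleted automatically or offered to the end user first is the culture question raised above.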
Jason Stearns:
I think a lot of employees, when I've talked to them in my various roles, they want to do the right thing, but as you mentioned at the outset of our talk today, they don't have the time, this isn't their job, they have just barely a working understanding of what a record is, and that's really only if you're in one of those more regulated industries. So for them to make a decision as to what should or should not go from anything else's perspective, they don't have the knowledge nor the time.
Bill Tolson:
Yeah. And you mentioned the idea that all of that data sitting on individual computers could have PII in it that, with the new privacy laws, you have to know about as a company. So part of that syncing with the central repositories could be, “I know these files on Bill's computer have potential PII or other stuff in them. I'm going to data mask that stuff. I'm going to anonymize it. I'm going to pseudonymize it, so that if Bill doesn't have the authorization to actually be looking at that, he's not going to be able to view it.”
Bill Tolson:
Those systems now exist. They can automatically go out and determine, using [regex 00:32:52] and other kinds of things, where PII is, and then based off of entitlements and authorizations, make it so that certain people can't see that data, even if they are within the company and they're holding the file.
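To make the idea of regex-based PII detection plus entitlement-based masking concrete, here is a minimal Python sketch. The two patterns (a US Social Security number and an email address), the `mask_pii` name, and the entitlement labels are illustrative assumptions, not a description of any particular product; real PII discovery relies on far more than a couple of regular expressions.

```python
import re

# Illustrative patterns only; production PII detection needs many more rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def mask_pii(text, viewer_entitlements):
    """Mask every PII type the viewer is not entitled to see."""
    for kind, pattern in PII_PATTERNS.items():
        if kind not in viewer_entitlements:
            text = pattern.sub(f"[{kind.upper()} MASKED]", text)
    return text


doc = "Candidate: jane.doe@example.com, SSN 123-45-6789"

# A viewer entitled to see email addresses but not SSNs.
print(mask_pii(doc, viewer_entitlements={"email"}))

# A viewer with no PII entitlements at all.
print(mask_pii(doc, viewer_entitlements=set()))
```

The same file yields different views depending on who opens it, which is the behavior Bill describes for someone who holds the file but lacks the authorization to see its contents.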
Jason Stearns:
Yeah. And that one always gets to me as the HR data and when it comes to PII, because I've talked to a lot of organizations, I have done a lot of work around securing their HR data and it's access only and properly anonymized, otherwise. Minimize the amount of data they keep in systems that are more readily acc… all that type of stuff. And then you say, “Well, okay, so when your summer interns come in, how do you share the resumes?”
Jason Stearns:
And inevitably it's email. Sometimes it's attached to a meeting invite, and depending on how you've got your calendar configured, that's marginally better, but nothing has prevented that hiring manager from printing out a copy or saving a copy locally. So a lot of times, and I think this is again where privacy law changes are going to have a real impact on, not just what data is stored, but even in how we're sharing it and accessing it.
Jason Stearns:
We've got to move to an environment where that resume and those employment candidate details are in a read-only format that's not downloadable, not editable, not printable, and people are just going to have to get used to that change. It's going to be hard, but it's just where we're headed.
Bill Tolson:
Well, I absolutely agree, and I think one of the subjects we're kind of talking about without saying it is the idea of data consolidation. Should companies be looking more to consolidate all of their data so that they can manage it, they can find stuff, they can determine for discovery or whatever, where the data is? And that's one of the things I've been talking about for a couple of years now.
Bill Tolson:
We're talking about end-user data; consolidating that data centrally, or at least having copies of it so that you know what's there, is a huge cost savings for the company, but also a risk mitigation factor as well.
Jason Stearns:
And also, just efficiency. There's a startup that I've been talking with and offering them some complimentary consulting, for lack of a better term. And I was on the phone with one of their heads of legal, and I said, “Great, based on the nature of your business, based on what you're doing, having these big massive data sets so you can do profiling of your customer base, and run algorithms to do this, that, and the other thing, all well and good. I don't want to prevent you from doing that, but can we maybe only have one copy of all that essential data, and not the 14 that are spread across the cloud in the different teams.”
Jason Stearns:
"Let them go get and recopy the data they need when they're running a new project, but have the policy and procedure in place, so that once they're done, okay, the algorithm they're running to identify new marketing opportunities hits a dead end, and you decide to scrap it, go a different direction; that happens all the time. Well, that data needs to go away. And then if they need it again, they can keep the algorithm, fine, but the data that ran through it, they need to get rid of."
Jason Stearns:
"And if they have a new algorithm and they want to try it again, well, then you get it again. Or you have a sandbox that keeps that more controlled." And this attorney was like, “I've never heard of that, that would make much sense for what we do.” Like, “You're welcome, my fee is hire me.”
Bill Tolson:
Yeah, I know, great point. Again, we come back to the basic commonsensical thing, you have to know what data you have, to manage it. And if you don't care about managing it, then we have another discussion. But a lot of that data is in areas that currently, a centralized management structure within the IT system, doesn't have access to. But it brings up another point, Jason, I know you've written about this before and I have too, and that's the idea, we hear people talk about, including you and me, this idea of dark data. Can you explain what dark data is?
Jason Stearns:
So dark data are these data sets of a variety of formats or whatever, where you have actually little or no insight as to what's there. Perhaps it's a proprietary format from a system you no longer use, and it's stuck on a backup tape, or it's a data blob from a database extract for a system you don't have anymore. Yeah, it's in a readily accessible standard database format, but you don't really know anything more about it than maybe which department it came from.
Jason Stearns:
You don't know who used it, you don't know what data they were using, and so unless you are going to spend some time analyzing that data, it's a gigantic pile of costs and potentially risk, and what do you do to clean it up? Well, that depends on a variety of issues, but yeah, it's that data where you've got little to no insight as to what's actually there. Maybe other than [inaudible 00:37:29].
Bill Tolson:
Well, and because you don't know what's there, you can't really determine if it's valueless or not, because you don't know what's there.
Jason Stearns:
Well, I would argue there are ways you can determine that. I mean, if it's been sitting on a backup tape, that's just been collecting dust at your storage provider for the last 10 years, it's never been pulled back to be accessed in any way, it's on a proprietary format and it supports a part of the business that doesn't have a retention period greater than six years. I would say you're probably safe in getting rid of that. But again, you need to document the hell out of that.
Jason Stearns:
Now, if it's a situation where you think there might be some value in it, maybe it supports a line of business that's gone defunct, but maybe the business is thinking about getting back into that line of business, okay, well, there are things you can do to assess that value. But again, that's a determination you need to make. There are steps that you need to go through and guess what, you need to document the hell out of that too.
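The reasoning in the last two turns (untouched for a decade, supporting a business whose longest retention period is six years, and in every case documented) can be sketched as a small decision function. This is only an illustration under stated assumptions: the `DataSet` fields, the `disposition_decision` name, the legal-hold flag, and the shape of the decision record are hypothetical, and a real program would follow whatever sign-off process the firm has agreed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSet:
    name: str
    last_accessed: date
    retention_years: int  # longest retention period for the business it supports
    on_legal_hold: bool = False


def disposition_decision(ds, today, untouched_years=10):
    """Return a documented decision record for one data set."""
    years_untouched = (today - ds.last_accessed).days / 365.25
    eligible = (
        not ds.on_legal_hold
        and years_untouched >= untouched_years
        and years_untouched > ds.retention_years
    )
    # The record itself is the point: every decision is written down.
    return {
        "data_set": ds.name,
        "decided_on": today.isoformat(),
        "years_untouched": round(years_untouched, 1),
        "retention_years": ds.retention_years,
        "on_legal_hold": ds.on_legal_hold,
        "decision": "dispose" if eligible else "review",
    }


tape = DataSet("backup-tape-0042", date(2010, 5, 1), retention_years=6)
held = DataSet("backup-tape-0043", date(2010, 5, 1), retention_years=6, on_legal_hold=True)

print(disposition_decision(tape, today=date(2021, 9, 1)))
print(disposition_decision(held, today=date(2021, 9, 1)))
```

Anything that does not clearly qualify falls through to "review" rather than "keep", which mirrors the point that assessing possible value is itself a determination that has to be made and documented.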
Bill Tolson:
Probably seven, eight years ago, I was working with a company on discovery, and the opposing counsel had in the early meeting confirmed everything. They determined that this company had a big kind of filing cabinet in the warehouse, full of eight-inch floppy disks. And they were asking to know what data was on the floppy disks, and the company I was working with basically said, “We have no idea.”
Bill Tolson:
And the question was, “Well then why do you still have them?”. And they had discussions with the judge, and the judge, he didn't make a ruling, but he said, “I'm leaning toward, you have to restore that stuff, so at least we know what's there.” Try finding an eight-inch floppy disk drive these days, outside of a museum.
Jason Stearns:
Let alone a working copy of Windows 3.1.
Bill Tolson:
Yeah. Oh man.
Jason Stearns:
With the breaking glass symbol when you shut it down. Oh gosh, I'm showing how old I actually am.
Bill Tolson:
Yeah. So getting back to defensible disposition, and I know you've done presentations on this before, in kind of a shortened version, what's the best way to start off a defensible disposition project for a company you're working with?
Jason Stearns:
Pick something, honestly, just pick something. There are so many opportunities. I think you have to understand what the culture at the firm is like, what their risk tolerance is, what their litigation profile is. But if you are working in an organization that's been around for more than 10 years, you're going to have lots of opportunity to look at stuff to clean up. Because either it was never a record in the first place, or it has exceeded any ongoing retention obligation you would have, and it's just collecting dust.
Jason Stearns:
Whether it's going after backup tapes, because someone had the bright idea that you had to keep the backup tape for the same length of time as the record, and so now you've got backup tapes that go back a decade. You reconfigure how you create backup tapes so that it supports operational recovery. My computer blows up, I don't want to go back to 2012, I want to go back to yesterday. So anything outside of that, you need to figure out a place for it to go away. So now you create the new policy and you go forward.
Jason Stearns:
You might do the same thing with email. You might just look at your... I mean, this is a painful one, but it's in a way an easy one: your physical inventory at your offsite storage provider. I will guarantee, if you are an organization of any age or size, you have a substantial inventory of boxes that have little to no information on them, and you can, without a whole lot of effort, determine that they exceed any retention obligation you might have. And you set up your decision tree and your matrix, you get it signed off by all the people who need to sign off, whether it's the head of the business, the head of legal, or whoever else needs to be involved, based on how you're structured, and then you just start destroying boxes until something changes.
Jason Stearns:
Either you destroy all the boxes you need to destroy, a legal hold comes in, or you get some new insights. My favorite at an organization: as we were going through everything, we had a bunch of boxes that had useful descriptions like 'Jason's box number 4'; we had 80,000 of those, or some odd number like that. But one of my business groups, as the business shifted away from paper, had the bright idea of taking the manual indexes they had been preparing of the boxes over the years, putting them into a box and sending it off site without a great description on it.
Jason Stearns:
So as our off-site vendor was going through boxes, they were under the instruction of, “At least crack the lid, take a quick look at things, and if you see anything out of the ordinary, let us know before you chuck it in the shredder.” And we were actually able to find indexes for this one line of business, so all of a sudden it was like, "Okay, well, stop," okay, we documented all this, "Stop, now that we've got some additional insight as to what may or may not be in these boxes, does that change our approach? And did we destroy anything we shouldn't have? Can we reconstitute it if necessary?"
Jason Stearns:
And that's what you do, you pick something. What I find, particularly on the electronic side, is you start with a low-risk business group, or you start with one of your pain-in-the-you-know-where business groups to win them over. And once you go through a couple cycles of this and you start effectively deleting some data and nobody dies, you'd be surprised how quickly firms open up to the idea.
Jason Stearns:
And yeah, you're going to run into pockets of resistance, and you're going to run into areas where it's not as clear cut because of the nature of the business or some other issue. But there's a lot of places in every organization where you can start some very simple cleanup just by choosing to start.
Bill Tolson:
That's a great point. I think most people overlook that. Years and years and years ago, I worked for a military defense contractor and dealt with lots of electronic components and testing and things like that. And I had to pull some hard copy data back from the place where it was being stored, and there were literally hundreds of thousands of Bankers Boxes of this stuff. But what surprised me was there were hundreds, if not thousands, of Bankers Boxes full of backup tapes.
Bill Tolson:
And most of them weren't functional anymore. It used to be, back in the day, you had to refresh magnetics. You had to refresh your backup tape, otherwise it starts to derez. But they had all of this stuff and my question to management was, "Why do you need a 22-year-old backup tape? What are you going to do, restore a server?" But if it's there, somebody could ask for it, either the DOA or in a lawsuit, somebody else. So whatever happened to the idea of recycling backup tapes every 45, 60 or 90 days, versus just keeping them forever? That never made a lot of sense to me.
Bill Tolson:
And I think one of the reasons for that was, and companies still do this, but a lot more in the past, “Well, that backup tape is my archive.” And obviously I think, in my opinion at least, backup tapes should never be used as an archive. They can't be.
Jason Stearns:
In general, you're absolutely right. Firms relied on their backup tapes as an unofficial archive. And I have worked at organizations where the words backup and archive and record-keeping have been used interchangeably, and you have to just educate them. I think that's one of the big things we need to do more as information management professionals, whether we're in the traditional records management space, the IG space or some hybrid in between: we need to educate, up and down the organization, on just stuff as simple as using the terminology correctly.
Jason Stearns:
Because the last thing you want to do in a deposition or in an interview with a regulator is have someone use the wrong terminology, and you've actually got that one regulator or lawyer who knows what the terms are supposed to mean. The fire drill that ensues after that, trust me, anyone within the sound of my voice, because I've been involved in them, you do not want to be in those fire drills. Having to say, “They misspoke. They used the term interchangeably, but that is not an archive, it's backup tapes.”
Bill Tolson:
Exactly. And that's a long running pet peeve of mine, is companies either knowingly or unknowingly, relying on their backups as an archive. And it's like, well, if you've ever had to restore a backup tape for discovery, for example, it's not like going and finding a file and calling it good, you've got to go find the tape, you've got to mount it, you got to do a full restore to a blank server, then you've got to search that server for the very particular content and then blow it away, or in reference to maybe a right to be forgotten request, you got to delete that specific data and then re-back up that server and do some other stuff. So, yeah, I mean-
Jason Stearns:
And it's rarely just a tape, it's usually a set of tapes, and somewhere along the way, tape number four got misplaced, and now you can't restore any of it, or you can't restore the data you actually need to restore because of how the backup was created across the library. Yeah. I mean, it's not a pretty picture.
Bill Tolson:
Well, and that, we won't get into it now, but I'll just mention it here. I wrote an article on this sometime this year, and it was the idea of, does the right to be forgotten include backup tapes? Does PII sitting on 22 backup tapes in a mixture of a thousand data have to be gotten rid of? And there are some really interesting legal questions around that. I've actually kind of gotten into... I won't get into it here. If anybody's interested, then they can contact me. But-
Jason Stearns:
Yeah, there's definitely some mixed thought on that. And then again, when you're dealing with a company that is very US-centric and they're thinking, for all of the noise that was around GDPR when it came out, other than again, not to overuse the phrase, other than the fire drills we all went through to get ready, I haven't seen a lot of change in behavior.
Jason Stearns:
Now, if you have a big multinational and you've got a decent-sized footprint in Europe, maybe there's some changes in your European operations. I'm just not seeing them, either in the organizations I've worked for or, again, with the peers and colleagues I talk to; I'm just not seeing the types of changes that I think people were expecting.
Bill Tolson:
Well, and that probably needs to be done over time, but the response I get back from companies is, “The chances of us getting nabbed are small in the short term, we're just going to wait and see what happens.” And it's hard to sort of argue with that, but you might be really unlucky, and there are trolls out there going to hundreds of sites looking, and if you don't have a listing for a DPO officer, then they're going to file suit.
Bill Tolson:
I mean, there are people who are trying to make a living out of this and with the CCPA, CPRA, with the new Colorado Privacy Law that just came into being last month, and I'm in Colorado, so I've been following it, the Virginia Bill, what they're doing in New York, around data fiduciary and all kinds of things, this is going to get... And by the way, the lack of the feds doing anything, the Congress doing anything, it's just going to get much more complex for companies until the Congress puts out a bill that supersedes all of the state bills, but I think we're many years away from that.
Jason Stearns:
Many of the vendors I've worked with at organizations over the years, store the data in Canada. Canada has got a reasonable privacy regime. They haven't been designated as an adequate country yet under GDPR. That's because they wouldn't be with PIPEDA in its current format, but guess what, they're redoing it. And it's going to be very GDPR like if the authorities in Canada get their way. I think that's potentially a game changer for a lot of US companies because [crosstalk 00:48:34] reliance on Canada as this magical, safe space to store our data.
Bill Tolson:
Yeah. That's changing. It's changing all over. It's getting so much more complex, but it's really an interesting time to be involved with information and records management, that's for sure. But I think privacy is going to be driving this over the longer period because with the right to be forgotten, right to erasure-
Jason Stearns:
Even just the right to access. Just to get the access to an individual's personal data is a challenge.
Bill Tolson:
To know what you have, because you have to respond to those questions too. So I think that's yet another, the whole privacy thing I think is probably a really interesting podcast here in the future, Jason. But with that-
Jason Stearns:
You know where to find me [crosstalk 00:49:17].
Bill Tolson:
Oh yeah. So Jason, I think that wraps up this edition of The Information Management 360 podcast. I want to thank you for this really interesting and insightful, enjoyable discussion we've had today on defensible disposition. If anyone has questions on this topic or would like to talk to a subject matter expert, please send an email, mentioning this podcast to info I-N-F-O@archive360.com, and we will get back to you as soon as possible. I would assume Jason, that you would be open to people contacting you as well?
Jason Stearns:
Absolutely. The easiest way to find my contact information is to just pull up the ARMA international webpage at arma.org, and you can find my contact info on the board of directors' page, or it's just a simple jason.stearns@armaintl.org, if you want to email me.
Bill Tolson:
Fantastic, Jason, really appreciate it. And I'm looking forward to speaking at the ARMA InfoCon conference in Houston later this year, hopefully to meet you in person there as well.
Jason Stearns:
That would be great.
Bill Tolson:
So thank you, Jason. And that wraps it up for today. Thanks everyone.