Description:
Our special guest in this episode is John Mancini, President of Content Results LLC, and long-time past president of AIIM. In this episode, John discusses how the surge of data in organizations has forced the evolution of the "traditional" records manager role to now be responsible for all information within an organization including its privacy, security, retention and disposition.
Speakers
John Mancini
President
Content Results, LLC
John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on genealogy, digital transformation, information governance, and intelligent automation. He is the author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn and Facebook as jmancini77.
Bill Tolson
VP of Global Compliance & eDiscovery
Archive360
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.
Transcript:
Bill Tolson:
Greetings everyone and welcome to the Information Management 360 podcast. This week's episode is titled The Transition from Records Management to Information Management. My name is Bill Tolson and I'm the vice president of compliance and eDiscovery at Archive360. With me today is John Mancini, president of Content Results and a longtime past president of the AIIM Organization. John, I know you're involved in lots of things, can you give us a quick update as to what you've been doing here recently?
John Mancini:
Sure, Bill. I kind of feel a little bit like Where's Waldo. A lot of people will say, "Gosh John, it's been four years since you were at AIIM, what are you doing these days?" I run this consulting firm that helps companies develop their marketing messages. I also spend some time working in the M-19-21 arena in the federal government, which has to do with the rules of records accession to the National Archives. I write a column for CMSWire. I also work with a company called Infotechtion that does an awful lot of work in the M365 automated governance arena. And then lastly, I work in content development for the MER Conference.
Bill Tolson:
Fantastic. Well John, then I'm really looking forward to the discussion today and thanks for joining me today on our Archive360 podcast. Let me open it up here with kind of an opening statement. With the constantly growing surge of data within organizations across all industries, as well as the new privacy laws that are coming up from individual states, as well as countries around the world, organizations now have additional responsibilities with securing personal information, as well as being able to respond to data subject information reporting and right to be forgotten requests. Companies must now know exactly what personal information they hold, how it's being accessed and used, where it's stored and how it's secured and at the end of the day, how it can be found and removed based on a request from a data subject.
Bill Tolson:
These privacy laws are really calling into question, well not calling into question, they're really focusing on all information within an organization instead of the old mindset, I shouldn't say old, the mindset of basically managing records for compliance and to some extent for eDiscovery. With that in mind, John, let's go ahead and dive into the discussion. The title of our podcast today is The Transition from Records Management to Information Management. Could you briefly describe the difference between records management and information management?
John Mancini:
Yeah. Bill, it's kind of a hot button these days sometimes within the records community and within the information governance community. I guess the way I view it is they're both important. Both those disciplines are important. Records management is something that we've done for a long time, going back into the paper based era. We know a lot about how to do that. We know some of the rules, there's a well established professional practice around that and that's terrific and that doesn't go away. But what needs to happen is it needs to be applied now to the world of digital information. And so within the world of digital information, there are going to be records, some of which are kept for short retention periods, some of which are kept for permanent retention periods and everything in between. And that doesn't go away, that need to figure out what are the permanent records that document what an organization did, who did it, how they did it, when they did it, all that kind of stuff. Then there's all the rest of the information that surrounds an organization.
John Mancini:
And I think one of the things that we have to get comfortable with is that the same kind of principles that guided us in the records management arena are also useful in the information management arena because the way in which you manage information can't be just purely a function of the amount of storage space that something occupies. And I can remember all sorts of efforts to manage information that were kind of brute force sorts of efforts, where people would say, "Get rid of anything that's older than X. Get rid of anything that's bigger than Y." And that misses the point. And I think the concept that existed within the records community that's useful in the information management community is that the context and content of that information matters and that ought to be the driver for how you figure out what you save and how long you save it and what you get rid of and all those kinds of good things.
Bill Tolson:
Right, right. I remember not too long ago, probably five, 10 years ago, it used to be that corporate general counsels wanted to not save anything they didn't absolutely have to. They wanted to get rid of everything just as soon as they could, for all kinds of reasons, including smoking guns surprising them during eDiscovery, all kinds of neat stuff. And their attitudes have really changed to being, I won't say overly conservative, but being wildly conservative and being okay with keeping everything forever. I think that's also a big mistake, and although we won't in this podcast, that's where we can get into the topic of defensible disposition and what that means and what are the good and bad things about it and all that kind of stuff.
John Mancini:
That's a really good point Bill, because it is a balancing act that organizations need to strike. There are certain things that are required to be kept because they are legally required in order to be an organization that functions in the world. There are other things that you need to keep because they potentially have business value associated with them, either short term or long term or ultimately from the point of view of diving into them. But there's also stuff that's just a bunch of junk and that stuff can be gotten rid of and you really don't want to keep that stuff forever and ever and ever. And so that balancing act that you posed is really important for organizations to think through because it's not a question of just keeping a very few things that are pure records and getting rid of everything else and it's not a question of just saving everything because storage is relatively cheap, it's a balancing act.
Bill Tolson:
Yeah. No, that's a great point. And you mentioned the balancing act and besides the required business records for compliance and so forth, you mentioned those data, information, those files that have business value. And that's really the art of this whole thing: without 100% accurate automation and AI, and we'll get into that part of the discussion a little bit later, how do you determine business value? And in a lot of cases, it's being left up to the individual employees to decide what's a record. How long should it be kept? If it's not a record, is it valuable enough to be kept? And for how long? Those kinds of things.
Bill Tolson:
And then, in this ever-increasing kind of environment, all of that information, including what has business value, is really being used now by analytics, analytics routines. The analytics programs want access to relatively large amounts of data, historical data, to be able to do what they do to derive value for sales and marketing and engineering and so forth. I think that this is where we're getting into the challenging aspects of information management: what should be kept that's not an obvious compliance record? What should be kept? And for how long, so it could be used for these other business practices?
John Mancini:
You're exactly right. And the phrase that caught my attention when you said it was the notion of leaving this up to the individual employee to determine what's a record and what's not and what gets saved and what isn't. And I think one of the challenges that we have is that's been one of the huge problems in the space for as long as I've been in the space, has been the idea that we, as either records managers or information governance people, do things like define records retention schedules and how long different kinds of assets should be kept and all that.
John Mancini:
This notion that somehow an individual knowledge worker's going to help us in that task, that they are going to somehow be willing to drag assets into folders and all that manual stuff, it just has been unrealistic. And in most instances it's been a huge failure because people are busy. I'll be the first to confess, when I'm sitting there working on something, I am not thinking about dragging it into the right folder based on retention schedules. And most people are like that. They don't have time for that. We've got to automate that process, which gets into some of the things that you were hinting at in terms of where the industry's going.
Bill Tolson:
Yeah, that's what I hear almost universally when I'm talking to clients or potential clients around their information governance, information management questions is reliance on end users has never worked and it never will work. And the complaint or the issues are like you just said, they physically don't have the time and they haven't been trained as records or information managers. They might get an hour of training per year or something like that but they're having pressure applied to them from their bosses and other departments and so forth to get things done. When I was consulting full time, years ago, we would amongst us consultants, we'd talk about a given company's five second rule. We would basically describe their custodians' records management practices as if it took them more than five seconds to determine if a document, an email or record should be kept and for how long and where to store it, they would either delete it immediately or mark it as keep forever.
John Mancini:
Yeah. No, I used to laugh that when I used to make presentations, sometimes I would challenge people to look at the last few documents that they created in Office, for example, and pull up all those mystery views that show the metadata that you supposedly created when you created this file and take a look at what actually is there. And in most cases, people have hardly anything. They have a couple of data points about when it was edited or when it was created and stuff but anything that would be really useful to really understand what's in that document, most people haven't bothered to fill it in.
Bill Tolson:
No and again, that's because they don't have time. Look back and you mentioned this, the eras of physical or hard copy records and thousands of bankers boxes and how accurate were those written indexes for each box? And even in highly regulated industries in my earlier days, I used to work for an aerospace manufacturer that did a lot of government work, had mil specs and all kinds of stuff and there was a lot of pressure put on for being obviously very accurate in your record keeping and everything. But I remember based on Air Force requests of data from 10 years ago, they wanted to look at specific component data for a satellite, we'd contact the physical records storage service and ask them to return some boxes for us to start looking through and the index would be a one line kind of description of components records, no date, no nothing. And you'd open the box and there would be no rhyme or reason. And every once in a while you'd find a mostly desiccated, partially eaten sandwich and some other things that were just surprising.
Bill Tolson:
And that use model, that practice, really transferred to the digital data as well. And I'm the same way. I don't apply keywords when I save a Word file or anything like that. I'm assuming that it's being indexed so I can vaguely remember what the topics were about so I can do some searches and things but all that gets into productivity of the end user, how much time do they spend searching for stuff, all that kind of stuff. But I think what you said is right on the money and we've spoken, John, you and I, in the past about the changing nature of the records management profession and how it's being driven to focus on capturing not just records but all of the information within a corporation. In particular, and this is one of my hot button issues, in my mind 70 to 80% of all corporate data is sitting on employee laptops and workstations, completely unknown by the company in general. They don't have access to it. They haven't indexed a great amount of that data.
Bill Tolson:
And by the way, the risk in that data nowadays is it might contain PII, and some data subject somewhere in California or Colorado or in Europe says, "Get rid of all my data." That means get rid of all of their data, not just the stuff you can find but all of it. And you have to certify that you've found it and gotten rid of it. But if the majority of corporate data is sitting as dark data on individual laptops, how can you say you found all of the data? Maybe there's some emails having to do with email addresses and some descriptions with PII in it. But that's why I think we're moving toward the need for companies, relatively soon (I've been saying this for about five years, so it's being expanded), to actually manage all data within the company, especially because of these new privacy laws.
John Mancini:
Yeah. It's when you think about, there are two questions that I've asked audiences a number of times over the last two or three years that I think highlight the nature of the problem that you're talking about, Bill, which is the first question I've asked is, okay think about the volume of information coming into your organization right now and call that X and then look out three years and what do you think it's going to be? And almost always the answers come out averaging somewhere between three and a half and 4X. Well that's a gigantic jump. Then the second question I ask them is, okay, think about what kind of information is that? Is it hard data? Is it semi-structured information? Is it unstructured information? What is it? And what percentage do you think is semi-structured and unstructured? The stuff that we've always called in the space, content.
John Mancini:
And that answer almost always comes back between 55 and 60%. You put those two together and you add to it that question that you've mentioned about mobility and about local devices and all of that and you've got a huge challenge out there that organizations have to figure out some way to get their arms around it because it won't be dealt with in traditional ways.
Bill Tolson:
And they're liable for it. And the privacy laws are not going to get less stringent, they're going to get more stringent and being able to know where all of that data is and it's not just reacting to a right to be forgotten, it's cyber and ransomware and stuff. If that stuff gets stolen somehow, what happens if an employee takes their laptop home, puts it in their trunk and it has all of the email, all kinds of stuff and there's obviously personal information in a lot of that documentation that's stored locally and the car gets stolen? What does the company do? Who do they notify? And if they don't notify, are they liable for being in noncompliance with notification laws? Obviously they are, but these are the real complicating factors, I think.
Bill Tolson:
In all of my consulting time, years ago, I ran across two companies out of probably the hundreds and hundreds that I worked with that actually forced a relatively good system of syncing all data centrally. Actually both of them made sure that anytime an employee, whether they were traveling or whether it was just a workstation but anytime they synced up to the enterprise for email and all that kind of stuff, any new data would be copied and pulled into a central repository that would be indexed and managed and those kinds of things. And they did it for liability sake, number one, and one of them was a bank and they were very careful about PII and all kinds of other stuff so they didn't want data to not be searchable and findable and those kinds of things. It gets into a corporate culture more than anything else. And this is again, one of the things that I've written about in the past is if we're moving toward companies, organizations needing to manage all information from their employees, what does that mean for the corporate culture?
Bill Tolson:
Because I've found again over the years, especially in high tech but in all kinds of industries, at least in the United States, that data that employees create that is sitting locally on their workstations and laptops is kind of thought of as theirs. And when you say, "We need all of it, we need copies of it," there's a lot of cultural pushback saying, "You don't get my data," even though it's obviously the company's data because they're paying those employees to create the stuff. I think that's also a complicating factor in that I think organizational corporate culture needs to be changed and that's obviously a slow process.
John Mancini:
Yeah. No, I think this question of culture is an important one because there's so many dimensions to it. I think one of the challenges that exists in many organizations is that, and this is an overgeneralization so nobody contact me with angry letters or anything, but IT people generally understand data really well but they don't understand content and content is messy. It is complex. Its value is based on the information that it contains and conveys and also the risk that it represents is based on what's in it. And so you talk about things like PII and finding that in response to a request or something.
John Mancini:
Finding where PII exists in a database, in a field that says SSN, that's not very hard. You can find that pretty easily but finding a Social Security number that's buried in an attachment within an email system that exists on a local device and also exists probably on a phone someplace, that's a much tougher challenge. And it requires a change of approach of really understanding this weird niche that we call content because that's often where so much of the risk lies and it's not pretty and it's not easy.
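To illustrate the contrast John draws here, the following is a minimal Python sketch, not anything described in the episode. It assumes a hypothetical record with a field named `ssn`, a hypothetical piece of attachment text, and a simple pattern for US Social Security numbers (3-2-4 digits with optional separators). Real PII discovery tools do far more than this; the point is only that a field lookup is exact while a scan of free text is a guess.

```python
import re

# Assumption: a US SSN looks like 3-2-4 digits with optional dash or space separators.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")


def find_ssn_in_record(record: dict) -> list:
    """Structured case: the value lives in a known, labeled field."""
    value = record.get("ssn")
    return [value] if value else []


def find_ssn_in_text(text: str) -> list:
    """Unstructured case: scan free text for anything shaped like an SSN."""
    return SSN_PATTERN.findall(text)


# Hypothetical data, for illustration only.
record = {"name": "Jane Doe", "ssn": "123-45-6789"}
attachment_text = (
    "Per our call, Jane's SSN is 123-45-6789; order no. 987654321 shipped."
)

structured_hits = find_ssn_in_record(record)
unstructured_hits = find_ssn_in_text(attachment_text)

print(structured_hits)
print(unstructured_hits)
```

In this sketch the text scan also flags the nine-digit order number, which is not a Social Security number. That kind of false positive is one reason content buried in attachments is harder to handle than a labeled database field.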
Bill Tolson:
And that's where, and we'll get into a little bit more of this a little bit later in the podcast, but that's where you get into AI and machine learning, determining not only the content but the context of the content. We'll get more into this. But being able to determine if something is sensitive, is it PII? Is it whatever? Obviously humans aren't going to be able to do that consistently and they haven't proven to be in the past. Getting into AI and machine learning is going to help that a great deal. Obviously it's going to give us consistency and accuracy as well. But I totally agree, and as the amount, the velocity and volume and variety of data keeps growing and expanding, it's never going to get easier. And the amount, the sheer amount of data that employees generate and receive on a daily basis, they can't manage that, can't even understand most of it just because they don't have the time to look at a lot of it. Is it a record? How long should it be kept? Who knows?
John Mancini:
The other cultural element, which I think has been helpful actually as a result of COVID and the pandemic is this rising awareness at the C level based on having to totally adjust business processes on the fly and digitize business processes in ways and at speeds that they never really anticipated just to stay alive. There is an increased appreciation, I think, for the value of information assets and how the management of those assets translates into business value. And the fact that those assets need to be managed independent of device and independent of location and in a much more strategic way. I think the question is, okay, that awareness maybe makes life a little easier for those of us that have been advocating these solutions for a long time but will people actually act on it? That's kind of the next question.
John Mancini:
The other cultural thing, which I think is so interesting, you mentioned ransomware before and that also, that's served a constructive purpose. Although many organizations have yet to actually act on that but that question of state driven actors that are basically trying to penetrate our systems and hold them hostage, that opens up all sorts of vulnerability to organizations of all sizes that they never really expected before. And if they could have insured their way out of it in the past and just paid the people off and have the insurance pay the proceeds, that's not going to work moving forward because lots and lots of insurance companies are starting to look at the question of how well information is managed within an organization as a precursor to whether they're insurable or not.
Bill Tolson:
And what their rate will be.
John Mancini:
Exactly. There's all of a sudden that dial has been dialed up if you will, which again is an opportunity for us to poke at that question of holistically and strategically, how are we trying to manage information in this organization?
Bill Tolson:
Yeah. Ransomware, extortionware, cyber in general has been a major topic when we're talking to clients because what you just said, the amount of data, the amount of stuff you want to do with the data, all that kind of stuff really is now beginning to lend itself to the cloud versus on prem and does the cloud look at cyber and extortionware and ransomware and all those kinds of things? And I've written relatively extensively about this, the three stage extortionware now where it's not only encrypting sensitive data within a company, it's copying it and then they're basically extorting that if you don't pay the ransom, one, your stuff, your sensitive data's not going to be decrypted but number two, we're going to release all of your sensitive data on the internet.
Bill Tolson:
And it was really interesting early last year, I think it was in March, March or May, where Archive360 is a member of the Cybersecurity Tech Accord and I was writing a blog for them around security and I kind of floated the idea of these new versions of ransomware where they're copying the data onto their servers and then basically threatening to release it. In the blog I said, "What if that happened and they copied the data but the next step could be, they could contact the GDPR authorities in the EU and tell on you that your PII had been stolen and the cyber guys are releasing it and that the GDPR would then be well within their regulatory rights to go back to the original company and fine them because they didn't keep the data secure."
Bill Tolson:
And with the GDPR, there's some massive potential fines there. I put that in the blog and it was published and everybody thought it was interesting. I don't know if too many people took it too seriously but literally about a month later, the first case of that came out where the data had been stolen and the thieves had gone to the GDPR authorities and said, "We're releasing this data."
John Mancini:
Wow. No, it's a perfect case study, Bill, of the stakes and the issues that organizations face right now. It used to be that you could put up these, I would call them like Maginot line firewalls around your organization and you would kind of just like the French did after World War.
Bill Tolson:
I read your blog on that, by the way.
John Mancini:
Oh, thanks. After World War I, for those that don't know their World War I history, the French realized that all of World War I was fought on their soil and so they put up this line of defenses called the Maginot Line to keep the Germans out. And then lo and behold, the Germans went over it and around it in World War II because they had been preparing for the last challenge rather than the next challenge. And that's when you start talking about GDPR, you start talking about cybersecurity, you start talking about ransomware, so many people have spent so much money preparing for old challenges rather than new ones and they need to kind of reset their sights right now because so much has changed.
Bill Tolson:
And it's going to continue to change. I've advocated for years, but even more so in the last two years, and I do not necessarily understand why this hasn't happened to a bigger extent in the past: why aren't companies encrypting their sensitive data? 10 years ago, there was an argument about CPU usage and did you have enough CPU to be able to encrypt and decrypt and all these other kinds of neat things but sensitive data, a couple of things. And I've done some consulting with one or two of the states here in the United States around their privacy regulations and they've kind of passed drafts by me and said, "Well, what's missing?" Those kinds of things and I immediately say, "Well, your language is so wishy washy that any lawyer can beat it. 'Gee, you must use reasonably good best practices to ensure your data is secure.' Define reasonable. That doesn't even make sense."
Bill Tolson:
But when you tell them this, we're not advocating a vendor or anything like that but say in the law that PII must be encrypted at all times, period. Anybody could do that, especially if that data's being stored in the cloud and with the economies of scale with cloud platforms, the CPU power's there and it's very inexpensive. And then you can get into things like field level encryption and anonymization and all kinds of things but you still read horror stories of, and I based one of my comments on this, some lower level clerk takes their laptop home and they had just downloaded their whole customer database with financial information and everything and it gets stolen and none of it's encrypted. The company should put processes in place to say that data will never be downloaded, number one, but if it has to be, it will be encrypted, period.
John Mancini:
I guess there's a certain element in human nature and probably also then translates into corporate culture where nobody ever thinks they're going to win the negative lottery. And so they kind of, when push comes to shove and you have all these competing priorities and you have all of this, it goes all the way back to when they changed the Federal Rules of Civil Procedure and put electronic information on the same standing as paper based information and all that. You would constantly run into companies that would say, "Yeah, yeah, yeah, I get what you're talking about but I've got other things to worry about."
Bill Tolson:
It's not going to happen to us.
John Mancini:
Yeah. They just don't think they're going to win that negative lottery but when they win it, oh my gosh, it's like a lollapalooza.
Bill Tolson:
Oh, then they're willing to throw any amount of money at it. And I've been on the eDiscovery side for years and years and years and worked for eDiscovery companies and it's exactly that, constantly the remarks we would get is, "Yeah, we understand this is important. We should be doing it but what are the chances of us getting hit with [inaudible 00:28:36] charge or not being able to respond to eDiscovery appropriately or whatever?" And they always say, "Well gee, I'll wait to see if it happens to somebody else in our industry, then I'll worry about it but it hasn't happened so far so I'm not going to spend the money on it." It's the negative value proposition. It's insurance and no one likes insurance.
John Mancini:
Yeah. And you don't want to pay for it but you also don't get the benefit of it either.
Bill Tolson:
Yeah, well true. Both with the PII as well as, and a lot of it translates, like you said, into litigation preparedness and eDiscovery as well. And the downsides, especially when dealing with litigation, can be much bigger than a fine from a government agency. Even though those government agency fines can be gigantic, litigation could be even worse. And I think back to when the Federal Rules of Civil Procedure were, in recent times, amended in 2006 and then again in 2015, it started to focus at least the general counsels and the legal departments a little bit more on what data do we have? How do we protect it if need be? Who has it? Those kinds of things. But I think with, as we've said, with the new privacy requirements, China just brought out a big, massive one and Argentina and Brazil and all kinds of others and they're all slightly different. But if you cannot respond to their requirements, small to even medium size companies can be driven out of business in a week just because they weren't able to manage their data and security appropriately.
John Mancini:
Yep, yep. And that world is very complicated too, especially when you get those country specific regulations because often there are Trojan horses there underneath. For example, China underneath the covers of what seems to be privacy regulation.
Bill Tolson:
I won't make any comments about the Chinese law. John, on another topic, one of the biggest challenges I've run across over the last 10 years or so is the corporate adoption of, and I might not be describing these the right way and tell me if you don't agree with me but the way that corporations in the past have adopted big, relatively expensive enterprise content management systems or, as Gartner defines them, content services platforms for records management and back in the day, I could see the need for that. But as we transition to more of an information management requirement for companies and much larger amounts of data versus the 5 or 6% of corporate data that might be a record, do you think that a new category of application needs to be developed?
Bill Tolson:
Because I don't think many of those big ECM systems lend themselves to much larger amounts, managing much larger amounts of data. And I only say that because they're still and a lot of them, they're still relying on the end user to drag and drop, like you said. And one of the things we've talked about with Gartner over the last couple of years is content management systems really kind of backing off a little bit and morphing more into maybe super archiving systems. Do you have any opinions on any of that?
John Mancini:
Yeah, I think it's interesting. I think I tend to view the content space as almost existing in two, maybe three gigantic buckets. The first bucket is what I would call transactional content management and Forrester does a good job talking about this and that's the idea that, okay, you've got these processing houses that process millions and millions of documents that come in, whether it's insurance claim processing or medical processing or whatever, that process huge amounts and huge volumes of information and essentially digitize it and inject it into workflow processes. And that's kind of one whole cluster of stuff. Then what wound up happening is that there's a second area of content, which is all of the collaborative stuff that we do, a lot of which occurs within the Office suite, not necessarily all of it in Office. And what we tried to do, I think, was take the same kinds of systems that we had used on the transactional side and use them to manage all that other stuff.
John Mancini:
And I think a couple issues arose in the process of doing that. One is that that first order of system tended to be very expensive and not necessarily easy to use and certainly not invisible. And so, that was kind of one whole set of issues that emerged. And then the other one was that as the base of users grew in terms of that collaborative content, the cost structure got way out of whack in terms of what was useful for managing all of that information. A lot of change going on there now as a result of the cloud. I think about M365, for example, in terms of how that has changed. And there was conversation for years about SharePoint being a records management platform and we know that wasn't really the case. And so as a result, people bolted on other stuff on the top of it.
John Mancini:
But now, in terms of good enough governance for the vast amount of user created content, there's a lot of potential there in the platform and a lot of investment going on in that platform. I guess rather than my long answer that I gave you, my short answer to your question would be that I think those traditional platforms are under some amount of pressure right now because there are alternative ways of achieving the same end.
Bill Tolson:
Yeah. And I agree with that. In fact, I was talking to one of the Gartner CSP analysts yesterday and asked that same question, and the response was, the ECM/CSP platform market is not growing and in fact it's shrinking.
John Mancini:
Consolidating too.
Bill Tolson:
Yes. And for all kinds of reasons we won't go into here. Complexity and, like you said, cost and all kinds of other things. But with our clients, and we deal with very large clients, Wall Street banks and so forth, what we've been hearing from especially the larger clients is, we want records management slash archive, if you want to call it that, but we don't want the complexity of the ECM system. We want to simplify it, make it easier to use. We want it to be low cost and we want it to be customizable. Meaning you take product ABC and it's a great product, but it does specifically these things and you can't adjust it. And many of these large companies all want to tweak it.
Bill Tolson:
And especially in a SaaS system, software as a service, you can't really do that because that's not their model. And the model is we're going to give you an application and you're going to use it or not but going to a more customizable model, like a PaaS model, you can add different things to it. I think, and I mentioned this to the Gartner analyst many times and they've all said, "Yes, we see that too." The idea of maybe ECM and archiving kind of combining and creating not a super set but taking the best of those various platforms and using that for information governance.
John Mancini:
All of that needs to be more seamless, needs to be cheaper, needs to be invisible, needs to focus on large categories of risk. And lots of times the older systems and I find this particularly in a lot of the work that I do when I wear my Infotechtion hat, which is in the M365 arena, is that so many of these records systems and records practices are way overly complicated. And so then when you try to automate that and then God help you, then if you try to take it into a hybrid environment that exists both on prem and in the cloud, those systems can't do it. But there's an underlying question that many organizations need to address, which goes back to your cultural question, which is that in some ways they need to rethink the strategy and policy that you're trying to automate in the first place.
John Mancini:
Because if you have a ton of event based disposition and retention requirements, which is not unusual in the records management community, and it's things like we're going to save this document for 20 years after this person dies at some point in the future, that kind of stuff, and we're going to have somebody review it at the end of it all. It's a recipe for never doing anything. It's a recipe for complexity. And then the other issue that you throw on top of that is that most organizations at scale are doing business in 50, 75, 100 different countries, and how on earth are you going to manage that complexity?
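Editor's illustration: the event based retention John describes can be sketched in a few lines. This is a minimal sketch, not any vendor's implementation; the `Record` class, its field names, and the sample pension file are hypothetical. It shows why such rules resist automation: until the triggering event is recorded, no disposition date can be computed at all.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    title: str
    retain_years_after_event: int
    # Unknown until the triggering event (for example, a death) is recorded.
    event_date: Optional[date] = None


def disposition_date(record: Record) -> Optional[date]:
    """Return the earliest disposal date, or None while the event is pending."""
    if record.event_date is None:
        return None  # the retention clock has not started
    target_year = record.event_date.year + record.retain_years_after_event
    try:
        return record.event_date.replace(year=target_year)
    except ValueError:
        # Event fell on Feb 29 and the target year is not a leap year.
        return record.event_date.replace(year=target_year, day=28)


# Hypothetical records: one still waiting on its event, one already triggered.
pending = Record("Pension file (hypothetical)", retain_years_after_event=20)
triggered = Record("Pension file (hypothetical)", 20, event_date=date(2001, 5, 17))
```

A record like `pending` has to be held indefinitely and rechecked, which is the open-ended obligation John is pointing at.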
Bill Tolson:
Oh man.
John Mancini:
And you have to change what you're trying to do and then change how you're going to automate it and then change the kinds of platforms you use for that automation.
Bill Tolson:
Yes, yes, absolutely. Perfect. And that kind of brings us into what we've mentioned once or twice and that's employing machine learning and AI capabilities to automate a lot of that data information collection, categorization, classification, storage and so forth. And I know companies are working on it. We're working on it. We've done things. I spent several years at a company called Recommind back in the 2011, 2012 time period where we were doing predictive coding software for the eDiscovery side of the industry. And the idea was using machine learning algorithms to look at millions upon millions of documents and make a decision very quickly as to which ones were relevant to the case. And you'd do that through a machine learning technique, doing training cycles.
Bill Tolson:
You'd give it some examples of what was relevant. You'd give it some examples of what wasn't relevant. You could do that 10 times, 25 times, 50 times until the measured accuracy got up to 98, 99%. And we started looking at that as why couldn't we do that with document categorization for records management? And here we're talking almost 10 years ago and it was in the early phases but I think my thought has always been that one of the holy grails for records management or information management was consistent and highly accurate automatic categorization or classification of documents.
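Editor's illustration: the training cycles Bill describes can be sketched as a loop that adds reviewer-labeled examples in rounds and stops once measured accuracy on a held-out sample reaches a target. This is a toy sketch under stated assumptions, not Recommind's method: the word-count scorer stands in for a real machine learning model, and every function name and sample document below is hypothetical.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count word frequencies per label from (text, is_relevant) pairs."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in examples:
        counts[is_relevant].update(tokenize(text))
    return counts


def predict(counts, text):
    """Call a document relevant if it overlaps more with relevant examples."""
    words = tokenize(text)
    relevant_score = sum(counts[True][w] for w in words)
    other_score = sum(counts[False][w] for w in words)
    return relevant_score > other_score


def accuracy(counts, labeled):
    hits = sum(predict(counts, text) == label for text, label in labeled)
    return hits / len(labeled)


def training_cycles(pool, validation, batch_size=2, target=0.98, max_rounds=50):
    """Feed labeled examples in rounds until the accuracy target is met."""
    seen, model, score, rounds = [], train([]), 0.0, 0
    for start in range(0, min(len(pool), batch_size * max_rounds), batch_size):
        seen.extend(pool[start:start + batch_size])  # reviewer labels a batch
        model = train(seen)
        score = accuracy(model, validation)          # measure on held-out docs
        rounds += 1
        if score >= target:
            break
    return model, rounds, score


# Hypothetical reviewer-labeled examples (True means relevant to the case).
pool = [
    ("merger pricing agreement draft", True),
    ("lunch menu for friday", False),
    ("pricing terms for the merger", True),
    ("office party friday parking", False),
]
validation = [
    ("revised merger agreement", True),
    ("friday lunch order", False),
    ("terms for the merger", True),
    ("parking pass for the office", False),
]
model, rounds, score = training_cycles(pool, validation)
```

With this toy data the first round misclassifies one held-out document, and the second round of examples corrects it, so the loop stops after two rounds.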
John Mancini:
Yep. From your lips to God's ear.
Bill Tolson:
Exactly. Because as part of the predictive coding thing with eDiscovery, we did studies and surveys and found that in that arena, a company might employ 10 contract review attorneys to look at a million pages of documents. And depending on those specific attorneys, you could get an accuracy rate, a consistency rate, of between 40 and maybe 50%, just because of each attorney's background, what law school they graduated from, what they had for dinner the previous night, whether they got in a fight with their kids. Everything would basically impact their mental state as they were reading through stuff. And with predictive coding, we could show mathematically a 98, 99% accuracy rate where the contract attorneys were at 40, 45%, and judges started to buy into it and everything. Why couldn't you do that with records classification, categorization? And this is no revelation, everybody knows that, so I think lots of people are working on it obviously.
John Mancini:
Yeah. There is a cultural norm there too, that you touched on, which applies to this sense of what we think humans are better at and what we think machines are good at. And we haven't really totally reconciled that. It's kind of the phenomenon that when you see a driverless or driver assisted car have an accident, all hell breaks loose in the papers and in the media and nobody ever kind of takes a step back and says, "Well, let's think about how humans do."
Bill Tolson:
Good point.
John Mancini:
And it's the same thing when it comes to categorizing information, you're totally right. It's hugely expensive to have people doing this and it is hugely inaccurate depending on the nature of the people.
Bill Tolson:
Exactly. As you look at trying to manage a lot more data because of everything we've talked about, relying on end user employees to do this is physically impossible. They don't have the time nor the mindset to do it. And I think we're relatively close to starting to get these kinds of consistent, accurate solutions. And by the way, I think the transition to cloud computing is accelerating it because you look at the Azure or AWS cloud and there are various service levels and stuff. In Azure, they have an AI and machine learning technology stack and they have thousands upon thousands of engineers working on it. And I'm sure AWS does as well. As we're moving into these kinds of economies of scale around cloud and these technologies, security is one of them, but machine learning and AI, I think this is going to speed up dramatically.
Bill Tolson:
And one question I forgot to ask you, John, in talking about the cloud and the whole digital transformation thing and as companies are moving to the cloud, have you noticed any obvious, either disadvantages or advantages that the records management profession has kind of brought up as they move to the cloud?
John Mancini:
I think the main issue that not only records managers and folks like that face but everybody faces is that except for companies that are brand spanking new, it's not an either or question, it's an and question right now. And so most companies at scale have very, very large scale business processes that cover some really mission critical functions in the organization. Even the ones that are the most aggressive on the cloud realize that they have a challenge in that they've got to walk and chew gum at the same time in terms of on-prem-land and cloudland. And that actually comes to, when you start thinking about how vendors get sorted in this disruptive environment we're in, the ones that thrived in on-prem-land that can extend their capabilities in a seamless fashion to the cloud have a huge advantage over the ones that get stuck, because the ones that get stuck are going to get left behind.
John Mancini:
But it's not something that's going to happen in the blink of an eye because there's a lot of complexity here inside of organizations. And I like to view it sometimes I say, trying to make this migration from on prem to the cloud is kind of like trying to rewire your house with the juice on. It's not for the faint of heart but you do know that you have to rewire the house. It has to happen. And certainly COVID and the pandemic put punctuation points on that.
Bill Tolson:
Well, and I've written a couple of articles around COVID-19 and especially how it has affected FOIA requests with government agencies. And that has been a real issue, and a lot of these, especially federal agencies but also state, some of them have not moved into the cloud yet, so they're looking at trying to respond to a FOIA request where everybody's working remotely and some or a great part of the data is stored locally. Basically you're going back to the end user again and saying, "Hey, can you search your local repository and find anything having to do with this guy? He's asking for data." And that might take a short period of time or it may take a long period of time. But I think with the various directives, especially from the federal government, saying, "Thou shall move to the cloud and do it quickly," I think that's going to help with that as well. And that's what we're being told and that's what we're seeing as well.
John Mancini:
We had this really complicated, there's a thing right now they're talking about on the weather in terms of this nor'easter north of here, this bomb cyclone that people experienced. The last 18 months have been like a bomb cyclone for organizations, but think about what they had to do, think about how much changed. First of all, we were in incredibly disruptive technology times, so that was even before any of this hit, then you had people having to reinvent how they ran their internal operations on the fly, in the blink of an eye basically. And then on top of that, all of their customer interactions changed, all the things that people used to do in person as customers they couldn't do anymore. And that stew has been just phenomenal to watch. And so it's not really very surprising that there's a lot of winners and a lot of losers coming out of this whole thing.
Bill Tolson:
John, I want to save a little bit of time here for you. I know you announced a new book titled, Immigrant Secrets: The Search for My Grandparents, at this year's ARMA Info Conference. Can you take a minute or a couple to describe the book and how you came about doing it?
John Mancini:
It was a labor of love. I spent decades working with records managers and archivists and genealogists on the technologies that they use to preserve information and despite all that, I never really spent any time looking at my own family history. And the only thing, this is the part that was kind of unusual in our family, the only thing my father ever said about his family was that his parents, who were Italian immigrants, died in the 1930s. And so as I began this records and genealogy search for these grandparents to find out how they died in the 1930s, and my dad's long gone by now, I mostly ran into some pretty frustrating dead ends until the release of the 1940 census, and then my grandparents mysteriously reappeared in the census but were listed as inmates at the Rockland Insane Asylum.
Bill Tolson:
Wow.
John Mancini:
And as I unraveled this story, an entire extended family of aunts and uncles and cousins, all of whom lived within driving distance of where I grew up, all materialized but were never mentioned. And so what the book is about is what happened, who were these people? Why all the secrecy? And what was the role of records and archives in helping unravel the story? And it's available, I should say, Bill, on amazon.com under Immigrant Secrets.
Bill Tolson:
I was just looking at it. I was going to say that. I'm looking at it on amazon.com right now. Great.
John Mancini:
Yeah. It was a lot of fun. And what I did at the ARMA Conference was think about that five year journey: what did it tell me about the nature of records and governance, and what do we want it to look like? What platforms are we establishing right now so that somebody who is doing the equivalent of the work that I did looking backwards will be able to look back at what we've done and understand who we were and what we did and how we did it and why we did it? All that stuff is a lot more complicated in a digital arena than it was in the paper arena.
Bill Tolson:
Yeah. But still very important to keep that history.
John Mancini:
Yep. I was a history major, so that's why.
Bill Tolson:
Oh, nice, nice, nice. Well, I think that wraps up this edition of the Information Management 360 podcast. I want to thank you for this really enjoyable and insightful discussion on info management and records management and a brief discussion about your new book. I think we had some really interesting points that we both talked about. If anyone has questions on this topic or would like to talk to a subject matter expert at Archive 360, please send an email mentioning this podcast to info@archive360.com and we'll get back to you as soon as possible. You can also email me if you have questions, bill.tolson, T-O-L-S-O-N @archive360.com. And if you have questions for John, you can send him an email at johnmancini, that's John M-A-N-C-I-N-I, all one word, at contentresults.net. And also check back at the Archive 360 resources page under podcasts for new podcasts with leading experts, such as John today, on a regular basis. And we both thank you very much for taking the time to download and listen to our podcast. Thank you.
Speaker 1:
Thank you for joining us today on the Information Management 360 podcast, brought to you by Archive 360, trusted by organizations worldwide to manage their data in their cloud, under their control. To subscribe to our show or to find out how you can address today's challenges in information management, visit archive360.com/podcasts.