Description:
In our latest episode, Bill Tolson and Jason Bero, Privacy, Risk and Compliance Officer at Microsoft, discuss long-term archiving for regulatory compliance. This episode dives deep into government requirements for storing data, how to archive important data for the long term while ensuring it remains easy to retrieve, and the tools used today and potentially in the future.
Speakers
Jason Bero
Privacy, Risk, and Compliance Officer
Microsoft Canada
Jason Bero is a Privacy, Risk, and Compliance Officer for Microsoft Canada. Jason is a subject matter expert who works with many regulatory and organizational security, privacy, and compliance teams across several verticals in North America. He has spoken at many events related to data privacy and risk and has been involved in many cloud transformation projects over his 20+ year tenure.
Bill Tolson
VP of Global Compliance & eDiscovery
Archive360
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.
Transcript:
Bill Tolson:
Welcome to The Information Management 360 Podcast. This week's episode is titled Long Term Archiving for Regulatory Compliance. My name is Bill Tolson, I'm the Vice President of Compliance and eDiscovery at Archive360. With me today is Jason Bero, Privacy Risk and Compliance Officer for Microsoft Canada. Welcome Jason.
Jason Bero:
Thank you Bill, thank you for having me.
Bill Tolson:
Yeah. Now this is going to be fun. Okay. Well, let me set the stage here on long-term archiving. Data archiving to meet regulatory retention requirements put in place by governments and industry professional organizations like FINRA is a common requirement, and most organizations have had to deal with it for decades. In the past, companies collected paper documents and retained them for many years by storing them in banker's boxes and moving them off site. When they were needed, a request had to be made, and many times you'd need to wait days or weeks to get the specific banker's box that contained the documents.
Bill Tolson:
I spent many years back in the '70s at Hughes Aircraft. We did a lot of military work, a lot of mil-spec stuff, and we had tens of thousands of banker's boxes, and every once in a while I'd get an Air Force team in that would want to see past records from 10 years ago. So I'd have to contact Iron Mountain and have them go find the box and bring it, and it was usually a week or two before you got it, so it was interesting. The other thing is I used to work for Iron Mountain, and Iron Mountain is thought of as a records management company, and obviously they are, but they're actually classified as a real estate investment trust on the markets because they basically control millions and millions of square feet of warehouse space where they store all these boxes.
Bill Tolson:
So a little side note there, but over the years records management has transformed from mostly hard copy, like I was just referring to back in the '70s, to almost completely electronically stored information, or ESI. Additionally, over the years, the sheer volume of electronic data organizations have had to capture, store, protect and make available when requested, i.e. the Air Force requesting it from me, has exploded into the hundreds of terabytes to tens of petabytes. In fact, Archive360 regularly migrates petabytes of data for large clients. I think for most people that's almost unbelievable if you understand what a petabyte is. And getting into the long-term archiving now, many regulatory retention obligations now require the safe storage and easy availability of regulated data for 5, 7, 10 years or even decades, and I'll get into that in a little bit and I'm sure Jason will have some stories on that too.
Bill Tolson:
The biggest question many organizations don't take into consideration is what is the best, most economical and secure way to archive important data for very long periods of time, if need be. So the goal of any long-term archive is to store and preserve digital content, potentially for decades, as well as ensure that it can be retrieved at a moment's notice regardless of physical location. And that's one of the other side topics: you might be storing petabytes, but if you can't find what you need when you need it, then it's basically valueless data. So storing the data securely, as well as having the capability to find specifically what you need at a moment's notice, are the two important factors. So with that long setup, let me start off our discussion with Jason by throwing out the first question. So Jason, what do we mean by long-term archiving?
Jason Bero:
Yeah, great question. In conversations across differing industries, it does mean a slightly different concept. For financial services, it's regulatory requirements for keeping all communication under FINRA and SEC 17a-4. For the public sector, it's long-term archiving for needs such as freedom of information requests and ATIP requests, or access to information and privacy requests. For healthcare, depending on whether you reside in the United States or the EU or Canada, a lot of regulatory updates have been made to make long-term archiving mandatory or enforced, and not just the archiving itself but also the auditing related to it. So we're seeing a lot of concepts related to that, and even updates to records management frameworks such as ICA Module 2 and the ISO standards making it more and more required, if you will, to manage active versus inactive records.
Jason Bero:
When we talk about active versus inactive records, active records are things of significant value currently today, whereas most records, when we talk about long-term archiving, are inactive, and that's where we get into the petabytes and sometimes even zettabytes. We're getting into that world because another growing concern is that what constitutes a record is a concept that is itself changing when we talk about what we need to store. 10 years ago, a record was essentially a file or a digital asset or a piece of paper, whereas in today's world, with things like Microsoft Teams and Slack and other communication tools, all that communication that was once done in email alone is also required to be kept for regulatory purposes, whether it's FinServ, financial services, or other, so it's a long conversation.
Jason Bero:
And it's one that is a top priority. As you mentioned, Bill, paper is a significant concern, especially in times like today when people can't necessarily get to those filing cabinets anymore and they're working from home. How do you address these things when people are still in that mindset or process of trying to keep the physical records and get them to the appropriate cabinet, if you will, or even to Iron Mountain, when those places aren't necessarily accessible? So it's a top-of-mind question for every organization today, because they are looking at cloud technologies in general for process improvements and operational efficiencies.
Jason Bero:
But when they do so, the first thought process is the in-place concept, and I'll use the Microsoft world as the example: the concept of keeping petabytes and petabytes inside Microsoft 365 is the first thought. But really, when we talk about scale, performance and efficiencies, the long-term side does require some investigation, and third parties, to extract the data and store it in a low-cost storage mechanism. It's just not feasible to implement a strategy for records management or regulatory compliance with an in-place-only concept, which is why we still see many organizations evaluating tools for the inactive processes, for things like eDiscovery and recall, and for storage of record in a sufficient and, I would say, preserved, immutable way.
Bill Tolson:
Yeah, yeah, no, that's perfect. In fact, you mentioned FOIA, and we've been dealing with many federal agencies that are being overloaded with FOIA requests, and those FOIA requests sometimes turn into lawsuits, mostly because over the last year or so those government agencies haven't been able to find the data being requested, because they're all remote and many of the government agencies don't have centralized, standardized information capture and management systems. So a lot of this data is sitting on individual agency employee laptops and workstations. I found an interactive map online that showed the average FOIA response times for each state and the federal government. In most FOIA laws, you're looking at 10 to 15 days to respond and maybe another 30 days to actually deliver the requested information. And you're seeing 140, 150 days or more for them to even respond, much less produce the data. So those things quickly turn into lawsuits because the government agency didn't respond in time.
Bill Tolson:
So having an information management system capability is important, but also, like you mentioned, for state and federal agencies, I think most of them, if not all of the states, have requirements to capture and archive data basically forever, much like the National Archives does for the federal government. By the way, this podcast is really about long-term archiving for regulatory compliance, but there are a couple of other reasons for long-term archiving. One is to capture and protect corporate history. I used to work for HP, and they went back to 1939 and had all kinds of really neat corporate history. In fact, I worked in the Corvallis facility where we made all the HP calculators, and you'd walk down the main hallway and it had calculators in frames on the walls, and these weren't just the calculators for that year or something, these were calculators that had gone through all kinds of physical ordeals, with the stories around them.
Bill Tolson:
I vividly remember one calculator, I think it was a 41CX scientific calculator, on the wall. And the story is that a guy was out fishing in the Pacific Ocean off the coast of Oregon, and for some reason he had his calculator with him, and he dropped it in the bay. And several years later, somehow it was found by a diver and brought up, and they dried it out, put the batteries in, and it worked. Unbelievable. But yeah, there was all kinds of stuff like that, and corporate history is really important. And then there are contractual requirements, obviously.
Bill Tolson:
In a previous job, I worked with a large North American construction firm, and they had requirements where they might be building a bridge, and that might take 10 years, and then the retention requirement was that after the project was finished, you had to keep all the records for 50 years. So there are a lot of contractual requirements that really do push out the need, or push out the timeframe, for which documents need to be stored. So Jason, we've talked about regulatory retention requirements, and I think you mentioned a lot of the various requirements, like for FinServ it's the SEC and FINRA, and in Canada it's IIROC, and I'm sure there's more in Canada, right?
Jason Bero:
Correct. And obviously that's a big influence in Canada as well.
Bill Tolson:
Yeah. And then there's MiFID II over in the UK and EU, and Sarbanes-Oxley and GLB, or Gramm-Leach-Bliley, and then for healthcare there's HIPAA and HITECH in the US, and then others around the world. One of the things is that in the US, the insurance industry has a lot of interesting retention requirements, but unlike the other FinServ and healthcare regulations, insurance is regulated state by state, by state agencies, not by the federal government, so the requirements vary pretty wildly. Have you ever run across any insurance requirements, Jason?
Jason Bero:
Depending on how they deal with the insurance aspects, they still do fall under some aspects of the SEC, but they're also subject, much like everyone else, to privacy law that varies state by state. Up in Canada, we have a looming new update to our privacy regulation that's definitely going to impact the insurance industry as it relates to electronic records, physical records, long-term retention and just the ability to recall them. And going back to the FOIA and ATIP concept, when we refer to data subject requests, as they're commonly known under the GDPR, there's going to be an uptick in those.
Jason Bero:
So you have the insurance companies dealing with so much information and communication, especially now that, for the most part, they've gone digital; most insurance companies are very digital these days, and instead of the traditional approach of going to the customer's house and having that conversation, they're now doing all of this virtually, and a lot of those assets and communications still need to be recorded and kept. So it's changed even the landscape as to how they're creating the content and the content they need to store, not just what's driven by the regulation. And there are a lot of conversations about how do we take care of this Zoom problem that's been introduced during the pandemic, or the Microsoft Teams or Google Meet problem. These are all conversations that are happening today related to long-term archiving that most IT and compliance professionals never really had to think about before, because it just wasn't the way they did things.
Bill Tolson:
Yeah, they weren't prepared for it, that's a great point. Several of the regulatory requirements for select industries say you've got to capture all communications, and not just email but within Microsoft Teams, which everybody uses, a wild claim, but as far as I know, everybody does. There's chat, there's group chats, there's video, all the way down to sentiments and emojis and things like that. I would think that, say for FinServ, the broker-dealers have to capture all of that, but as you probably know, Jason, those agencies tend to run a little slow, so it might be a while before they address what exactly has to be captured within a Teams or a Zoom or something else.
Bill Tolson:
One example I had: we dealt with an insurance company, a very old insurance company, and they had all of their policies digitized as PDFs sitting on their network file shares, which was just massive, and what they wanted was to put it in the cloud, in this case Azure, and have instant access to it for 99 years. That's the longest one I've dealt with so far, but it made a lot of sense. What we've talked about, Jason, and I'm going to jump a little bit ahead here, is data formats: how do you ensure that a data format from 40 years ago is still readable by whatever software you have now? And I know, Jason, you and I have talked about long-term archiving formats, and I know Microsoft was involved with that for a period of time. Any thoughts on that?
Jason Bero:
Yeah. I generally find, when it comes to long-term file formats, that the individuals who govern that, through the conversations I have, are generally the information governors or the information managers or the records managers, because it's a problem they've had forever. Even from a Microsoft perspective, think of keeping Microsoft DOC files versus DOCX, when the file format changes and the backdated format can't necessarily still be supported. And as you can imagine, there are so many different tools and applications that can be used over time, and I'll use Lotus Notes as a perfect example of that, so there has to be some form of standard, or at least a file format that is accepted. And generally what we see, from at least a file perspective, is conversion to PDF/A as the accepted file format to support that long-term need to recall a record and actually open it 50-plus years from now, when, as you can imagine, these applications may no longer exist.
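In practice, the PDF/A normalization Jason describes is often done as a batch step. Here is a minimal Python sketch, assuming Ghostscript is installed and on the PATH; exact flags vary by Ghostscript version, and strict PDF/A conformance also needs an ICC color profile and a PDFA_def.ps definition file, so treat this as an illustration rather than a complete recipe:

```python
import subprocess
from pathlib import Path

def convert_to_pdfa(src: Path, dst: Path) -> None:
    """Best-effort conversion of a PDF to PDF/A-2 using Ghostscript."""
    subprocess.run(
        [
            "gs",
            "-dPDFA=2",                     # target PDF/A-2
            "-dBATCH", "-dNOPAUSE",         # run non-interactively
            "-sDEVICE=pdfwrite",            # write a new PDF
            "-dPDFACompatibilityPolicy=1",  # discard features PDF/A forbids
            f"-sOutputFile={dst}",
            str(src),
        ],
        check=True,
    )

# Hypothetical usage: sweep a legacy file share and normalize everything found.
for pdf in Path("/mnt/legacy-share").rglob("*.pdf"):
    convert_to_pdfa(pdf, pdf.with_suffix(".pdfa.pdf"))
```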
Jason Bero:
So it's definitely there, but the problem that occurs with that is that not everything is a file, and not everything necessarily makes sense to translate into it. I would say we're not there globally to handle the number of applications out there; just think of the cloud applications in the workforce for different purposes, generating differing file formats. We're looking at around 17,000-plus file formats today, and that's a lot different than what we dealt with many years ago. It's obviously a concern for us at Microsoft as well, and I'll use a stat that ties back into long-term archiving. 10 years ago at Microsoft, it was very common that the typical custodian within Microsoft, or employee, if you will, generated four gigs of content. That was their total content. Take the statistics forward 10 years later and we're looking at 86 gigs per custodian.
Jason Bero:
So the volume increase, not just the tools but the volume increase, is something to consider too. The traditional approach of declaring end-user responsibility to determine what things are, even that is a concern today for information managers, governors and eDiscovery teams, because when you're asking end users to make a determination when their content is so abundant, they're just not doing it as much, and so there's a lot more dark data existing today versus, say, 10 years ago. A typical custodian could once manage their own inbox or their own file storage, if you will, on a network share. That's very different from the problems we're seeing today, with the digital estate expanding at a rate where, without automation in some form, you just will not have a successful information governance and long-term archiving strategy.
Bill Tolson:
Yeah, no, you hit on one of my pet peeves. It's this idea that I've read about lately, that the average employee comes in contact with anywhere from 50 to 200 megabytes of data per day, email and all kinds of other stuff. Like you say, you start multiplying that by 365, times 20 years, whatever it happens to be, and you're looking at real data, as Congress would say. But you also mentioned the velocity of data; the amount of data and how fast it's coming in is really a problem. And one of my issues for the last 20 years has been that records management people within the company, and not just records management people but the company in general, put it all back on the employee or the custodian. You get an email and you've got to determine, "Do I keep it? And for how long? And where am I going to put it if I have to keep it?"
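To put rough numbers on the volumes Bill describes, here is a back-of-the-envelope calculation in Python; the working-day count and employee count are illustrative assumptions, not figures from the episode:

```python
# Rough math on the volumes described: 50-200 MB per employee per day.
mb_per_day = 100             # midpoint-ish of the 50-200 MB/day range cited
working_days_per_year = 250  # assumption: weekdays minus holidays and vacation
years_retained = 20

per_employee_gb = mb_per_day * working_days_per_year * years_retained / 1_000
print(f"~{per_employee_gb:,.0f} GB per employee over {years_retained} years")  # ~500 GB

employees = 10_000           # hypothetical mid-size enterprise
total_pb = per_employee_gb * employees / 1_000_000
print(f"~{total_pb:,.1f} PB across the organization")                          # ~5.0 PB
```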
Bill Tolson:
I consulted for many years around records management, and one of the things we talked about was the five-second rule. If it takes an employee more than five seconds to determine whether what they're looking at is a record or not, they will either mark it as keep-forever or delete it immediately. And when I was consulting, I'd walk into companies and ask them for their records retention policy, and it might be 200 pages long, eight-point type, thousands upon thousands of different kinds of documents, and you'd ask the management, "You expect employees to follow this?" And they'd say, "Well, absolutely." And during that consulting I'd interview 40 people and ask them the question, "Are you following the records retention schedule?" And out of the 40, 38 didn't know what a records retention schedule was, and the two who did had never seen it before.
Bill Tolson:
So management was so far off that they were putting this on the individual custodians, but nobody was doing it. That's a totally different discussion, but that's where we get into the holy grail of records management, and that's using machine learning and AI to automatically classify content based on what's in it, the context, and all kinds of other signals, and I think we're getting much closer to that. And I know Microsoft and the Azure folks, with the AI tier in Azure, are doing some really interesting things. Some of the other issues around archiving regulated data on the technology side, and Jason, you and I have talked about this before, are, number one, data formats, which we've already covered, but also equipment availability. I remember, again while I was consulting and doing some eDiscovery work, I walked into a company and opened a large cabinet that was full of 8-inch floppy disks.
Bill Tolson:
I looked at their corporate attorney and I said, "Have you gone through this?" And he goes, "Well, no." And I said, "Why not?" And he goes, "Because there's no equipment to read this stuff anymore." I said, "Well, why do you still have it?" He goes, "Well, people don't like throwing away data, even if you can't access it." And I said, "You're going to have to access it, because opposing counsel is going to make you. So be very careful with this stuff, and go out and start looking, see if you can find some eight-inch floppy disk drives." And in San Jose, Santa Clara, there is a museum of data storage equipment, and they ended up having to go there and borrow those eight-inch floppy disk drives, for a large sum of money, to go through the data. So there are all kinds of data and all kinds of devices. I'm sure, Jason, you've come across many, right?
Jason Bero:
Oh, very many. And it's not just the device issue of finding the right device for the right storage repository, but also the concept of corruption. Think of the number of times organizations have relied on tape backup for years upon years, and then later try to digitize that back and recall it. Even with the right tool, they're successful at 80% or 60% or even lower, simply because with long-term storage those repositories rely on physical attributes, we'll say, that can degrade. And when that happens, even having the ability to recall the data requires a significant amount of effort beyond just the native recall and using the right tool. Data corruption on tape is a significant problem and has been for many years. That physical storage medium just seems to be a really big problem related to two issues: one is finding the right tool if it's really long-term, and the other is the success rate of recalling it, even if you have the right tool to do so.
Bill Tolson:
Yeah. And how do you explain that to a judge? I used to work for StorageTek here in Colorado. We had lab names for these things; they made the Redwood tape library, which was literally the size of a room, and you could daisy-chain these things into just massive, massive tape libraries, and besides backup, they were pushing, and this was back in 2000, 2001, they were trying to sell it to companies as an archive as well. These things were just massive and required constant upkeep, because it was all physical robots and things. I mean, there was a robot inside one of these things that would go find a tape, bring it back, plug it in and then spin it up. They would go from zero to 70 in one and a half seconds, just amazing stuff, but tape, like you said, tape corrupts.
Bill Tolson:
So one of the things we had to do while I was at StorageTek was remind people that longer-term storage tapes had to be refreshed, otherwise they would start to lose their bits through the magnetism on the tape. Nowadays there are newer tape technologies that are good for 30 years before they need to be refreshed, but they still need to be refreshed. The same goes for spinning disk: it's still magnetics, although it's always moving. But tape has an advantage in that, and I'm sure you've heard this before Jason, if tape is used and set up correctly, you can create an air gap, which means it's not accessible to the outside world unless it's physically put in a tape drive and read. But even with these robotic tape libraries, a hacker can get into the system and access the robot and start screwing up backup tapes and stuff.
Bill Tolson:
So the idea, and this is where going back to Iron Mountain comes in: Iron Mountain was full of backup tapes, and that was a true air gap, because you were looking at hundreds of miles between the actual tape and any tape drive. With the newer forms of extortion, like ransomware, they go after any backup they can find and corrupt it so that you can't restore. So tape is interesting, and we worked on optical tape and all kinds of things, but yeah, that equipment availability over the long term really is an issue. And that's where, and I'll mention it here and we'll come back to it, that's where putting data, putting backups, putting all of your ESI up into a cloud that's managed correctly comes in. I mean, the cloud providers, Microsoft and others, are constantly refreshing this stuff.
Bill Tolson:
The data is duplicated, geographically duplicated somewhere else, and it's constantly maintained, so you're not going to have that refresh problem. But you do get into the security side of all that cloud, because the cloud is the biggest target for cyber hackers and ransomware now, and that's where companies like Microsoft have, I would guess, thousands of people working on all kinds of levels of security for the cloud. And then part of that is the end user's responsibility, and I've written extensively about this. Any data you put anywhere should be encrypted, and encrypting it as you put it into the cloud is a great idea, but 50 years from now, who has that encryption key?
Bill Tolson:
And that gets back to the legal side too. In fact, I've been in cases where eDiscovery has gone back and said, we have these files, but they're encrypted and we can't decrypt them. And in many of those cases the judge will come back and call that spoliation, destruction of evidence, and issue an adverse inference, because it doesn't matter whether it was your fault or not: the data is there, but you can't get to it, and therefore, legally, you have an issue. So yeah, that gets into encryption key management and availability. I know in Azure, for example, Jason, they have Key Vault, which is a very secure part of the cloud where the customer manages their keys, and it's very difficult to get into. And Key Vault has been a huge boon to CSOs who were nervous about putting sensitive data in the cloud, and I think it's been excellent.
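As a concrete illustration of the pattern Bill describes, here is a minimal Python sketch, assuming the azure-identity, azure-keyvault-secrets and cryptography packages and a Key Vault the caller is already authorized to use; the vault URL, secret name and file name are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from cryptography.fernet import Fernet

VAULT_URL = "https://my-archive-vault.vault.azure.net"  # placeholder vault name
SECRET_NAME = "archive-encryption-key"                  # placeholder secret name

secrets = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())

# Generate a symmetric key once and keep it in Key Vault, not alongside the data.
key = Fernet.generate_key()
secrets.set_secret(SECRET_NAME, key.decode())

# Encrypt a document client-side before it ever leaves the premises.
with open("policy-records.pdf", "rb") as f:
    ciphertext = Fernet(key).encrypt(f.read())
# ...upload `ciphertext` to long-term archive storage here...

# Decades later, retrieval only works if the key is still retrievable too.
stored_key = secrets.get_secret(SECRET_NAME).value.encode()
plaintext = Fernet(stored_key).decrypt(ciphertext)
```

The design point Bill raises is exactly this last step: the archived ciphertext is only as durable as the key's lifecycle, so the key store needs the same retention horizon as the data.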
Jason Bero:
Getting into the encryption conversation, it is a top-of-mind question related to the type of assets, whether it's a government agency having different classifications of protection. Some assets are going to have less sensitive needs than others, and a lot of the conversations I have with CSOs and information governors and even legal are related to that whole encryption question. How do we encrypt this content, because it's in FinServ and it's the digital jewels, versus content that isn't? Do we generate our own key and store it on-premises and not in the cloud? Do we use the cloud? And if we do, do we bring our own key and ingest our own keys into the service, so we still have key ownership but we're relinquishing the ability to apply that to the content and create policies? And then there's the other trust factor: for the other assets, does it make sense to have an encryption policy against all content, knowing full well that in most cloud services encryption is applied by nature to content, whether it's email or things of that nature, in place and at rest?
Jason Bero:
So it's a very big conversation, because I've seen organizations take the approach that everything needs to be encrypted with its own unique key tied to their own encryption service. And when they take that traditional approach, the one problem that occurs is that those cloud services then become like a dummy repository, in the sense that search is no longer usable for end users and legal can't discover the content until it's decrypted in some way, shape or form. So there's a real fine balance between productivity and security, and it's a growing concern, because if you don't address both in the most efficient way, then one, you're going to impact the end users that require that information for recall in one way, shape or form, or two, you're not applying a sound security posture and you're introducing risks, as you mentioned Bill, related to ransomware and things of that nature for sensitive assets. So information classification is becoming a conversation that would generally have fallen to security alone, where now legal, records managers and others are getting involved very heavily, because it impacts them very much.
Bill Tolson:
Oh yeah. I mean, that's where you're potentially looking at hybrid cloud, where standard, nonsensitive data can be put into the public cloud, no problem, but the sensitive stuff is designated to be stored in the private cloud. And in my mind, there are two private clouds. There's the private cloud on-prem in your own data center, which doesn't make sense to me, and then there's the private cloud within the public cloud. For example with us, with Archive360, you create your own private cloud within your own Azure tenancy, and that's completely separate from the rest of the cloud. So in reality, you could put your sensitive data in there, you can encrypt it, and you can choose to keep your encryption keys in Azure Key Vault or generate them and keep them on-prem. And that's where, Jason, you mentioned the issues with lawyers and others trying to access encrypted data.
Bill Tolson:
I know Microsoft has this capability, they don't talk about it much, but we've also adopted it: this idea of homomorphic encryption. You can encrypt something on-prem before you move it to the cloud, but it's still manageable, because homomorphic encryption is a type of encryption that lets files, and the data within files, be computed on without being decrypted. And that's been around a long time; it just hasn't been widely used, because the issue with this kind of encryption is that it's CPU-intensive. With standard encryption, you have to decrypt data before you access it, and if you've got 10,000 employees doing that on a daily basis, you're going to be spinning up lots of servers, or you're going to be consuming lots of CPU in the cloud, which is not free, nor should it be. But using these newer types of encryption, you can keep the data protected. One of the things we say is, "Your data is secure in transit and at rest, but also while in use."
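Fully homomorphic schemes of the kind Bill alludes to (Microsoft's SEAL library is one example) allow arbitrary computation on ciphertexts. A simpler, additively homomorphic scheme, Paillier, shows the core idea in a few lines of Python, assuming the python-paillier (`phe`) package; the values are made up for illustration:

```python
from phe import paillier

# The data owner generates the key pair and keeps the private key on-prem.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Values are encrypted before being handed to the cloud.
enc_q1 = public_key.encrypt(1_250_000)   # e.g., quarterly figures
enc_q2 = public_key.encrypt(980_000)

# The cloud can add ciphertexts (and scale by plaintext constants)
# without ever seeing the underlying values or holding the private key.
enc_total = enc_q1 + enc_q2
enc_doubled = enc_total * 2

# Only the key holder can decrypt the results.
assert private_key.decrypt(enc_total) == 2_230_000
assert private_key.decrypt(enc_doubled) == 4_460_000
```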
Bill Tolson:
And that's the next step, I think, for security. The question is, does everything get encrypted, like you said, or, using those classification techniques, do you encrypt certain things and put them in your private cloud within the cloud, while everything else goes up into the public cloud, OneDrive and all of those kinds of things? And I've read technical papers from Microsoft people indicating they're using that too. And there's another level beyond that, which is secure multi-party computation, which I know Microsoft is working on and we've incorporated as well; it's the idea of being able to set different levels of security on one given file. I won't get into that right now, but it's interesting. And I think, Jason, what we're probably talking about, and we haven't necessarily said it, but I've written about the idea, is that the right cloud is by far the most commonsensical place to store data for the long term. Do you agree?
Jason Bero:
Yeah, I couldn't agree more. Even the ability to scale is a benefit of a cloud or multi-cloud strategy: with cloud, there are always advancements in technologies to improve storage and reduce storage costs, and a good example, and a very common conversation related to long-term archiving and just the explosion of data and storage for it, is the whole concept of quantum computing. And so you have the ability to leverage long-term archiving capabilities such as Archive360, and as the Azure cloud it runs upon scales, the benefit of the Archive360 solution, because it's built natively directly in, is that it's going to scale along with it.
Jason Bero:
So you're not worrying about a strategy where five years from now you have to think about spinning up another VMware environment, like in the physical world of infrastructure as a service, where there's still a capacity limit to what you have, and if you need to expand upon that, it can be very costly. A more platform-based approach to long-term archiving is one that just scales as it goes, and that's a huge benefit long term.
Bill Tolson:
That's perfect. I mean, we talk about the cloud: Azure, and Archive360 within that private cloud, is dynamic. It can very quickly scale up and back down again. By the way, one of the examples we have is a company, a couple of years ago, that had a gigantic eDiscovery requirement pop up out of nowhere, and they had very little time. What they needed to do was start processing hundreds of eDiscovery searches at the same time, which obviously consumed a large amount of CPU within the cloud and everything else, and it was basically automatic that they could do that. Obviously they were charged for the usage of the CPU, but immediately when they were done, their usage went back down again. With on-prem, you have to guess at that stuff ahead of time and buy a bunch of equipment just in case. With the cloud, it's an infrastructure that is massive and can scale immediately when you need it.
Bill Tolson:
And then you get into cost savings, where for all the big clouds, say the big three, but with Azure in particular, you have different storage tiers: you have hot, you have cool and you have archive, and each one has a different price. With archive, which is, as we started off, where you put inactive data that's not going to be accessed very often, you put it in the archive tier and you're paying almost nothing to keep it there, but it's going to be safe and secure. And you can use something like, an old term, hierarchical storage management, to move data from tier to tier automatically, based on people looking at it or recovering it or those kinds of things, and that is very handy too, and very difficult to do on-prem.
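The tier movement Bill describes maps onto Azure Blob Storage's Hot, Cool and Archive tiers. A minimal sketch with the azure-storage-blob Python SDK (the connection string, container and blob names are placeholders, and in practice this is usually automated with lifecycle management rules rather than per-blob calls):

```python
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
blob = service.get_blob_client(container="records", blob="policies/1998/policy-0001.pdf")

# Demote an inactive record to the cheapest tier for long-term retention.
blob.set_standard_blob_tier("Archive")

# Years later: rehydrate it back to Hot so it can be searched and read.
# (Rehydration from Archive is asynchronous and can take hours.)
blob.set_standard_blob_tier("Hot", rehydrate_priority="High")

props = blob.get_blob_properties()
print(props.blob_tier, props.archive_status)  # e.g. tier and rehydration status
```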
Jason Bero:
Yeah. And then in conversations with many clients around the process, if they're on Microsoft 365, for example, their first thought on cost optimization is to say, "Well, can we just store everything in Microsoft 365?" And very quickly they come to the realization that Microsoft 365 is built on hot-tier storage. It's a very costly storage mechanism, and it's not designed for long-term or tiered storage, in the sense that if you were to store everything in SharePoint Online, as an example, eventually you're going to reach two points of decision. One, we just can't scale because we're over our allotment, and that consequence usually arrives very quickly, and buying additional hot-tier storage doesn't make financial sense.
Jason Bero:
And two, just from a performance perspective, there's always a threshold in Microsoft 365 where clients come to the realization that the petabyte world of trying to manage everything inside Microsoft 365 is just not going to satisfy the requirements for eDiscovery, for search, or even for just records management storage in general. And we're talking about petabytes, not gigabytes or even terabytes only, because those are increasingly uncommon terms in many enterprise organizations anymore; we're talking petabytes and zettabytes at this point. And so a strategy that has a tiered storage repository is, one, cost optimized and, two, much more performance efficient.
Bill Tolson:
Yeah. You mentioned SharePoint Online, for example. I think SharePoint Online has a maximum storage allotment of 25 terabytes, and then you have to start looking at other options. I, for example, sitting here at my home office, have three and a half terabytes on my desk, which is a little weird, but I have a reason to keep it. So it's sitting there, but the whole idea of storage tiering and moving data around based on various policy requirements is a big cost-saving measure. So Jason, relatively quickly here, the future of long-term archiving: I've been involved with this stuff and I'm sure you have too, and we get into holographic storage and other kinds of storage.
Bill Tolson:
I know Microsoft has a project called HSD, which is a holographic storage device. It's obviously not available yet, but you get into being able to store just massive amounts of data in a very small glass or crystal area, and it's three-dimensional and the whole shot. The problem with past holographic storage is that it was write once, read many: you write it within the glass and then that's it. I understand Microsoft's HSD project has solved that by using, I think, ultraviolet light within a laser beam to erase data within the crystal, so Microsoft has really made some headway there. Have you run across that project within Microsoft, Jason?
Jason Bero:
I have, I've actually been involved in conversations related to that concept, because it does impact our long-term archival strategy related to Azure itself. It also impacts efficiencies related to our data centers themselves, when we start getting into the concepts of no longer requiring racks and moving to these little glass storage systems. The performance and cost optimization of Microsoft data centers globally is going to be heavily impacted, and as we've seen, when we optimize storage with differing technologies, the ones that benefit the most are generally the customers using it, because it's much more cost effective to have a room versus a football field for a data center in the future, or even going where we're going, which is underwater data centers for cooling and sustainability needs, even getting to the point where, down the road, we're going to see satellites that are just data centers up in space, and other things.
Jason Bero:
These are all investments that Microsoft is making to see how we can improve upon that, and that is one of the areas that's going to be the revolutionary, I'd say eye-opening, moment as to what we can do from an optimization standpoint, which customers can then benefit from in scalability, performance and storage tiers, because we're going to get to the zettabytes-per-customer conversation very quickly. Especially when we talk about the world of the pandemic and what it's created: think of how many organizations across the globe are just generating Zoom recordings and Teams recordings, large file formats created at such scale. It's not going to take very long for an organization to go from thinking about a petabyte storage need to a zettabyte storage need, and I predict most organizations are going to be talking about that within the next two years.
Bill Tolson:
Also, the big benefit here that I'm sure a lot of people will be talking about is energy consumption. If you're storing things on spinning disks and you've got a warehouse full of nothing but spinning disks, versus the same amount of storage sitting in glass, the energy consumption is going to be much less, which is great, and because of the density, the square footage is probably going to be much less too. I know, Jason, we're very quickly running out of time, but I wanted to touch on one other Microsoft project. Have you run across Project Silica?
Jason Bero:
I have been involved and had conversations about Project Silica. It's still at too early a stage for me to say a lot about it, but I can say that's another area related to the same aspects as the quantum computing work and all of that. What I would say is, for everybody listening here, definitely pay attention to it, because it's going to bring about some new opportunities ahead, not just for long-term archiving, but for performance related to cloud scale in general.
Bill Tolson:
Yeah, I read some technical notes from Microsoft on it. I've got a quote here from a Microsoft person, and the quote is, "One big thing we want to eliminate is this expensive cycle of moving and rewriting data to the next generation. We really want something you could put on the shelf for 50, a hundred or a thousand years and forget about it until you need it, that's long-term archiving." This is really interesting stuff. I mean, the whole idea of holographic storage has been around for a long time, but it hasn't really been, number one, cost effective or, number two, reusable. I think Microsoft, as well as other companies, are making tremendous headway, but the thing about Silica and HSD at Microsoft is that they're being built and designed around needs in the cloud, which obviously is where everybody is going, for good reason, so that's really interesting stuff. Jason, I think we probably should wrap it up. So we'll wrap up this edition of The Information Management 360 Podcast. Jason and I want to thank you for listening to this insightful and, I think, enjoyable discussion on this very important subject.
Bill Tolson:
Long-term archiving is a subject that won't go away, because, long term, the data is going to be there. If anyone has questions on this topic or would like to talk to a subject matter expert, please send an email mentioning this podcast to info@archive360.com and we'll get back to you as soon as possible. And just a reminder that Jason and I will actually be doing a webinar on this subject on June 24th for Compliance Week. So if you want more information on that, you can contact us through info@archive360 and we'll get you the details, but I think it's going to be really interesting and we'll expand beyond some of the stuff we've talked about today. So with that, Jason, I very much appreciate the time. This was an exceptional podcast, very interesting stuff, and I want to thank you.
Jason Bero:
Yes, thank you very much, it's been a pleasure Bill.
Bill Tolson:
Thank you.