Data storage and management in content creation and digital postproduction

It is the 31st of March 2021 - World Backup Day! The perfect opportunity for my first blog post.

Budgets are of course always tight. Filmmakers and content creators try to shave off costs wherever they can. Unfortunately, people often try to save money in the wrong places, which can lead to very costly data disasters later on. I hope this article gives readers some valuable insight into data management. Solid data storage and data management can be implemented whether you work in a big post-production facility or on your own as a freelance photographer/videographer/DIT/artist/colourist. This article is primarily about how to make data storage as reliable and fail-safe as possible. Other aspects, like performance, might be covered in future blog posts. I will also refrain from mentioning specific products or brands: what works for me might not work for someone else, as this depends very much on the type of work someone performs. Instead, I will concentrate on the fundamental working principles of data storage in general, which is what you need in order to see past the shiny product marketing and make qualified purchase decisions based on individual needs.

The data we are talking about in this article is typically created on set with digital still/film cameras and sound recorders. What I have seen very often is that the data gets transferred from the camera/recorder flash media to external hard drives via a notebook. The reason is obvious: camera flash media is small, both in size and capacity, and also rather expensive. So it makes sense to transfer the data from the camera/recorder media to a larger-capacity hard drive, which is also a lot cheaper in terms of GB per €/$, so that the flash media can be reused in the camera/recorder for further shooting. External USB3 hard drives are especially appealing at first glance, as they are most of the time a lot cheaper even than bulk SATA hard drives without any chassis or USB interface. Overall, external USB hard drives offer the best TB/€ ratio on the market - or do they? (Spoiler alert: no, they don't, all things considered - keep reading.)

Let us stop here for a moment and assess the current state of the data in our little case study. The data is now stored on some cheap, high-capacity external hard drive on set, probably attached to the laptop of the DIT/data wrangler/photographer/videographer. Is the data safe now? The answer is, of course: no! Everybody knows that hard drives can fail, and depending on what kind of media you are shooting, you can lose a lot of value and work if one of these hard drives decides to fail.

“So what do we do?”

The typical solution is to use some form of direct-attached external RAID storage device. Why is a RAID better in terms of data security/integrity than a single external hard drive? RAID stands for Redundant Array of Independent (or Inexpensive) Disks. The general idea is to build an array of disks that offers some form of physical disk redundancy (meaning one or more disks can fail and the data would theoretically still be intact). There are different so-called RAID levels, but I will not go into too much detail about how they all work. For us, the useful RAID levels commonly found in commercially available external storage devices are RAID1, RAID5 and RAID6. In RAID1 there are 2 disks and the data gets mirrored so that both disks contain identical data - in theory, you can take either of the disks and all the data is there. This is a commonly used solution if the amount of data a given project will create fits on one single large hard drive - just add another in a RAID1-capable enclosure and you are done (or are you?). RAID5 works differently. In RAID5 each chunk of raw data (not a file but a "chunk" - the RAID does not care or know about files, it organizes the data in chunks and stripes) is striped into n-1 data stripes plus one parity stripe (n being the total number of disks in the array; all stripes have the same size). Each of those stripes then gets written to one disk. The parity stripes are stored on the hard drives in a rotating manner - for each chunk of data, the parity stripe gets written to a different disk than the parity stripe of the previously written chunk. So each disk contains both data and parity stripes, evenly distributed by the RAID5 algorithm. This does 2 things.
1. The sequential read and write speed of a RAID5 array is higher than that of a single hard drive, since the data chunks are striped across all the hard drives in the array (this is mostly only true for large files, though - like the camera original files we mostly deal with in this case study).
2. The array can compensate for the loss of one physical hard drive, since all the parity and data stripes from the lost drive can be reconstructed from the corresponding data and parity stripes of all the other, still healthy drives in the array.
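
The parity idea behind RAID5 can be sketched in a few lines of Python. This is only an illustration of the XOR principle, not a real RAID implementation - real controllers work on the block level and rotate the parity stripe across disks:

```python
from functools import reduce

def xor_stripes(stripes):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*stripes))

data_stripes = [b"AAAA", b"BBBB", b"CCCC"]   # the n-1 data stripes of one chunk
parity = xor_stripes(data_stripes)           # the one parity stripe

# Simulate the loss of the disk holding stripe 1: the missing stripe can be
# rebuilt from the surviving data stripes plus the parity stripe.
rebuilt = xor_stripes([data_stripes[0], data_stripes[2], parity])
assert rebuilt == data_stripes[1]            # b"BBBB" recovered
```

Because XOR is its own inverse, XOR-ing all surviving stripes (data plus parity) reproduces exactly the stripe that was lost - which is what the controller does for every chunk during a rebuild.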

“So… Cool! We just have to make sure to use a RAID1 or RAID5 to store all the important data and we are safe - right?!”

No!

Protecting against the loss of one physical hard drive in a RAID array will not secure your data from all the dangers out there. In fact, a RAID5 can only protect against this one scenario: the loss of one physical hard drive from the array. When that happens, the data and parity stripes from that drive are missing, but they get reconstructed on the fly from the data and parity stripes of all the other physical hard drives by the RAID controller - the array is now in a degraded state. It does not have any redundancy anymore. However, operating systems like Windows or macOS do not notice anything strange. For them the data is still there - that is because the RAID controller only shows the OS the logical layer of the storage: the raw volume, which the OS organizes into one or more logical volumes and then formats with a file system on top, like NTFS (Windows) or HFS+ (macOS) for example. The operating system knows nothing about the individual physical hard drives in the array, the data chunks, the data and parity stripes, the degree of physical hard drive redundancy, or the fact that one drive may have failed. To restore the array to a healthy state, the failed hard drive has to be replaced with a healthy one and a rebuild operation has to be started. The rebuild is carried out by the RAID controller and can be monitored from within the operating system by logging into the GUI of the RAID controller or with command-line tools. It is also worth pointing out that it is a very good idea not to just slam in a new hard drive and start the rebuild - precheck the replacement drive with some tests so that you know it does not have any damaged sectors. But even if the new hard drive is perfectly fine, we already have one major issue with RAID5.

During the rebuild, all hard drives in the array are constantly accessed until the missing data and parity stripes have been reconstructed onto the new, empty replacement drive. This operation can take a lot of time (up to several days, depending entirely on the size of the hard drives used) and puts a lot of stress on all drives in the array. It often does not even matter how little data is stored on the logical volumes - the reconstruction does not care about used or unused logical storage space; it reconstructs the RAID on the block level underneath the logical level visible to the operating system. When the user wants to access data and keep working at the same time, the stress the hard drives have to endure becomes even bigger, and the rebuild time increases as well. This is precisely the scenario in which another physical hard drive failure has the highest probability! The rebuild often cannot and should not be interrupted, and although you can continue to work on the storage while it is in a degraded state during a rebuild, it is not a particularly good idea to do so. If the individual hard drives in the array are larger than 2TB, the probability of a second drive failure during the rebuild is actually considered too high, and RAID5 can therefore no longer be considered safe for mission-critical data at all. Also, a power loss during a rebuild can render the data on the whole array unusable (or at least very hard and very expensive to retrieve). That is why RAID5 basically should not be used anymore for storing mission-critical data. And there is one more issue: data corruption can occur in one or more sectors of a physical drive just like that, without any apparent reason, without the drive failing and therefore without the RAID controller noticing that anything happened (bitrot, for example).
In this case, you could have a corrupt data stripe without the RAID controller knowing that corruption has occurred. If you knew that this kind of corruption had occurred (highly unlikely - there is practically no way to know), a manual rebuild could fix it by reconstructing the corrupt data stripes from the intact corresponding data and parity stripes on the other drives. But what if a parity stripe suddenly becomes corrupted? A rebuild operation would reconstruct the corresponding data stripes and propagate the error from just the bad parity stripe to data stripes that were perfectly fine before the rebuild! Even worse: what if a parity stripe on one hard drive is corrupted, the RAID controller does not notice because the hard drive controller has not noticed either, and then another hard drive fails? You would replace the failed drive and start the rebuild, not knowing that there is more data corruption on another physical drive. Since there is an undetected corrupt parity stripe on another drive, the rebuild process could either encounter errors because the numbers do not add up when the controller executes the reconstruction algorithm, or complete and leave you with corrupted data, because the controller did not know that one parity stripe on a non-failed drive was actually corrupt. Maybe that just leads to a file that won't open. In the worst-case scenario, however, you could lose all data on all volumes across the entire array - and no, I am not making that up! There are RAID controllers that simply stop with an error message during a rebuild operation, and your data is all gone.
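
A few lines of toy code make the dilemma concrete. With a single XOR parity stripe, an inconsistency is detectable, but the culprit is not identifiable (again a sketch of the principle, not real controller logic):

```python
from functools import reduce

def xor_all(stripes):
    """XOR equal-length byte sequences together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*stripes))

data = [bytearray(b"AAAA"), bytearray(b"BBBB"), bytearray(b"CCCC")]
parity = xor_all(data)          # written when the chunk was stored

data[2][0] ^= 0xFF              # silent bitrot on disk 2, unnoticed by anyone

mismatch = xor_all(data) != parity
assert mismatch                 # the inconsistency IS detectable...
# ...but the exact same mismatch would appear if disk 0, disk 1 or the
# parity disk itself had rotted: one parity equation, several suspects.
```

With one equation and several unknowns there is no way to decide which stripe to "repair" - which is exactly why a naive rebuild can propagate the corruption instead of fixing it.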

“Ok. then let’s just use RAID6. With RAID6 two physical hard drives can fail without data loss. Now we are fine… right?!”

Spoiler alert: no, not really!

With n-2 data stripes and 2 corresponding parity stripes per chunk of data (n being the number of hard drives in the RAID6 array), the RAID6 controller now has a chance to detect hidden data corruption and fix it on the fly: if one of the n stripes corresponding to one chunk of data is corrupted and all the other corresponding stripes (parity and data) still match in the checksum, the corrupted stripe can be identified and fixed. In other words, the controller can now figure out which hard drive is "lying" during a rebuild or even during normal operation - during normal operation, however, only if the particular RAID controller always reads data AND parity stripes on every access and always checks whether everything adds up. This is simply not possible with a classic RAID1 or RAID5; there, the controller cannot know which is the corrupted stripe and which is the correct one. This makes RAID6 much more robust during rebuild operations, especially when the capacity of each physical disk is much higher than 2TB. Which is a good thing!
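
Real RAID6 implementations use Reed-Solomon coding over a Galois field; the integer toy below is only meant to illustrate the principle that two independent parity equations let you both locate and repair a single corrupt stripe (it assumes exactly one non-zero error):

```python
# Two parities: a plain sum and a position-weighted sum. A single error of
# magnitude e on stripe j shifts the first by e and the second by j*e, so
# dividing the two mismatches reveals the position j of the "lying" stripe.
data = [65, 66, 67, 68]                        # four "data stripes"
p = sum(data)                                  # parity 1: plain sum
q = sum(i * d for i, d in enumerate(data))     # parity 2: weighted sum

data[2] += 7                                   # silent corruption on stripe 2

dp = sum(data) - p                             # mismatch 1: e       -> 7
dq = sum(i * d for i, d in enumerate(data)) - q  # mismatch 2: j * e -> 14
culprit = dq // dp                             # position j = 14 / 7 = 2
data[culprit] -= dp                            # ...located and repaired
assert data == [65, 66, 67, 68]
```

One equation can only say "something is wrong"; two independent equations can say "stripe 2 is wrong by 7" - that is the qualitative difference between RAID5 and RAID6 during a rebuild.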

BUT:

Data corruption cannot only happen in the sectors of the physical disks themselves - and only those kinds of errors can the RAID controller detect and potentially fix. The RAID controller combines several physical hard drives, stripes the data across them, adds some level of physical hard drive redundancy through parity stripes, and then presents the OS with one raw logical volume. The OS sees this raw volume just as it would see the raw volume of any brand-new single, non-RAIDed hard drive. The OS can then create one or more logical volumes and file systems on top of it. And there, data corruption can happen as well! File systems like NTFS, HFS+ or EXT4, for example, are fairly robust and offer repair tools in case the OS detects data corruption on the logical volume level. However, in the field I have encountered rather expensive, high-capacity external Thunderbolt RAID enclosures, holding the original camera files of movies with production values in the millions of euros/dollars, that were formatted with the exFAT file system. This is something you should never do! exFAT is a file system that was designed for flash media in devices like digital still cameras and mp3 players, not for mission-critical data potentially worth millions of euros/dollars in production value. With exFAT, a loss of electrical power at an inopportune moment, or something as simple as a cable being yanked out during a file copy operation, can suffice to render the data permanently inaccessible - even on a RAID6 with a physical disk redundancy level of 2 and perfectly healthy disks! Retrieving the data can be a very time-consuming and expensive endeavour. If you use external RAID solutions that are directly attached to computers (Direct Attached Storage, or DAS), use at least NTFS or HFS+ or whatever more robust file system option is available on your system.
One reason why this can happen is that a RAID controller uses a write-back cache in the form of RAM on the controller - typically 256-2048MB. All write operations are absorbed by this write-back cache first and only then sliced into stripes and committed to the hard drives. In the event of a power loss, any contents still inside the cache are not yet on disk. Since the RAM of the write-back cache is volatile memory, its contents are lost in a sudden power outage. Simple file systems like exFAT cannot deal very well with this and might end up corrupted and unusable. Some more expensive RAID controllers offer an optional Backup Battery Unit (BBU), which is there to keep the cache powered long enough for all data to be committed to the drives after an ungraceful shutdown or power loss - however, BBUs also tend to fail after one or two years, and the BBU only powers the RAID controller; the actual hard drives will be offline when the controller tries to write back the remaining data in the cache. The next issue is the write-back cache itself: is it ECC (error-correcting code) memory? Data corruption can also occur in data held in RAM, so it would be advisable for the RAM to be able to detect and fix errors as well. With DAS RAID solutions, a certain amount of write-back caching is also done by the OS or its file system. Normal workstations rarely have ECC memory installed, as it needs to be supported by the computer's CPU and mainboard and is typically a bit more expensive and a bit slower than regular RAM.
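
The same gap between "written" and "actually on disk" exists at the application level, which is why software that cares about power loss explicitly flushes its writes. A minimal sketch of the idea (the file name and contents are just examples):

```python
import os
import tempfile

# A plain write() usually lands in a userspace buffer, then in the OS page
# cache, then possibly in a controller write-back cache - none of which
# survive a power cut. fsync asks the kernel to push the data down to
# stable storage before returning.
path = os.path.join(tempfile.mkdtemp(), "journal.bin")
with open(path, "wb") as f:
    f.write(b"critical metadata")
    f.flush()                  # flush Python's own userspace buffer...
    os.fsync(f.fileno())       # ...then force the OS cache to disk

# Only after fsync returns may an application assume the bytes survive a
# sudden power loss - a BBU exists to cover the same gap in the RAID
# controller's own cache.
```

Robust, journaling file systems make exactly this kind of ordering guarantee for their own metadata; exFAT largely does not, which is why an ill-timed power cut can take the whole volume with it.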

The more robust file systems have more features built in to keep the data consistent even in the event of a power loss, defective cables, unstable connections and other everyday inconveniences. They were designed with the ability to repair the file system after a logical-layer data corruption. But these repair tasks have to be started by the user; the OS does not always detect ongoing data corruption automatically, and it cannot always fix every kind of corruption on the logical layer. Client OS file systems are general-purpose file systems. They offer a good compromise between processing overhead, performance and data security/consistency. But there are much better-suited options out there - they are just usually not included in client-type operating systems. These options concentrate heavily on the data security and consistency aspects of the file system, and therefore have more computational overhead and would not match the performance of a DAS solution with the same RAID level and the same number of hard drives - but that is fine, since we are after security and reliability.

But in short: we now know that on external DAS RAID storage solutions, data corruption can happen on the physical hard drives - which only the RAID controller, not the OS, might detect and fix - and data corruption can happen on the logical level, which only the OS might be able to detect and repair. We also know that for logical-layer data corruption it does not matter what RAID level you use - the errors will still happen. In real life, one type of corruption can also cause the other. A bad sector on a physical hard drive in a RAID array can propagate and cause corruption on the logical volume level, which means that fixing the issue requires a replacement drive and a subsequent rebuild on the RAID side of things, and after that a file system check and repair on the OS side. After all this, your data has to be revalidated against the MHL checksum sidecar files that you hopefully created for it.
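
The core idea behind MHL sidecar files can be sketched with a simple hash manifest. Real MHL files are XML documents with additional metadata, and on-set tools typically use xxHash or MD5; this sketch uses SHA-256, and the file name is purely illustrative:

```python
import hashlib
import os
import tempfile

def file_hash(path):
    """Hash a file in chunks so even huge camera files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

root = tempfile.mkdtemp()
clip = os.path.join(root, "A001C001.mov")      # hypothetical camera file
with open(clip, "wb") as f:
    f.write(b"camera original data")

manifest = {clip: file_hash(clip)}             # created right after offload

# ...weeks later: revalidate the copies against the stored checksums.
corrupt = [p for p, digest in manifest.items() if file_hash(p) != digest]
assert corrupt == []                           # everything still intact
</antml>```

Any file whose recomputed hash no longer matches the manifest has changed since the offload - that is the entire trick behind checksum-verified copies and later revalidation.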

And there is another big issue: the RAID controller itself can fail as well! If you are lucky and your external RAID storage solution is not too old, the vendor might be able to supply a spare part in case your RAID controller fails. Importing the member hard drives of an array into a new RAID controller is possible, but it is not without problems - problems that could lead to the loss of all data on the entire array. Retrieving data from the former member drives of a RAID array is not impossible, but it is time-consuming and therefore extremely expensive - expensive even if it turns out to be unsuccessful in the end!

And if this wasn't enough already: data corruption can also develop while a hard drive or RAID enclosure is powered off, sitting on a shelf - just like that, without apparent reason… a phenomenon called bitrot (no, I am not making that up!).

There are most likely other ways to mess up a DAS RAID solution, but I think the ones I have outlined here are the most common.

“But what should I do then? It seems hopeless!”

One reasonable and cost-effective way of storing mission-critical data is to use a professional Network Attached Storage (NAS) solution.

“But why is that better? They also just use RAID and cost more money - right?!”

That statement is not entirely wrong. But the better NAS solutions on the market implement RAID in a much more sophisticated way than external Direct Attached Storage (DAS) solutions like external Thunderbolt RAID enclosures. First, modern NAS solutions do not have a dedicated RAID controller (a part that could fail); they implement the RAID stack in software running on the CPU of the NAS inside its OS. Second, NAS solutions use file systems specifically designed to store large amounts of mission-critical data and keep it consistent. These file systems, like BTRFS or ZFS, offer much more than the usual client-type file systems found in Windows or macOS, and much more than just the logical-layer side of things. BTRFS and ZFS handle the physical hard drives and their health monitoring, the RAID implementation, the volume manager and the file system - all designed with high reliability in mind, and all in the same piece of software, which is a highly reliable working principle. These sophisticated file systems are the reason dedicated RAID controllers are not as common today as they were years ago. NAS solutions constantly monitor the health status of their member drives - this goes so far that some vendors maintain online databases with health data from millions of drives, from which fairly accurate predictions can be made about when an individual drive in your NAS is likely to fail, based on the health indicators each drive exposes to the NAS operating system - and the user is warned via email before something bad happens. In my home lab, for example, I use a NAS with an array of 8 drives. One drive is defined as a hot spare and not in use during normal operation; the other 7 drives are configured as RAID6.

"So why not keep an extra drive on the shelf as a cold spare and gain some extra capacity in the array that way? Having a hot spare in the system at all times seems like a waste?!"

Sophisticated NAS operating systems are designed to predict when one of the drives is going to fail, based on the S.M.A.R.T. data it exposes to the operating system. The hot spare drive also gets periodically checked for bad sectors - which would not happen if it sat on a shelf as a cold spare. As soon as one drive shows bad or unstable sectors, all the still-intact contents of that drive are copied over to the hot spare. After that, only the data that sat on the bad sectors of the failing drive has to be reconstructed onto the hot spare - the RAID rebuild time is therefore extremely short, and the stress on the drives in the array minimal. The array is basically never in a degraded, vulnerable state, because the NAS takes action before a drive actually fails completely. This also minimizes the impact on the user, who can continue to work normally the whole time. The user just has to replace the damaged drive with a new one once the hot spare has taken over its role - the new replacement drive then becomes the new hot spare. These features are mostly not enabled by default on such a NAS, but they only have to be configured once - preferably when the NAS is set up for the first time.

Modern file systems used in NAS devices, like BTRFS and ZFS, also offer many advantages and advanced features when it comes to detecting and fixing data corruption on the logical storage level. When data is written to the volumes, checksums are created as well - sometimes even 2 or 3 sets of checksums, stored on different physical drives as metadata to increase fault tolerance. The NAS periodically reads (or "scrubs") the contents of the volumes to verify that the checksums calculated during the scrub still match the checksums stored when the files were created. If they do not match, data corruption has taken place at some point, and the NAS can reconstruct the data using the physical disk redundancy of the RAID implementation. This is a bit like taking a set of original camera files and revalidating them after some time against an MHL checksum file created earlier to find out whether the data is still consistent - only that the file system does this completely automatically with all the data on the NAS, with no user intervention whatsoever, and on top of that is even able to fix the errors.
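
The principle of a scrub can be sketched in miniature: store a checksum next to every block, keep a redundant copy, and on scrub heal any copy whose checksum no longer matches. This is a toy model with made-up names, not how ZFS or BTRFS are implemented internally:

```python
import hashlib

def csum(data):
    return hashlib.sha256(data).hexdigest()

block = b"original camera file data"
# Two mirrored copies, each stored together with its checksum metadata.
mirror_a = {"data": bytearray(block), "sum": csum(block)}
mirror_b = {"data": bytearray(block), "sum": csum(block)}

mirror_a["data"][0] ^= 0xFF          # bitrot on one copy while sitting idle

def scrub(copies):
    """Verify every copy against its checksum; heal from an intact one."""
    repaired = 0
    good = next(c for c in copies if csum(c["data"]) == c["sum"])
    for c in copies:
        if csum(c["data"]) != c["sum"]:
            c["data"][:] = good["data"]   # reconstruct from the redundancy
            repaired += 1
    return repaired

assert scrub([mirror_a, mirror_b]) == 1
assert bytes(mirror_a["data"]) == block   # corruption detected and fixed
```

The checksum tells the file system which copy is "lying", and the redundancy supplies the correct bytes - the same two ingredients a real scrub combines, just on a much larger scale and fully automated.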

There is no disconnect anymore between the RAID controller and the operating system. The NAS operating system has full control over the health monitoring, error detection and fixing of the individual drives and also the error detection and fixing on the logical data layer which makes everything much more secure, reliable and automated. Hidden data corruption that occurs spontaneously - like bitrot - can be detected and fixed by such a NAS device automatically.

NAS solutions also offer many more features that can be useful in digital post-production environments. The most important of these is that many computers can connect to and work from the same NAS. There are countless further features that make a NAS much better value than a DAS solution, but describing them all would be completely beyond the scope of this blog article. A few extremely useful ones I will describe, though. NAS devices often have a feature called versioning that helps to backtrack changes to files: versioning keeps a predefined number of versions of each file as it changes (this uses up more storage space, of course, since at least the deltas from one version to the next have to be stored). This feature is especially valuable when you have the misfortune of being attacked by ransomware. Ransomware encrypts user data, and a ransom has to be paid to get the decryption key. With versioning you can simply roll the affected files back to a version from before the encryption took place - of course, the ransomware has to be removed from all affected systems in the network first for that to work.
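
Versioning in miniature might look like this: every change appends a new version instead of overwriting the old one, so a bad version - such as a ransomware-encrypted file - can simply be rolled back. This is a hypothetical, simplified store, not a real NAS API:

```python
from collections import defaultdict

versions = defaultdict(list)                 # path -> list of file versions

def save(path, data):
    """Store a new version instead of overwriting the previous one."""
    versions[path].append(data)

def rollback(path, steps=1):
    """Discard the newest version(s) and return the last good one."""
    del versions[path][-steps:]
    return versions[path][-1]

save("invoice.xlsx", b"good content v1")
save("invoice.xlsx", b"good content v2")
save("invoice.xlsx", b"\x99encrypted-by-ransomware\x99")   # the attack

restored = rollback("invoice.xlsx")          # back to the last good state
assert restored == b"good content v2"
```

Real implementations store only deltas and cap the number of retained versions, but the recovery logic is the same: the encrypted version is just the newest entry in a history you can step back through.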

“Brilliant! So instead of a Thunderbolt RAID, I just buy for a couple of extra bucks a suitable NAS device and NOW my data is completely and ultimately safe!”

Nope!

There are a few more so-called "single points of failure". The power supply unit of your NAS might fail - the PSU is the second most likely part to fail after the hard drives. Although the data mostly survives such an event intact, it means downtime and no access to your data while the PSU is repaired or replaced (there are of course NAS systems with redundant PSUs, though these are mostly found in enterprise solutions). Did I mention that something could happen to the CPU or mainboard of the NAS too? And yes, there are enterprise NAS systems with dual controllers, where "controller" refers to the hardware components of the NAS - CPU, mainboard, RAM, etc. Both controllers are connected to the disks at all times, and if the primary controller fails for some reason, the secondary controller takes over with barely noticeable service interruption. The more expensive NAS systems become, the more single points of failure are eliminated, to achieve a very high level of data availability.

Not everyone can afford the most expensive dual-controller, redundant-PSU NAS system - and even if you can, despite all the advanced error detection and repair features and a physical hard drive redundancy of 2 or even 3 drives plus a hot spare that a good NAS can offer, you can still lose all of your data from one moment to the next! Without any chance of retrieving it! And I do not mean accidental deletion here - that is easily prevented by setting up a trash bin on the NAS, so that a deletion does not immediately erase the data but first moves it to the trash, from where it can be restored. I mean theft, fire and other catastrophic events that would make your data forever inaccessible because your NAS gets destroyed or stolen.

To combat these dangers you have to BACK UP your data. There is a very true proverb that says a RAID is not a backup. There are T-shirts with imprints that read: "No backup, no pity!" You should have 2, or better 3, replicas of your most important data. This can be achieved by simply connecting 2 NAS systems over the network (even over the internet) and having them synchronize their contents automatically. An easy way is to use 2 identical devices with their built-in synchronization or replication tools. However, it is possible, and sometimes preferable, to keep the replicas on devices from different vendors, connected by a platform-independent synchronization or replication tool. My secondary backup server is a DIY server at a different location from my main backup server - both connected securely and encrypted via open-source synchronization software over the internet. That way, in the rare event of a software bug emerging in the OS of one NAS device, the other NAS is not affected. Ideally, the 3rd replica should be on a different type of media than the first 2 - and to be even safer, let's make this 3rd backup an offline backup that is not connected to the other replicas via synchronization but only gets updated periodically: external hard drives or LTO tapes, for example. This way, errors or user mistakes that might propagate through synchronization will not immediately affect the 3rd, offline backup. One of these 3 replicas should be offsite - at a different location from the other 2. This principle is known as the 3-2-1 backup strategy: 3 replicas, on 2 different media types, with 1 replica offsite. It is a very common standard - insurance companies will demand the implementation of a 3-2-1 backup strategy for movie productions. The 3rd replica can also be some form of cloud storage service.
These services also offer versioning and a trash bin system, and nowadays even protection against ransomware encryption attacks, as they are highly redundant and secure in themselves - however, a cloud service cannot be considered an offline backup, as it is online by nature.
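
The core of such a one-way replication job can be sketched in a few lines: copy files that are new or whose content has changed, and leave the rest alone. Real tools like rsync additionally handle deletions, permissions and delta transfer; all names here are illustrative:

```python
import filecmp
import os
import shutil
import tempfile

def replicate(src, dst):
    """One-way sync: copy new or changed files from src into dst."""
    copied = 0
    for dirpath, _, files in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(dirpath, name)
            d = os.path.join(target_dir, name)
            # shallow=False compares actual contents, not just metadata.
            if not os.path.exists(d) or not filecmp.cmp(s, d, shallow=False):
                shutil.copy2(s, d)      # copy2 preserves timestamps
                copied += 1
    return copied

primary, backup = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(primary, "clip.mov"), "wb") as f:
    f.write(b"footage")

assert replicate(primary, backup) == 1   # first run copies the new file
assert replicate(primary, backup) == 0   # second run: nothing has changed
```

Run from a scheduler, a job like this keeps a second replica current without user intervention - the offline 3rd replica deliberately stays outside this loop and only gets updated by hand.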

Another very important aspect of setting up a data storage workflow with multiple backup replicas is that most of the tasks should happen automatically, without user intervention, in a perfectly reliable manner. This is how the additional costs of professional NAS solutions are recuperated later in the workflow. Data has to be stored reliably over years. If you use standard external commodity hard drives and create your backups manually, you will perhaps save quite a bit of money in purchase costs early on. But once you factor in the work hours you would have to invest in manually creating and periodically revalidating your backups every month (which a NAS system will do on its own every month during night hours) with an ever-growing set of data, one or several NAS systems will quickly have more than compensated for their initial purchase costs - and will free you up to get more useful and profitable work done.

"But that's it then, isn't it?"

Not quite. There is more of course.

Having physical storage devices with all the protection discussed above is nice, but as your data storage needs expand with your business, every device hits a wall at some point. Many NAS devices have some form of expandability built in. Some allow you to populate the drive bays as needed - you buy an 8-bay unit, for example, but populate it with just 4 drives at first and add more drives later. Sometimes the new hard drives do not even need to be the same size as the drives already in the NAS. Often there is also the possibility of attaching expansion chassis to your NAS device - nothing more than drive cages without any controller unit that simply add drive bays to the main NAS.

Still, at some point you need to add a new server. Then you end up with an old server and a new one - with 2 shares and 2 namespaces. New data tends to go mostly onto the new server, which can lead to bottleneck situations. How to deal with that?

The magic words here are storage cluster and software-defined storage. Until some time ago, storage clusters were something found exclusively in enterprise data centres dealing with tens or hundreds of PB of data (CERN, for example, uses one to store the data coming out of the LHC detectors during particle collisions). But nowadays storage clusters can be very affordable for datasets as small as tens of TB, and they can scale to practically unlimited capacity. The nice thing about storage clusters is that not only the capacity scales out but also the bandwidth, which is very important as more and more clients connect to your storage. Another nice benefit of a cluster of at least 3 servers is that the failure domain can be set to something larger than the physical drive level. A storage cluster can in fact be set up so that the loss of an entire node, rack or even geolocation (depending on the size of your datacentre) can be tolerated. Software updates applied to one server node at a time do not cause any service interruption in your storage cluster either. Storage clustering and how it can benefit smaller businesses is a huge topic in itself and will get its own blog article in the future.

One final thing remains: always encrypt your datasets or volumes - be it on your NAS or on individual devices like phones or laptops. Use strong passwords and, if possible, additional methods of authentication (2-factor authentication, or 2FA). There is no valid reason not to encrypt! In case of theft you may lose a NAS system with hard drives or a laptop - but at least your data is not compromised, and you can restore from your backups without having to worry about leaks on the internet.

And this is where I wrap up my first blog post, on World Backup Day 2021. I plan to write more in the future about various topics in digital postproduction.
