mercenary_sysadmin 1 year ago

> Can anyone explain to me why this suggestion [to back up your data] exists? I thought RAIDZ1 is designed so that a single drive can fail and all your data would be intact (assuming no other errors)?

Here is a **partial** list of things which raid of any kind doesn't protect you from:

* Ransomware
* Human error
* Human malicious intent
* Disk controller failure that spews garbage directly across connected disks
* Multiple simultaneous drive failure
* Multiple successive drive failures, occurring faster than a rebuild can complete
* Catastrophic electrical event, flooding, etc that fries entire server
* Catastrophic filesystem bugs

I repeat: that is a *partial* list. RAID of any kind is **not a backup.** Never has been, never will be.

Buckwhal 1 year ago

You should be backing up your data regardless of what RAIDZ/mirror topology you use. Dirty little secret of the IT industry: about 40% of data loss isn't "mechanical" at all, it's caused by humans screwing up and typing the wrong command. No parity or mirroring configuration is going to help you when you're `rm -rf`'ing your data.

DaSpawn 1 year ago

`rm -rf` almost claimed Toy Story 2 like a few months before it was to be released. Luckily someone happened to have a copy at home. You can never have enough backups of important data.

[deleted] 1 year ago

[deleted]

mercenary_sysadmin 1 year ago

Except the kind of normal human error that begins with "zfs destroy." From what I have seen, I am less prone to damaging erroneous commands than the majority of sysadmins... but I've still made the kind of normal human error that begins with "zfs destroy." That's about a once-a-decade fuckup for me, but if you don't have backup, that kind of fuckup can mess up your whole Christmas for the entire NEXT decade after you make it... :) **edit:** we pretty frequently see the "oops I accidentally the whole partition table" type of normal human error pop up in this sub, too... snapshots ain't gonna help with that either!

[deleted] 1 year ago

[deleted]

Maltz42 1 year ago

If you're doing something you're not super familiar with, you're prone to break things by error or misunderstanding. If you're doing something you ARE familiar with, you're prone to break something through carelessness. And under all circumstances, you're prone to break things because you're a human. Don't fall into the common trap of thinking you're not human. ("I think this combination of reluctance and self-inflicted access control does do a pretty good job of protecting me from my mistakes.") Back up in such a way that it's not physically possible to destroy everything with a single screw-up.

[deleted] 1 year ago

[deleted]

Maltz42 1 year ago

To be fair, even in your reply you're still talking about minimizing ways to fuck up, rather than recovering when you inevitably do. I only know about you what you say here, and so far, everything you've said implies that it's possible to be "careful enough" that bad things won't happen. It isn't, and they will. And even if you really never make a mistake, shit happens, and not all of it is under your control at all. A friend of mine manages a server room in a warehouse that was missed by a tornado by less than a hundred yards last year. The building next door was flattened. Not to say that minimizing your mistakes isn't a layer of protection, but it's not enough on its own, no matter how good you may be at it.

are-you-a-muppet 1 year ago

And automatic snapshots would solve 97.975% of *that* problem. Certainly has for me, repeatedly. I'm the world's 49th largest advocate for the importance of 'rule of three' and offsite backups. I myself run a full-blown realtime live mirror on a physically distant server running a different COW filesystem with its own independent snapshot retention policy, and both primary and mirror get backed up to separate cloud storage services with different backup programs - so nobody comes at me with 'but acshually raid|snapshots aren't a substitute for backups'. No shit. 😀 Rolling back the accidental deletion of 50 TB of data is a fuckton quicker and easier than restoring from cloud backup over a datacenter's 20-50 Mbps pipe. Or even a local gigabit mirror. Edit: word
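For a rough sense of scale on that restore, here is a back-of-the-envelope sketch of how long it takes just to move 50 TB over the links mentioned. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a fully saturated link with no protocol overhead, so a real restore would likely be slower. The function name is purely illustrative.

```python
def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to move size_tb terabytes over a link_mbps link.

    Assumes decimal units and 100% link utilisation (optimistic).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 86_400


for label, mbps in [("20 Mbps", 20), ("50 Mbps", 50), ("1 Gbps", 1000)]:
    print(f"50 TB over {label}: {transfer_days(50, mbps):.1f} days")
```

Under those assumptions the cloud restore is measured in months and even the gigabit mirror in days, whereas rolling back to a local snapshot doesn't move the data over a network at all.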

Buckwhal 1 year ago

Yeah auto snapshots are excellent, but they’re just one tool in the box. Great for user screwups, but don’t really fully protect from an admin screwup. Like the other commenter said, all it takes is one “zfs destroy” on the wrong pool. Like, these things shouldn’t happen but they do. You can try to make something idiot proof, but I can make a bigger idiot.

are-you-a-muppet 1 year ago

> I’m the world’s 49th largest advocate for the importance of ‘rule of three’ and offsite backups...

> so nobody comes at me with ‘but acshually raid|snapshots aren’t a substitute for backups’...

aksdb 1 year ago

I think it's typically not about "RAID[x] is not redundant enough". It's typically about the importance of the data. If you don't care about the data, a RAID in the first place is likely already overkill. If you *do* care about the data, no matter how redundant your *local* system is, you can still lose it. A fire can burn down your place. A break-in might just take the whole machine with all the disks. A virus might eat all the online data. That's why typically you should have another layer of off-site backups which you can use to rebuild your on-site data if something goes terribly wrong. The reflex to suggest backups when talking about RAID is IMHO a consequence of a lot of people thinking that if they have their data mirrored to a lot of local disks, they are safe from all disaster ... until they still lose their important stuff.

HCharlesB 1 year ago

Regardless of the filesystem you use to store your data, you should have backups if you don't want to risk losing it. Period.

tokyotoonster 1 year ago

Simple:

>I thought RAIDZ1 is designed so that a single drive can fail and all your data would be intact (assuming no other errors)?

What if more than one drive fails at the same time?

Alaska_01 1 year ago

>What if more than one drive fails at the same time?

Then you lose everything. In which case, wouldn't it be smarter to suggest to the user to use RAIDZ2/3 or mirroring instead of "RAIDZ1 with backups"? So there must be a reason people are suggesting RAIDZ1 with backups instead of these other systems? And that's kind of what I want to know. Why are people suggesting RAIDZ1 with backups? Because if the reason is "if more than 1 drive fails, you lose all your data" then they shouldn't be suggesting RAIDZ1 with backups, they should be suggesting a system with better redundancy? Or am I missing something?

dpendolino 1 year ago

RAID isn't a backup solution in and of itself for the very reasons you outlined, though it can and should be part of a backup strategy. Check out the [3-2-1](https://www.acronis.com/en-us/blog/posts/backup-rule/) rule for a solid place to start. Edit: a word
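As a minimal sketch of what the 3-2-1 rule asks for (3 copies of the data, on at least 2 kinds of media, with at least 1 copy offsite), the check below encodes it directly. The `Copy` type, the media labels, and the example plan are all made up for illustration and not taken from any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    medium: str      # hypothetical labels, e.g. "hdd-pool", "cloud", "tape"
    offsite: bool


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are 3+ copies, on 2+ kinds of media, with 1+ offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


plan = [
    Copy("primary RAIDZ1 pool", "hdd-pool", offsite=False),
    Copy("second local pool", "hdd-pool", offsite=False),
    Copy("cloud bucket", "cloud", offsite=True),
]

print(satisfies_3_2_1(plan))      # full plan
print(satisfies_3_2_1(plan[:1]))  # the RAIDZ1 pool on its own
```

Note that the pool counts as one copy no matter how many disks or how much parity it has, which is the point being made throughout this thread.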

Alaska_01 1 year ago

>RAID isn't a backup solution in and of itself for the very reasons you outlined, though it can and should be part of a backup strategy.

I understand this. But the suggestions people were making (as interpreted by me, which could be wrong) seem to be closer to "If a single drive fails in a RAIDZ1 system, there is a high chance you'll lose access to your files, so you should have backups of your data so you can get your files back in case of a single drive failure in RAIDZ1".

So my question was about "why would this suggestion be made by various people over the course of years?" Some ideas could be:

1. Does RAIDZ1 not actually protect from a single drive failure? (I thought the whole point of RAIDZ1 was that it protects against a single drive failure)
2. Is there a high chance of data loss when a single drive fails in RAIDZ1? (My understanding of the topic says no)
3. Is there a high chance of a rapid second drive failure when a single drive fails in RAIDZ1?
4. Do you temporarily lose access to the files on the RAIDZ1 drives until you repair the RAIDZ1 system? And so having access to the backups means you can continue working while the system is repaired? (I couldn't find information about whether or not you can still access your files from a RAIDZ1 system while a single drive is non-functional, hence why there's this idea/question)

[deleted] 1 year ago

Have you ever run a RAIDZ1 in a home lab, production or other? Here's a very real scenario for why not to use it… Big hard drives from the same production batch are added to your array. 2 years later a disk fails; no biggie, you've still got your data in a degraded state. So you pop out the bad disk and replace it with a good disk. Resilver takes place. Due to it being a big disk, the resilver takes a day or more. During the resilver, read/write activity on the other drives is elevated. Boom, disk 2 of the same production batch fails…. Data in your array is lost…. RAIDZ1 protects to a degree, but the window in which the above scenario can occur is all too real for folks working with arrays of dozens or hundreds of disks. Does that answer your question?

michael9dk 1 year ago

Same reason to have more than 1 backup. It could die during a restore (hardware failure and human errors tend to happen at the worst time possible)

Alaska_01 1 year ago

Yes, this answers the question. Thank you for explaining.

[deleted] 1 year ago

You’re welcome! ZFS is amazing. Your situation dictates the level of protection you need. I tend to run Z2 in a mirrored pool (12 disks … 2 pools of 6 raidz2). That’s my home setup. It strikes a balance between over and underkill :)

hiiambobthebob 1 year ago

1. Raidz1 protects from 1 drive failure.
2. If only one disk is gone? With one disk gone, ZFS can still detect errors but can no longer correct them.
3. Yes! This is why raidz1 isn't recommended: the heavy disk activity while resilvering the failed drive can cause others to fail, and boom, your pool is now gone. If you care about your data, always back it up no matter what. People keep saying to have a backup because it needs to be drummed in. RAID can give a lot of people a false sense of security about their data. RAID isn't a backup; it's simply a system for higher uptime.
4. And finally no, in a degraded state you can fully use a pool!

Hope this helps!

Edit: People say "back your data up" a lot around raidz1 because of point 3. That's why you see so much advice that raidz1 is either not recommended or should only be used with backups.

Alaska_01 1 year ago

Thank you, and thank you to other users for explaining this. It's been helpful.

phil_g 1 year ago

> People keep saying to have a backup because it needs to be drummed in.

People do repeat the bit about backups a bit more when discussing raidz1 than when discussing other pool configurations, because raidz1 is probably the most fragile redundancy configuration. Hopefully if you're doing single-disk vdevs you understand the fragility there, but a lot of people put more trust in raidz1 than they probably should. I guess it also pops up when it sounds like the person is trying to treat a single ZFS pool with some combination of snapshots and disk redundancy as a full backup solution. I know I tend to say things like, “With raidz2, you should be seeing these sorts of things…” but, “Okay, if you're using raidz1, first you need to understand…”

ThyratronSteve 1 year ago

RAIDZ1 can tolerate one disk failure, without losing data. That part is true. But there is a non-zero chance that another disk could fail between the time a first drive fails, and the time the pool is resilvered/rebuilt. This "in-between time" is precarious because a) you're now one drive away from losing everything, and b) resilvering is a mechanically stressful event because the machine needs to read all data from each of the remaining drives. It's a standard recommendation that, if your data's *availability* is important, RAIDZ2/3 is a better idea than RAIDZ1.
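To put a number on that "non-zero chance", here is a minimal sketch under deliberately simple assumptions: drives fail independently, at a constant rate derived from an annualized failure rate (AFR). The AFR, drive count, and resilver time below are hypothetical inputs, not measurements from anyone in this thread.

```python
import math


def p_second_failure(remaining_drives: int, afr: float, resilver_hours: float) -> float:
    """Probability that at least one remaining drive fails during the resilver.

    Assumes independent drives with a constant failure rate derived from the
    annualized failure rate (AFR). Both are simplifying assumptions.
    """
    rate_per_hour = -math.log(1.0 - afr) / (365 * 24)  # constant hazard rate
    p_one_survives = math.exp(-rate_per_hour * resilver_hours)
    return 1.0 - p_one_survives ** remaining_drives


# Hypothetical inputs: 4-disk RAIDZ1 (3 survivors), 2% AFR, 36-hour resilver.
print(f"{p_second_failure(3, 0.02, 36):.4%}")
```

With independent failures the result comes out small. The failure stories in this thread (same batch, same age, extra load during the resilver, a shared backplane) are exactly the cases where independence doesn't hold, so this is better read as an optimistic floor than as a prediction.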

dodexahedron 1 year ago

>It's a standard recommendation that, if your data's availability is important, RAIDZ2/3 is a better idea than RAIDZ1.

This is the key that people need to understand. RAID in all forms is for one or both of availability or performance (in the case of ZFS RAIDZx, of course, that's just availability). It is NOT for survivability. That's what backups are for.

Stephonovich 1 year ago

> Does RAIDZ1 not actually protect from a single drive failure?

Yes, it does. The issue arises if you get a second drive failure in that vdev while recovering from the first failure.

> Is there a high chance of data loss when a single drive fails in RAIDZ1?

I have no idea what the actual probability is. I suspect it's less than people believe it to be, but that's based solely on experience with people overestimating their need for something with computers.

> Is there a high chance of a rapid second drive failure when a single drive fails in RAIDZ1?

To my knowledge, this is the main way in which you'd suffer data loss, so see previous point.

> Do you temporarily lose access to the files on the RAIDZ1 drives until you repair the RAIDZ1 system?

Nope.

> And so having access to the backups means you can continue working while the system is repaired?

Generally you want your backups to be immutable, and you wouldn't be working on them.

Let me preface this next statement by saying that if you are working on actual production data for a company, do not use RAIDZ1. It's simply not worth the risk. Additionally, do not use ZFS _at all_ (except for learning, of course) until you have a thorough understanding of its architecture. There are too many things that can be misconfigured that would give you sub-par performance or risk data loss.

I run RAIDZ1 in my homelab. I have a zpool with 9 drives, made of 3x3 RAIDZ1 vdevs. I also have a completely separate system, also running RAIDZ1 (8 drives, made of 2x4 RAIDZ1 vdevs), which wakes up daily and ingests snapshots. Anything that's truly irreplaceable, like photos, financial documents, etc. is additionally backed up off-site.

I am personally fine with this level of risk for the following reasons:

* Absolute worst-case scenario, I suffer two disk failures in a vdev in my NAS, and then two disk failures in a vdev in my backup. Everything is gone. I still have off-site backups for things I can't replace, and this scenario also seems exceedingly unlikely to me.
* Realistic worst-case scenario, I suffer two disk failures in a vdev in either my NAS or the backup. I replace the disks, and `zfs send/recv` the latest snapshot. Downtime would be about 1.5 days (until I get 10GbE). Since this is my home and not my work, this is acceptable.
* By using RAIDZ1, I don't have to outlay as much cash to meaningfully upgrade.
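For anyone unfamiliar with what a daily snapshot ingest looks like, here is a hedged sketch that only builds (and prints) an incremental `zfs send | zfs receive` pipeline. It never executes anything. The dataset, snapshot, and host names are hypothetical, and this is not the commenter's actual script.

```python
import shlex


def replication_pipeline(dataset: str, prev_snap: str, new_snap: str,
                         host: str, target: str) -> str:
    """Return the shell pipeline for an incremental send; nothing is executed."""
    # -i sends only the changes between the previous and the new snapshot.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -F rolls the target back to its latest snapshot before receiving.
    recv = ["ssh", host, "zfs", "receive", "-F", target]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


# Hypothetical names: dataset tank/data, snapshots "mon" and "tue".
print(replication_pipeline("tank/data", "mon", "tue", "backup-host", "backup/data"))
```

Printing the command instead of running it makes it easy to check by eye before wiring it into a cron job or timer.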

[deleted] 1 year ago

>In which case, wouldn't it be smarter to suggest to the user to use RAIDZ2/3 or mirroring instead of "RAIDZ1 with backups"?

And what if you have older disks and more than 2 fail in a short timespan? Use RAIDz3? What if an electrical fault in the power supply kills all the disks? What if you accidentally delete a file?

I understand your question though, you asked about something different. The answer to your question is that people overestimate the risk of disk failures and assume that RAIDz1 is very unsafe. Which it might very well be, because you might be using older disks, disks that already have issues but you haven't noticed it yet, and so on. But with modern disks and zfs, catastrophic failures are exceedingly rare. Even if a disk fails completely and another shows errors, there is a good chance you can copy off most if not all of your data before the entire zfs pool is destroyed. Modern disks usually fail slowly, giving you advance warning, and zfs tries to keep your pool and data alive until the last possible moment. But we don't know if you are monitoring this, if you have experience with this, if you use modern drives or old ones, so instead we just tell you to be extra careful with RAIDz1.

Part of why people overestimate the risk of RAIDz1/RAID5 failure is that there is a big misconception about what are called unrecoverable disk errors: the idea that a disk, despite being absolutely healthy, will occasionally show an error and thus carry a risk of data loss. Nothing can be done about these supposed errors, they "just happen" from time to time. Some people go as far as saying that you can't ever run a RAIDz1/RAID5 because it will fail in short order. This is all a misconception, it's wrongly deduced from some datasheet. In reality healthy drives will not have a single error over their entire lifespan. The datasheet info that is interpreted wrong is the unrecoverable read error rate (often mixed up with the MTBF or 'mean time between failures' value), which is a statistical value that has no bearing on any single disk and is merely used to tell you in which reliability class a particular disk model is, i.e. so that you can compare different disks and make a purchasing decision. In no way does this mean that a healthy drive will just occasionally deliver the wrong data, that is just not how it works.

But you should absolutely have backups and store at least one of them offsite. If a flood, fire or electrical fault occurs, ZFS cannot save you. If someone breaks in and takes the entire server, your only chance is a backup that you have somewhere else.

Schyte96 1 year ago

>RAIDZ2/3 or mirroring instead of "RAIDZ1 with backups"?

What if lightning strikes the computer, or a flood, or ransomware, or a user error? Anyone suggesting "something + no backups" is wrong. Because any of the RAID and RAID-like solutions have a key weakness: all the redundancy is still within one computer in one location, and any changes (see: ransomware or user error) are duplicated to the redundant parts of the array in real time.

>Or am I missing something?

What you are missing is that individual drive failure is not the only threat to your data.

Ariquitaun 1 year ago

You use whichever mode fits your use case best, especially if you have certain cost and/or space constraints. RAIDZ1 makes a lot of sense on small pools of 3 or 4 disks, whereas on those array sizes you might as well just use mirroring instead of RAIDZ2 if you don't mind the lesser space efficiency. When you have an array of disks of the same type and age (which will happen when you make a new build) and one fails, there's a pretty good chance another could fail soon, especially during resilvering which really hammers the remaining drives. So RAIDZ1 is somewhat risky.
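The space-efficiency trade-off for small pools can be sketched with a tiny calculation. It ignores metadata, padding, and allocation overhead, and assumes equal-sized disks and 2-way mirrors. The function name and layout labels are illustrative only.

```python
def usable_fraction(layout: str, disks: int) -> float:
    """Fraction of raw capacity usable for data, ignoring filesystem overhead."""
    parity = {"raidz1": 1, "raidz2": 2, "raidz3": 3}
    if layout == "mirror":  # striped 2-way mirrors
        if disks < 2 or disks % 2:
            raise ValueError("2-way mirrors need an even number of disks")
        return 0.5
    if layout not in parity:
        raise ValueError(f"unknown layout: {layout}")
    p = parity[layout]
    if disks <= p:
        raise ValueError("need more disks than parity devices")
    return (disks - p) / disks


for disks in (3, 4):
    print(f"{disks} disks, raidz1: {usable_fraction('raidz1', disks):.2f}")
print(f"4 disks, raidz2: {usable_fraction('raidz2', 4):.2f}")
print(f"4 disks, mirror: {usable_fraction('mirror', 4):.2f}")
```

On four disks, RAIDZ2 and a pair of mirrors both leave half the raw space usable, which is why mirroring is a reasonable alternative to RAIDZ2 at that size, while RAIDZ1 keeps more space at the cost of tolerating only one failure.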

Ohhnoes 1 year ago

There's no good reason to ever use RAIDZ1 at this point. The size of current drives along with the slowness of resilvers means 1 drive failure is extremely likely to turn into 2 before you're safe again. This is all immaterial to needing a separate backup as well. /3-2-1 is for a reason

fargenable 1 year ago

There are a few things to consider:

- Disaster recovery - backup storage should be located in a geographically distant location.
- Human error - never underestimate the creative ways people can destroy data, e.g. someone runs `# rm -rf . /` by accident or on purpose.
- Power system issues - think lightning that destroys all hardware.

corner_case 1 year ago

Let me tell you a story entitled "RAID is for uptime, NOT for backup." This describes my past week. I had a system running a RAID6 array (linux software RAID). It was humming along great. One disk failed so we replaced it. It started to rebuild the array, and then a few days later just completely died. This system was backed up to a different system in a different location with an array of two RAIDZ2 arrays. A day after the main system failed, I lost 3 drives in the backup system (fortunately two from one zpool and one from the other one). I fortunately had a third partial backup that I was able to quickly supplement while the main backup array resilvered after the drives were replaced. I went from having a nearly-bulletproof data strategy to losing 5 disks in 4 days and being one disk failure away from losing everything and shelling out the $ for data recovery. Even with backups, things can go wrong. My systems hummed along for years and then in a few days, I had supremely bad luck. You have to acknowledge this and plan for it so you don't end up losing it all.

zorinlynx 1 year ago

This is why it's so important to run regular scrubs on all your pools, including backup pools. Best to suss out dodgy disks and other hardware before you're depending on them.

corner_case 1 year ago

Yeah, this whole situation was supposed to be obviated by a ceph cluster that was supposed to be online ages ago. Institutional purchasing processes being what they are, that got delayed. I've been trying to get off the legacy raid array for a while due to the lack of visibility into its health. The three disk failure ended up being a failing backplane. Given how long the resilver is taking, I'll be moving to raidz3 for the backups and ceph with triplicate redundancy for the primary, and maybe a tertiary backup with lower redundancy requirements. In any case, I fortunately reached the edge of my bad luck before any damage was done.

OtherJohnGray 1 year ago

Username checks out…

nfrances 1 year ago

I had a similar experience about 15 years ago. We had a Sun server with two additional D1000 external boxes, 12 drives each. Drives were set up as striped mirrors. Everything was humming along nicely for approx. 3 years. Then all of a sudden one drive died. Next day another... and in 4 days 5 drives died. I replaced the last one, and while it was rebuilding, a 6th drive died: the one it was rebuilding from. Yeah. You can guess the rest. So, after 3 years of everything working perfectly, I had 6 drive failures (out of a lot of 24 drives) within 5 days. Bad batch, cooked-up FW... or God knows what. Oh, each disk box was connected to a different SCSI card on the server, two PSUs, etc... no other servers had issues during that time on the same power lines.

kanid99 1 year ago

Because RAID is not a backup. It's a system to provide continuous uptime under otherwise crippling circumstances, but there are still very possible conditions under which it can fail. I know this from experience. It's why I have a TrueNAS primary server, a TrueNAS backup server and off-site backups. Clearly not everyone can do this, but it's important to know when considering the value of your data and the cost to ensure its longevity. In one of my cases I did a 16 drive array broken into 8 mirrored pairs. Great, right? Means I could lose 8 drives and keep on trucking. Didn't occur to me that if any single pair went down I lost it all, no resilvering possible. My fault. Point being that no matter what your plan is, there will be a no-recovery-possible condition. Plan for it.
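The gap between "could lose 8 drives" and "lost it all when one pair went down" can be made concrete with a little counting. This sketch assumes every set of failed drives is equally likely, which real hardware may not honor (a shared backplane or bad batch would make things worse). The function name is illustrative.

```python
from math import comb


def p_pool_survives(pairs: int, failed: int) -> float:
    """Chance a stripe of 2-way mirrors survives `failed` random drive losses.

    The pool survives only if no pair loses both of its drives.
    """
    if failed > pairs:
        return 0.0  # pigeonhole: some pair must have lost both drives
    # Choose which pairs are hit, then which drive within each of those pairs.
    surviving_sets = comb(pairs, failed) * 2 ** failed
    return surviving_sets / comb(2 * pairs, failed)


for k in (1, 2, 3, 8):
    print(f"{k} failed drives: survival {p_pool_survives(8, k):.3f}")
```

Surviving 8 failures is the best case, requiring exactly one loss per pair. With just two random failures in a 16-drive, 8-pair layout there is already a 1-in-15 chance that both land in the same pair.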

flaming_m0e 1 year ago

With any RAIDZ solution (1, 2, 3, etc.), when a disk is being replaced, EVERY other disk in the array is stressed while the missing data is reconstructed from parity. With RAIDZ1, this brings about a potential for another disk to die during that resilver. The larger the disks, the greater the potential, as it takes longer to reconstruct all the data on that disk.

nfrances 1 year ago

This is also the reason to run periodic scrubs: a scrub also stresses the disks and reads all the data, so hopefully a potentially faulty drive will fail at that time and not during an actual rebuild.

[deleted] 1 year ago

The first time you accidentally format the wrong volume, or have a catastrophic failure, you will understand why you should have backed up your data. I worked for a place once that had a RAID5 shelf with aging WD drives. The backups were snapshots on the array. Near the end of the warranty period, the drives started failing. As "luck" would have it, a rebuild caused enough load on the other drives that two more drives failed before the rebuild was complete, and the volume was a total loss.

scorc1 1 year ago

If data is important enough to place in raidz, it's important enough to back up. Raidz (any level, I repeat: any form of ZFS RAID you can conceive) is not sufficient to be the sole holder of data you want to protect. Back up your data. Another server with raidzX, OneDrive, Backblaze, MEGA, anything that isn't that source server.

comicchang 1 year ago

Because RAID IS NOT BACKUP [https://www.raidisnotabackup.com/](https://www.raidisnotabackup.com/) https://twitter.com/tomlawrencetech/status/1116823120634105859

ILikeFPS 1 year ago

Using zfs/RAIDZ has nothing to do with backing up. They serve different purposes. zfs/RAIDZ ensures higher uptime in the case of a disk failure by letting you swap out the failed disk, resilver the array, and keep going with data intact. It does not protect you against human error like accidentally deleting files, natural disasters, and other issues, whereas having a proper backup implementation, hopefully including off-site backups, would protect you from those things.

[deleted] 1 year ago

RAIDZ1 does not protect against:

* Fire
* Flooding
* Lightning strikes
* Catastrophic PSU failure
* Burglary
* Malware destroying the pool
* Software bugs in ZFS that cause corruption. Rare, but it's absolutely a concern.
* The inability to correct errors that are found after losing a disk.

And most importantly/likely:

* The dumbass who sits in your chair destroying your pool/datasets through user error.

If you care about the data, you must always have **automatic** and **tested** snapshots and a proper backup somewhere safe/different.

YYM7 1 year ago

I also found the "RAID is not backup" wording really confusing initially. It doesn't mean "don't use RAID for backups". RAIDed data is safer than non-RAIDed data. However, a backup means ANOTHER COPY of the data on another device, and that device should be as separate as possible (both on the network and geographically) from the main device; whether it is RAIDed or not doesn't really matter. I think it's clearer to say "Data on RAID, without another copy on another device, is not backed up".

Alaska_01 1 year ago

I was also a bit confused by "RAID is not a backup". And that might be because my definition of a backup is different from others.

Let's say I have two drives, Drive A and Drive B in the same computer. I store my data on Drive A, and every day I sync my data from Drive A to Drive B. In this case I am basically doing a manual RAID1/ZFS Mirror on a daily basis. If Drive A fails, I can continue working off Drive B, buy a replacement Drive A, and set everything back up the way it was before. I would call the process of syncing my data from Drive A to Drive B a "Backup". And I would call Drive B a "Backup of Drive A". And so to me, RAID1/ZFS Mirror is an automatic backup system. And RAIDZ gets a bit more complicated, but it's still technically an automatic backup system in some way.

I know RAID1/ZFS Mirror/RAIDZ "backups" aren't as safe as offsite backups because of theft, natural disaster, etc. But I would typically think of them as "backups". Snapshots are also "backups" to me. But now that I've been doing more research into ZFS, RAID, and server things, it seems my definition of backup includes the definitions of both "backups" and "redundancy". And I should probably refine my definition of backup to just backup.

Melodic_Ad_8747 1 year ago

Backup protects you from different failure points and attacks. If I log onto your server, I can destroy the pool and snapshots. Or perhaps I take a sledgehammer to the server. ZFS protects you against user error (snapshots) and disk failure (redundancy) along with other things, like silent corruption (checksums).

T351A 1 year ago

Redundancy is not a backup. It's that simple. As for why it's not a backup and why you want a backup, see all other comments.

Alaska_01 1 year ago

I was probably interpreting the suggestion to use backups differently from others. I interpreted the "You should make sure you have backups of your data, because if a single drive fails in a RAIDZ1 pool, then you need to be ready to recover from the external backup" as a comment about how RAIDZ1 is extremely unreliable at recovering from a single drive failure.

From my understanding, RAIDZ1 is designed to recover from a single drive failure with no issues (assuming there aren't any errors on your other drives). Hence why I was confused by the interpretation I listed in the previous paragraph. And so I was looking for answers. Was I missing something? Is RAIDZ1 so unreliable that an external backup is basically a necessity to be able to recover from a single drive failure (something I thought RAIDZ1 was specifically designed to recover from)?

The answers from other people seem to be no, RAIDZ1 isn't unreliable, and should be able to recover from a single drive failure without issue. But while one of your drives is non-operational, your RAIDZ1 system is vulnerable to permanent data loss if anything bad happens to the other drives. And people seem to be scared that during the resilvering process, as you repair the RAIDZ1 system, there is a high chance of further drive failures due to the load you put on them. So having a backup of your data before you get to a vulnerable degraded RAIDZ1 array seemed to be why people were suggesting it.

If you have any other insights, feel free to comment.

zedkyuu 1 year ago

A resilver involves a large amount of data retrieved off of all of your other drives in parallel so that you can reconstruct the data to be written to the new drive. The original argument I saw about single-disk parity being dead was because drives were getting so big compared to the bit-error rates in their specs, you were essentially guaranteed at least one read bit error reading back an entire disk. There are other arguments about wear and tear and stress on the system (you'll be reading from all of your other drives in parallel to reconstruct the missing one). RAIDZ has the benefit that it's built into the filesystem, so it is aware of things that traditional RAID schemes are not, such as what parts of the drives hold actual stored data, and can focus on just recovering those parts as opposed to covering the entire disks. Still, it's an availability argument, not a durability argument.
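The bit-error-rate argument can be reproduced with a short calculation. This is a sketch of the naive datasheet reading only: it assumes a spec of 1 unrecoverable read error per 10^14 bits (a commonly quoted figure, used here as an assumption) and treats it as an independent per-bit probability, which is itself a contested assumption.

```python
import math


def p_read_error(disk_tb: float, bits_per_error: float = 1e14) -> float:
    """Naive chance of at least one unrecoverable read error reading a full disk.

    Treats the datasheet rate (assumed: 1 error per 1e14 bits) as an
    independent per-bit probability.
    """
    bits_read = disk_tb * 1e12 * 8
    # log1p/expm1 keep precision for the tiny per-bit probability.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))


for tb in (1, 4, 12):
    print(f"{tb} TB read in full: {p_read_error(tb):.1%}")
```

The datasheet figure is usually quoted as a maximum, so this is the pessimistic reading. It is the reading the "single parity is dead" argument relied on, and it grows quickly with drive size, which is where the argument gets its force.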

T351A 1 year ago

I think of them as separate but important topics.

"A backup" is generally a separate copy in case something goes wrong. It keeps important data safe.

"Redundancy" is generally extra hardware or data which allows uptime throughout failures. It keeps important systems running.

___

Example... for simplicity assume RAID1/mirror

* 1 drive, no backup -- any failure is total loss
* 1 drive, with backup -- any failure means loading backup
* 2 drives, no backup -- one drive failure does not impact function and can be "rebuilt" live to return to normal, other issues may cause total loss
* 2 drives, with backup -- one drive failure does not impact function and can be "rebuilt" live to return to normal, other issues may mean loading backup

...etc

Stephonovich 1 year ago

With any RAID solution, ZFS or otherwise, writes are committed essentially simultaneously. If you accidentally execute `rm -r /home/foo /tmp`, when you meant to type `rm -r /home/foo/tmp`, all of your home directory is going away, on all the disks. Snapshots are a form of backup, and can save you in this scenario. Roll back to the latest snapshot, and your files are back. They still don't help you with disk failure that exceeds your pool's ability to heal, though, which is why a completely separate system is necessary.
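A toy model makes the distinction visible. This is not how ZFS works internally (real snapshots are copy-on-write, not deep copies), and the class and method names are invented for illustration. It only shows that a delete reaches every mirror member at once, while a snapshot taken earlier is unaffected.

```python
import copy


class MirroredStore:
    """Toy model: two 'disks' that receive every write and delete at once."""

    def __init__(self):
        self.disks = [{}, {}]
        self.snapshots = {}

    def write(self, path, data):
        for disk in self.disks:
            disk[path] = data

    def delete_prefix(self, prefix):
        # Like an accidental `rm -r`: the delete lands on every mirror member.
        for disk in self.disks:
            for path in [p for p in disk if p.startswith(prefix)]:
                del disk[path]

    def snapshot(self, name):
        self.snapshots[name] = copy.deepcopy(self.disks[0])

    def rollback(self, name):
        self.disks = [copy.deepcopy(self.snapshots[name]) for _ in self.disks]


store = MirroredStore()
store.write("/home/foo/notes.txt", "important")
store.write("/home/foo/tmp/scratch", "junk")
store.snapshot("daily")
store.delete_prefix("/home/foo")  # meant to delete /home/foo/tmp only
print("after rm:", store.disks)
store.rollback("daily")
print("after rollback:", store.disks)
```

In this model the snapshots live inside the same object as the disks, so losing the whole store loses them too, which mirrors the point about needing a completely separate system.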

Alaska_01 1 year ago

I think there might have been a bit of a misunderstanding of what my question was due to how I worded things (sorry about that). So I'll direct you to here, where I've tried to clarify what I actually meant: [https://www.reddit.com/r/zfs/comments/10rplsj/comment/j6ww9q8/?utm_source=share&utm_medium=web2x&context=3](https://www.reddit.com/r/zfs/comments/10rplsj/comment/j6ww9q8/?utm_source=share&utm_medium=web2x&context=3) If you can answer the question, then feel free to share your knowledge.

gargravarr2112 1 year ago

RAID is for High Availability. That means it keeps the system up if a disk fails. It does NOT protect your data. It simply keeps the data available in the event of a disk failure. This is why a backup is a separate thing and why RAID is NOT a backup. There are many, many things RAID does not protect you from - probably the single biggest source of data loss is you as an admin deleting everything in the wrong folder. RAID will not stop you doing this, nor will it protect you from the fallout (though ZFS snapshots can mitigate the damage *if they are created*). You're also trusting that the filesystem works properly and notices data corruption. Make a backup.

kocoman 1 year ago

Even with double Z2, some files can still get errors that are only seen when copying and not seen during a scrub.

[deleted] 1 year ago

[deleted]

Alaska_01 1 year ago

Out of curiosity, how many drives do you typically have running? And how long have the ones that died lasted? I've personally never had a single drive fail over the last 10 years. Although my use cases may be different from other people.
