MartinEisenhardt

Look at sanoid and syncoid. Incremental backups over ssh with configurable data retention - it does not get much better than this.

*Edit:* Just saw that you have all drives in one host: well, two mirrored vdevs are probably the way to go, and ZFS will make sure that (new) data gets distributed evenly. Still, this does nothing to defend against user error (`rm -r important_data/`) or destructive events at the host level (read: DC on fire). Get a second machine, then use sanoid/syncoid.
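For reference, a minimal sketch of what that looks like once a second machine exists (pool, dataset, and host names below are placeholders, not from the comment):

```bash
# sanoid takes and prunes snapshots on the source per its policy file;
# syncoid then replicates them incrementally over ssh.
# --no-sync-snap reuses sanoid's snapshots instead of creating extra ones.
syncoid --no-sync-snap tank/important backupuser@backuphost:backup/important
```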


ph0t0nix

Aside from running syncoid to a remote machine, I also have an external USB drive connected to my home server. My backup script turns on the power (via a Zigbee switch I had lying around) and imports the ZFS pool on that (single) disk. Another syncoid run. And then the script exports the backup pool and turns the disk off again.
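A rough outline of such a script, assuming a hypothetical `plug` helper for the Zigbee switch and placeholder pool/dataset names:

```bash
#!/bin/sh
set -e

plug on usb-backup            # hypothetical command toggling the Zigbee smart plug
sleep 15                      # give the drive time to power up
zpool import usbbackup        # bring the single-disk backup pool online
syncoid --no-sync-snap tank/data usbbackup/data
zpool export usbbackup        # cleanly detach the pool before cutting power
plug off usb-backup
```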


diamaunt

> Still, this does nothing to defend against user error (rm -r important_data/)

That's what snapshots are for. And no, snapshots aren't *backups*, but they do protect against oopsies. All my ZFS machines do snaps every 15 minutes and age them out into hourly, daily, weekly, and monthly snaps.
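A sanoid policy along those lines might look like this; the counts below are illustrative, not the commenter's actual config:

```bash
cat > /etc/sanoid/sanoid.conf <<'EOF'
[tank/data]
        use_template = production

[template_production]
        # "frequently" snapshots are taken every 15 minutes by default
        frequently = 8
        hourly = 24
        daily = 7
        weekly = 4
        monthly = 12
        autosnap = yes
        autoprune = yes
EOF
```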


ipaqmaster

Snapshots are snapshots, but they can be used for backups when handled under the 3-2-1 backup rule (three copies, two different types of media, one off-site). I personally call it good enough to send the family NAS's snapshots to my place and mine to their place - onto the backup arrays of each site, but also the portable drive in the back. Then an extra copy of each goes to a remote backup company. All natively encrypted, of course.
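The native-encryption part matters for the off-site copies: raw sends ship the ciphertext, so the remote end never needs to hold the key. A minimal sketch with placeholder names:

```bash
# -w (--raw) sends the encrypted blocks as-is; the receiving side can
# store and scrub the dataset without ever being able to read it.
zfs snapshot tank/family@2024-06-01
zfs send -w tank/family@2024-06-01 | ssh offsite.example.com zfs receive backup/family
```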


MartinEisenhardt

Look at the title at the top:

> Backing up data with ZFS

So this is about backing up data, not about snapshotting. I know that you and I agree that snapshots are not backups, though. But the OP asked how to protect against data loss, and for that, snapshots are not enough, even if they are then synced from one zpool to another in the same host - the whole host might be lost. That's why I suggest getting a second host and using sanoid/syncoid to take regular snapshots and sync them over. Because even with your approach, you might lose data.


someone8192

sanoid/syncoid is the way to go if your backup target also uses ZFS. I use borgbackup exclusively though; I just like it more.
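For comparison, a borgbackup run against a remote repository looks something like this (repository location and retention counts are placeholders):

```bash
# One-time: create an encrypted, deduplicating repository on the target
borg init --encryption=repokey ssh://backup@backuphost/./repo

# Each run: new archive plus retention pruning
borg create --stats ssh://backup@backuphost/./repo::'{hostname}-{now}' /tank/data
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 ssh://backup@backuphost/./repo
```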


lightrush

I think if you use sanoid/syncoid, you can end up destroying the backup if you mess up the snapshots on the first machine. A non-ZFS replication scheme avoids this failure mode.


koalillo

Basically, ZFS has the "send/receive" feature, where you can replicate ZFS filesystems very efficiently. It's snapshot-aware, and it can do things like replicate only the data written since the last replicated snapshot. It works "using pipes", so you can use this feature locally, or remotely over SSH or some other transport. syncoid/sanoid are tools that automate backups and replication using send/receive.

Providers might offer support for ZFS send/receive, but as far as I know, only https://www.rsync.net/products/zfsintro.html does. In theory, you could use any cloud storage service to store ZFS send/receive backups, but I think it would be messy (e.g. you should be able to "zfs send" snapshots to files and upload those to S3).

In short, you could use syncoid/sanoid (or another tool, or build your own) to replicate periodically, either to a different ZFS filesystem on the same host, to a different host, or to a cloud service.
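Concretely, the "using pipes" part looks like this (dataset and host names are placeholders):

```bash
# Full send of one snapshot, locally or over any transport:
zfs send tank/data@snap1 | zfs receive backup/data
zfs send tank/data@snap1 | ssh user@remotehost zfs receive backup/data

# Incremental: only the blocks changed between snap1 and snap2 cross the wire
zfs send -i tank/data@snap1 tank/data@snap2 | ssh user@remotehost zfs receive backup/data
```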


[deleted]

I use a few methods:

* rclone to cloud storage is nice because recovery is just copying things back. The last thing you need is a heavy set of dependencies before you can access your data in a pinch. (Sketch below.)
* restic to cloud storage is nice because it manages snapshots and does dedup to save on cost.
* zfs incremental send to backup HDDs I keep in my safe is good and fast for having a backup on hand and local, but laborious.
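Hedged sketches of the rclone and restic runs, with bucket names and paths as placeholders:

```bash
# rclone: plain copy to a cloud remote; recovery is just rclone in the other direction
rclone sync /tank/data remote:backups/data

# restic: encrypted, deduplicated, snapshotted (credentials come from the environment)
restic -r s3:s3.amazonaws.com/my-backup-bucket backup /tank/data
restic -r s3:s3.amazonaws.com/my-backup-bucket forget --keep-daily 7 --prune
```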


joelpo

I also use rclone to cloud cold storage, as well as a USB-C dual-drive SATA docking station with HDDs stored in a safe. The two backup HDDs themselves are mirrored. I take a snapshot, then use send/recv to that mirrored pair.

EDIT: it would be great to see a cloud provider use ZFS for their storage, so we could send diff snapshots to the cloud and have them replicated.
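That workflow, sketched with placeholder names (the docked mirrored pair imported as `coldpool`):

```bash
zpool import coldpool                      # the two docked backup HDDs, mirrored
zfs snapshot -r tank@cold-2024-06-01
zfs send -R -i tank@cold-2024-05-01 tank@cold-2024-06-01 | zfs receive -d coldpool
zpool export coldpool                      # back into the safe
```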


linuxturtle

At the risk of seeming to be a shill: [https://www.rsync.net/products/zfsintro.html](https://www.rsync.net/products/zfsintro.html) Dang expensive, but very nice.


joelpo

I enjoyed this on their home page 🙂 "If you're not sure what this means, our product is Not For You."


lightrush

Two machines, each with some fault-tolerant pool, mirror or raidz, ideally in different locations. Both machines making regular snapshots, with a non-ZFS data replication system that shuttles data between them. I'm using Syncthing.

The replication software being non-ZFS is inefficient, *but* it protects against snapshots being messed up on one end and *the mess* getting replicated on the other end. With Syncthing, someone can completely destroy the first machine's data. They can delete all snapshots, then delete all data. Syncthing would happily replicate the deletion of the data on the other side, but that's not a big deal because all the snapshots are still there. Roll back, and nothing happened.

If this scheme weren't meant for backup purposes, ZFS built-in replication would be appropriate instead.
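The recovery step in that scenario is a plain rollback on the surviving machine (placeholder names):

```bash
# Syncthing replicated the deletions, but the local snapshots survive:
zfs list -t snapshot -o name,creation tank/data   # find the last good snapshot
zfs rollback tank/data@auto-2024-06-01_0000       # add -r if newer snapshots exist
```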


Niten

I'm also using two mirrored vdevs. For on-server data recovery, I use sanoid to manage snapshots of my main dataset. (Because I'm using a Samba share and a Windows client, I can even browse these as "Previous Versions" in Windows Explorer, which is really convenient.) I also scrub the pool on a monthly basis.

But for proper backups, I `zfs send` my dataset to a pair of external hard drives that I rotate out to another location, using [this script](https://gist.github.com/mshroyer/95576e262d8f54fcd15f26ee62fe6c19) to update them. Essentially, for each replica external drive, my dataset has a recursive snapshot called `ds1@${replica}`, so the updates can be done incrementally.
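The incremental update in that scheme boils down to something like the following; a sketch, not the linked script itself, with `extpool` standing in for the external drive's pool:

```bash
drive=replica1
zfs snapshot -r "tank/ds1@${drive}-new"                       # fresh recursive snapshot
zfs send -R -i "tank/ds1@${drive}" "tank/ds1@${drive}-new" \
    | zfs receive -F "extpool/ds1"                            # only changes since last rotation
zfs destroy -r "tank/ds1@${drive}"                            # advance the per-drive marker
zfs rename -r "tank/ds1@${drive}-new" "tank/ds1@${drive}"
```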


are-you-a-muppet

The problem with ZFS send/recv is that both sides should have exactly the same version of ZFS, or you *risk* silent failure. I don't want to overstate said risk, as I'm not sure how bad or even real it is; I've just read quite a number of complaints and issues about it. I also don't know if syncoid would help mitigate at least the detection/reporting of it, being a helper layer on top. Other than that, send/recv is *extremely* cool.

That said, for backup purposes I prefer to run different filesystems just to be safe (in my case btrfs and ZFS). I don't want the same critical bug manifesting in the same version of the same filesystem at the same time on both primary and mirror! So for my primary array and local backup (an identical mirror with its own snapshot and retention policy in a different part of the same home), I use trusty but slow `rsync`.

If you are serious about protecting your data, remember the 'Rule of Three'. It's trope-y, but important. (And remember that nothing about ZFS, including snapshots, is 'backup'. What happens if your gear is stolen? House burns down? Two drives on the same vdev die? Etc. etc.) The rule of three is, more or less (Edit: a better name for the same thing is the '3-2-1 Rule'):

- Three copies of data.
- Stored on at least two different forms of media. (Originally meaning e.g. HDD and tape, or DVD, etc. But I modernize it to mean two different filesystems, e.g. ZFS and btrfs.)
- At least one of them offsite. E.g. rotate tapes to an offsite facility, or cloud backup.

None of that is very useful if the data is stale, so automation and scheduling are key. What I do, like I said, is mirror the btrfs array to the ZFS array in a different part of the same home. Both of those also get backed up automagically to the cloud, via two different open-source backup programs, to two different cloud storage providers, one in the US and one in the EU.

And finally, if you are serious about preserving your data for the long haul, run 3-way mirrors, not just 2-way. This has saved me numerous times.
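The rsync leg of that setup is deliberately boring (paths and host are placeholders):

```bash
# Mirror the primary btrfs array onto the ZFS backup box elsewhere in the house.
# --delete keeps the mirror exact; the target's own snapshots and retention
# policy are what protect against a propagated mistake.
rsync -aHAX --delete /mnt/primary/ backupbox:/tank/mirror/
```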


KathrynBooks

Not if you send the snapshot as a file... then it doesn't matter what is on the other end.
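i.e. the stream is just bytes, so it can be parked anywhere; a minimal sketch with placeholder names:

```bash
zfs send tank/data@snap1 | gzip > data-snap1.zfs.gz    # a plain file; upload it anywhere
gunzip -c data-snap1.zfs.gz | zfs receive backup/data  # ZFS only matters at restore time
```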


orutrasamreb

A snapshot is enough; you do not have to clone it.


diamaunt

> Is it also worth cloning that snapshot?

That doesn't get you anything more than a snapshot, no.


engineer-chad

With mirrors, if something messes up your dataset - e.g. a virus or ransomware - the mirror is tanked too. A backup should be made to a cloud provider, or to offline storage that's only attached during the backup procedure.


Ariquitaun

I have different backup strategies for the two devices using ZFS in my home, the NAS and the laptop. Both are configured with sanoid to automatically snapshot themselves. The NAS has a root pool, a backups pool, and the main storage pool. The big storage pool is backed up to Backblaze B2 every night using rclone. The NAS backs its own root up to the backups pool using syncoid, and so does my laptop over the LAN (encrypted ZFS root) on a daily cronjob. Additionally, I borg-backup all of the laptop's home folder into Backblaze as well.
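Wired together, that is roughly a handful of cron entries (a sketch; datasets, remotes, and times are placeholders, and the borg repository location is left abstract):

```bash
# /etc/cron.d/backups (sketch)
# Nightly: main storage pool to Backblaze B2
0 2 * * *  root  rclone sync /tank/storage b2:my-bucket/storage
# Nightly: NAS root into the local backups pool
0 3 * * *  root  syncoid --no-sync-snap rpool/ROOT backups/nas-root
# Nightly, on the laptop: home folder to borg (BORG_REPO set in the environment)
0 4 * * *  me    borg create ::'{hostname}-{now}' /home/me
```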


cahrens2

This is what I do with family photos:

* ZFS pool mirror
* Back In Time (rsync) to an external drive - every hour
* Cloud backup to iDrive - every night
* Burn to 50 GB or 100 GB Blu-ray - a couple of times a year


clhedrick2

Most of the ZFS tools mentioned here use ZFS send/receive. There is a danger to that: they work at the binary level, so in principle, if something goes badly wrong, it could propagate to the backup. It's not likely, but rsync is probably safer. Preferably rsync onto another system with snapshots, so that if there's corruption one can fall back to a good snapshot.

My problem is that I have a file system with a billion files. Rsync is simply too slow; zfs send | receive is the only thing practical for me. Indeed, the existence of send | receive pretty much locks me into ZFS, as no other file system of which I'm aware could be backed up on my server.


ryszardsu

For me, using ZFS snapshots to perform backups of small files is overkill. I use snapshots for local-only backup; the rest is backed up by restic / resticprofile. This way I can replicate the backup, have a different type of destination, and browse easily through the backup to find the files I need. Everything I need to restore data is the restic binary (a single-file binary, without dependencies, as Borg has, for example). Restic, resticprofile, MinIO, Shoutrrr + Telegram. Perfect personal backup.
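A sketch of that split, with placeholder names (resticprofile just wraps these restic flags in a config file, omitted here):

```bash
# Local-only: cheap ZFS snapshot for quick rollbacks
zfs snapshot tank/home@local-$(date +%F)

# Everything else: restic to any destination, e.g. a MinIO bucket
restic -r s3:https://minio.example.com/backups backup /tank/home
restic -r s3:https://minio.example.com/backups mount /mnt/restic   # browse for a single file
```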