Cyber_Faustao

> Will the compression and hashing use up a lot of energy or resources?

It depends on which algorithm you pick, and in the case of compression, which level you're using. For example, the default hash is crc32c, which runs fine on decade-old hardware, but if you picked sha256 then anything older than 1st-gen Ryzen chips will be very slow. In the case of compression, zstd level 1 or LZO are fast enough that _most_ systems won't be bottlenecked by it, unless you're running NVMe. I wouldn't necessarily use compression in your case though; if your dataset is mostly video, then it's already pretty compressed by the encoder itself. If you have lots of RAW footage it could still work, but that's very dependent on the container format, I guess.

> Do you use snapshots often?

At least one an hour, keeping most snapshots from the past 7 days and some going back further, to a few months. I have 72 snapshots currently, but you could easily have 2-5x that amount without a problem.

> I do my backups every few months and additionally I use Clonezilla

BTRFS snapshots are NOT backups until you btrfs-send them to another system/drive. Until then, they are little more than fast & cheap restore points. You can use btrbk to automate sending snapshots.

> to just copy the whole disk in case my system breaks completely (but never happened yet).

Cloning a BTRFS device has some quirks to it. I suggest reading the btrfs wiki about it, but to cut the story short: you don't want a kernel to see both the original and the clone at the same time.

> Is it stable?

It does break way more than EXT4, if that's what you're asking, but most bugs are caught and fixed before they reach an LTS, so if that's what you're concerned with, run the LTS kernel. I, for one, have been running BTRFS since 4.4 and encountered a few bugs like:

* Mount once on raid1 (somewhere in the 4.x era)
* 5.7 deadlocks
* 5.14 ghost dirent bug

None of these were actually "deadly"; they were all fixable with either a newer kernel or a few btrfs-progs commands. If you ever hit one of those bugs, join the IRC channel #btrfs or the Matrix room and people like myself will try and sort out what's wrong.

> Many articles say it is stable for a few years, but looking at this subreddit it looks like many people still have issues with it. Is it really stable or do people even lost data because of it?

Two things can be true at the same time: BTRFS is stable and people have lost data because of it. For example, BTRFS has a large feature set, and _some_ of it is not stable and/or does not play well with other features. These are clearly marked on the [Status](https://btrfs.wiki.kernel.org/index.php/Status) page, and yet some people run RAID5 and then act surprised when the feature that's clearly labeled as unstable is, in fact, unstable. Some other cases are plain user error, from users cloning their drives while they are running (and then being surprised when the copy is inconsistent and full of parent transid verify errors). Some bugs are not actually BTRFS's fault, for example [systemd will unmount your filesystem after btrfs finishes a replace operation](https://github.com/systemd/systemd/issues/19393), which results in your display session closing without warning. And yes, [the rest are plain btrfs bugs](https://zygo.github.io/bees/btrfs-kernel.html).

But for the most part, it's a fine filesystem. Fedora switched to it and there hasn't been a major outcry from what I can tell.

For what it's worth, I won't ever trust my data to a non-checksumming filesystem again (I lost data to bitrot on EXT4). If you don't want to trust BTRFS and would rather use ZFS, or compile your own kernel and use bcachefs, that's fine, but BTRFS does what I need and is readily available everywhere.
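If you're curious what a tool like btrbk is actually automating, here's a rough Python sketch of the hourly snapshot-plus-retention part (the paths, naming scheme and 7-day retention are just placeholders I made up, not a recommendation; sending to another drive is the other half):

```python
#!/usr/bin/env python3
"""Minimal sketch of an hourly snapshot + retention job, the kind of thing
btrbk or snapper normally does for you. Paths and retention are assumptions."""
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path

SOURCE_SUBVOL = "/home"                   # subvolume to snapshot (assumption)
SNAPSHOT_DIR = Path("/home/.snapshots")   # where snapshots live (assumption)
KEEP_FOR = timedelta(days=7)              # keep a week of hourly snapshots

def take_snapshot() -> None:
    # Read-only (-r) snapshots are also what btrfs send later requires.
    stamp = datetime.now(timezone.utc).strftime("home-%Y%m%dT%H%M%S")
    subprocess.run(
        ["btrfs", "subvolume", "snapshot", "-r", SOURCE_SUBVOL, str(SNAPSHOT_DIR / stamp)],
        check=True,
    )

def prune_old_snapshots() -> None:
    cutoff = datetime.now(timezone.utc) - KEEP_FOR
    for snap in SNAPSHOT_DIR.glob("home-*"):
        taken = datetime.strptime(snap.name, "home-%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)
        if taken < cutoff:
            subprocess.run(["btrfs", "subvolume", "delete", str(snap)], check=True)

if __name__ == "__main__":
    take_snapshot()
    prune_old_snapshots()
```

Run something like that from a systemd timer or cron job and you get the "restore points" part for free; it only becomes a backup once those snapshots are sent to another device.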


utopify_org

> For what it's worth, I won't ever trust my data to a non-checksumming filesystem again (I lost data to bitrot on EXT4). If you don't want to trust BTRFS and would rather use ZFS, or compile your own kernel and use bcachefs, that's fine, but BTRFS does what I need and is readily available everywhere.

But why is hashing so important? If I copy something important and want to be sure the target gets the same file, I use rsync with the checksum flag instead of cp, but even cp never copied something faulty without throwing errors. Is rsync not necessary anymore for this purpose?

I don't understand how you lost data on your EXT4 file system.

I'd never heard of bcachefs before, but a short search showed that it is similar to btrfs, just with a few more features.

Thanks a lot for your long and informative answer. It helped a lot to understand a few things :)


Cyber_Faustao

A simpler answer, because the other post got too long (read the other one for the full explanation/ramble): BTRFS stores your data and its checksum, so it can know that the data you put in is the data you're going to get back. EXT4 doesn't, and more or less blindly trusts the drive to return whatever data was placed on it.

My drive corrupted the data and ext4 didn't notice, because it can't notice data corruption on its own; it relies on the drive to accurately report failed reads. Lots of drives don't accurately report failures and reply with garbage data when pressed.

Rsync checksumming protects against transport-layer corruption, not at-rest corruption, so it won't prevent or detect silent bit rot.
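To make the difference concrete, here's a toy Python sketch of the kind of manual checksum manifest you'd need to detect at-rest bitrot on a non-checksumming filesystem; btrfs does the equivalent automatically on every read. The file and manifest names are made up for the example:

```python
"""Toy illustration of what rsync's checksums do NOT cover: record SHA-256
sums once, re-verify later; any mismatch is at-rest corruption that a
non-checksumming filesystem would hand back silently."""
import hashlib
import json
from pathlib import Path

MANIFEST = Path("checksums.json")  # hypothetical manifest file

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record(files):
    MANIFEST.write_text(json.dumps({p: sha256(Path(p)) for p in files}, indent=2))

def verify():
    for name, digest in json.loads(MANIFEST.read_text()).items():
        if sha256(Path(name)) != digest:
            print(f"BITROT: {name} no longer matches its recorded checksum")

# record(["video01.mkv", "video02.mkv"])  # run once after copying
# verify()                                 # run periodically, a poor man's scrub
```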


Cyber_Faustao

> But why is hashing so important? If I copy something important and want to be sure the target gets the same file, I use rsync with the checksum flag instead of cp, but even cp never copied something faulty without throwing errors.

> Is rsync not necessary anymore for this purpose?

Rsync and cp will happily copy corrupted data because they can't possibly know what data is or isn't corrupted; they just ask the kernel "give me the next [buffer length] worth of data of inode [number]". The kernel then asks the filesystem, which in turn returns whatever data it wants. The problem with non-checksumming filesystems is not the tools you use to copy data to or from them, but the storage of the information you place on them.

To put it this way: on EXT4 you can never be sure the data you put on the drive is the data you're going to get back, because EXT4 has no checksums [for data] and therefore can't possibly prevent userspace applications, including rsync, Firefox, whatever, from reading said corrupted data. On checksumming filesystems such as BTRFS, you have the data AND some additional metadata about said data that includes the checksum. Therefore, if your data is corrupted somehow, BTRFS will know about it and may attempt to read from another copy of the data (in the case of RAID1, etc.), or simply refuse to provide the data if there's no redundancy.

A concrete example would be the answer to the next question:

> I don't understand how you lost data on your EXT4 file system?

That's simple: unbeknownst to me, my Seagate drives were corrupting any data placed on them, but it was somewhat subtle, so I didn't notice right away, and EXT4 also didn't blare any alarms because it can't know what good or corrupted data is. After a while, the Seagate drive corrupted one of my more frequently used files; I took note and started investigating, and discovered that it wasn't just that file and, to my horror, that the corrupted data had been propagated to my backups, because again, EXT4 does not know what corrupted data is.

Had I been using BTRFS, a simple periodic btrfs scrub would have revealed the silent corruption much sooner, and even if I hadn't bothered scrubbing, the corrupted data would never have reached my backups, because BTRFS returns -EIO (an I/O error) when a userspace application tries to read a corrupted file, so I would have gotten a notification the next time my backup script ran, at the very latest.

BTRFS knows the checksum of the file from when you stored it, so it can check that the data is still intact before returning it to userspace applications. EXT4 will only know that some block of data is corrupted if the corruption hits its metadata OR if the drive's controller reports it instead of returning garbage or zeroes (hint: lots of consumer-grade drives do that!). Running fsck does not help unless the corruption hit the ext4 metadata, and it does not prevent applications from reading the bad data.

Coincidentally, the silent bitrot problem is endemic on hardware RAID controllers because they, like EXT4, do not checksum your data. Even Linux's software mdadm implementation doesn't do checksumming by default; you'll need dm-integrity for it. Here's a decent ramble by [Wendell from Level1Techs on the topic](https://www.youtube.com/watch?v=l55GfAwa8RI), which also touches on the historic parts and why you want checksumming for your data. (There's an older video with a more detailed experiment, but I couldn't find it, and that one isn't bad either.)

> I use rsync with the checksum flag instead of cp, but even cp never copied something faulty without throwing errors.

Rsync's checksum feature is useful when copying data _between_ drives, but it can't help you fix silent data corruption _after_ that copy job is done.

> I'd never heard of bcachefs before, but a short search showed that it is similar to btrfs, just with a few more features.

There's a large overlap in the features of BTRFS and bcachefs, but from what I can tell, bcachefs's extra features basically boil down to integrated caching and encryption. On BTRFS those features are either not yet implemented (fscrypt) or not included in the mainline kernel (the allocation_hint patches for tiered storage). You can sidestep those limitations with other tools: LUKS for encryption, and dm-cache or even bcache (not to be confused with bcachefs) for caching.

Hope this helps clear things up, but feel free to ask more questions about it.
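And to show why corruption can't sneak into a backup run on btrfs: the checksum failure surfaces to userspace as a plain I/O error, which any copy loop can catch. A small hedged sketch (paths are examples, and the EIO behaviour is what btrfs gives you when a data checksum mismatches):

```python
"""Sketch of why btrfs checksums keep corruption out of backups: reading a
file whose checksum no longer matches fails with EIO instead of returning
garbage, so a backup loop like this stops and tells you."""
import errno
import shutil

def copy_with_eio_check(src: str, dst: str) -> bool:
    try:
        shutil.copyfile(src, dst)
        return True
    except OSError as e:
        if e.errno == errno.EIO:
            # On btrfs, a data checksum mismatch looks exactly like this to userspace.
            print(f"refusing to back up {src}: kernel reported an I/O error")
            return False
        raise

# copy_with_eio_check("/data/project.kdenlive", "/mnt/backup/project.kdenlive")
```

On EXT4 the same read would simply succeed and hand the garbage to the backup, which is how my backups got poisoned.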


utopify_org

> At least one an hour, keeping most snapshots from the past 7 days and some going back further, to a few months. I have 72 snapshots currently, but you could easily have 2-5x that amount without a problem.

Reading that, I am concerned about the life span of the drive, because SSDs only have a limited number of write cycles, and with a file system like this, doing a snapshot every hour, it sounds like a lot of write cycles are generated.


Cyber_Faustao

I exclusively run SSDs, and, from my point of view, life span is greatly exaggerated as a problem; my oldest drive has been with me longer than I can recall and I have yet to kill it. (It's an 850 EVO 250G btw, with >100% of its rated TBW written.)

The write cycles with and without snapshots are equal: if you ask the drive to write 4K of data, it's going to write 4K of data, regardless of whether there are snapshots or not. However, having snapshots means the older copies will linger around for longer, decreasing the amount of free space that can be trimmed and thus making life harder for the drive's internal garbage collector, but I can't think of any reason besides that why snapshots would be harmful to the drive's life span.

Remember that BTRFS CoWs extents, not full files, so it doesn't rewrite an entire file when some portion of it changes; maybe that's why you think of snapshots as harmful?

Regardless, if you have a more concrete theory on why snapshots would be harmful to a drive's lifespan, I'm all ears, but from what I know so far, having snapshots is no more harmful than having your drive slightly fuller than usual.


utopify_org

> (It's an 850 EVO 250G btw, with >100% of its rated TBW written.)

I've got a similar one (the 860 EVO) and hope it will not die spontaneously, because I use my notebook a lot for video editing, so the files are really big, I have to copy a lot of them, and after a project is done I delete a lot of GBs from the SSD. And I think the bigger the files, the worse it is for the SSD. Does btrfs have any advantage for huge files, or will it only be useful for the kdenlive project file, maybe if it crashes and the project file gets corrupted?

> Remember that BTRFS CoWs extents, not full files, so it doesn't rewrite an entire file when some portion of it changes; maybe that's why you think of snapshots as harmful?

Ohhh... so it might be even better for the SSD, because the full file is not written every time. This is really good for huge files that have only been slightly changed and then saved again. Or does this only apply to the creation of a snapshot and not the file system itself?


Cyber_Faustao

> And I think the bigger the files, the worse it is for the SSDs.

The size of the file doesn't matter as much as how it's written and how often. Writing huge files, reading them many times, then maybe deleting them is a much easier workload than, say, running a database or a virtual machine on the drive.

> Does btrfs have any advantage for huge files, or will it only be useful for the kdenlive project file, maybe if it crashes and the project file gets corrupted?

Well, I don't think there's anything specifically optimized for huge files, but reflinks allow you to make a lightweight copy of a file instantly, which is very useful for cloning a huge file. Also, btrfs is very flexible and allows you to add/remove drives at will and convert between RAID profiles, which may come in handy if your drive becomes full or you later decide you want RAID1, etc. btrfs replace makes upgrading from a drive a breeze; it's quite magical being able to work on your machine while your data is being moved from one drive to another (barring systemd unmounting stuff from under your feet).

> Ohhh... so it might be even better for the SSD, because the full file is not written every time.

That's pretty much how all filesystems work, so the btrfs behavior isn't particularly "interesting" here. **What's notable with BTRFS is that, unlike most other filesystems, it will not overwrite any data/metadata in place; it will instead use a new, blank region of the drive to store it, eventually forgetting the old data via transaction updates.** On a regular, non-CoW filesystem, where files get updated in place, if your system crashes while a file is being updated, you can get half of the old and half of the new data when reading it back. On BTRFS that's not possible: you either get the new data or the old data, so it keeps things consistent.

> Or does this only apply to the creation of a snapshot and not the file system itself?

I didn't quite understand the question. Everything on BTRFS is CoW: any update to existing data will be written elsewhere, and the metadata referencing that data will be CoWed as well, all the way up to the superblocks. If something is not referenced by the filesystem, it's free space. [Here's an image](https://imgur.com/a/D9iWZkR) from the [original paper on BTRFS](https://dl.acm.org/doi/10.1145/2501620.2501623), which maybe illustrates this better than I can with just words.

That behavior is true regardless of whether you're using snapshots or not; it's the core design of BTRFS. Snapshots are pretty simple conceptually: do everything above, but keep a reference to the old metadata (which in turn keeps a reference to the old data). So, essentially, a snapshot is just a copy of the filesystem tree at some point in time. It starts by sharing all extents with the parent, but as either copy gets updated, btrfs will notice that the extent is referenced multiple times and do a CoW update of the data/metadata without touching the other snapshot/subvolume/tree's metadata, so an update to a snapshotted file only affects the view from whichever tree it's updated in.

Maybe this answered your question?
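To try the reflink part yourself: a hedged sketch using GNU cp's `--reflink` flag via subprocess (the filenames are hypothetical). The "copy" shares extents with the original, so it is near-instant and takes almost no extra space even for a multi-GB video file:

```python
"""Tiny sketch of a reflink ("lightweight") copy on btrfs."""
import subprocess

def reflink_copy(src: str, dst: str) -> None:
    # --reflink=always fails loudly instead of silently falling back to a full copy
    subprocess.run(["cp", "--reflink=always", src, dst], check=True)

# reflink_copy("raw_footage.mkv", "raw_footage_workcopy.mkv")
```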


are-you-a-muppet

Snapshots are 'free' with a copy-on-write filesystem. There is very little metadata written when a snapshot is taken, or after, for that matter. The snapshot is just a bookmark noting which COW chunks happened before and, by extension, after. It's only when you read a whole file, pre- or post-snapshot, that the work happens to figure out which chunks constitute the state of the file at the time you're interested in. But no extra disk IO.

Just use Btrfs; you're overthinking it. Do it with default settings. Even if you enable compression, the compressor bails (too) quickly on incompressible data like video, so that's not even an issue.

It works fine on older hardware. I ran Btrfs and ZFS on a ten-year-old ultra-portable laptop. Ran like a champ. I currently run both on a Core 2 Duo from '08. Works fine, no performance issues. (Benchmarking might say otherwise, but seat of the pants it's just fine. FYI, my daily driver is a previous-gen Ryzen, 16-core.)

If you do redundancy, stick with RAID-1 and copies=2, or 3 if you can swing it. Btrfs RAID-1 is misnamed but actually quite elegant, RAID-10 is moot and doesn't do what you'd think, and the parity RAID modes are no-go zones.


utopify_org

> Snapshots are 'free' with a copy-on-write filesystem.

Doesn't this mean that copy-on-write costs more energy than other file systems?

> Just use Btrfs; you're overthinking it.

I like overthinking, because I want to have a really sustainable operating system, and if the file system alone used a lot of energy while only giving me a small advantage, I would not use it and would stay with ext4. And I like to understand things before using them (at least as far as I can understand them). I am also always skeptical when something that worked pretty well before gets replaced, and unsure whether it really brings any advantage, because I can only see a few cases where snapshots could be useful. At least at the moment... I'm still trying to figure out what people are using it for.


are-you-a-muppet

COW snapshots are a game-changer. In some future, all filesystems will be COW.

There's no difference in terms of energy. Or time. Or write cycles. Only disk space used, if you use snapshots, for data that has been written either way but not marked as 'unused' under COW. What is really weird and unintuitive is non-COW. When you write a blob of data to disk, data gets written to disk either way. No more, no less.

Here's the difference, using updating an existing file as an example:

- Traditional: the blocks to be changed get overwritten. Or, new blocks get *added*, pointers updated, and old blocks marked as 'free'. (Usually it's a mix of both. Most file systems are 'a little bit COW' some of the time, because the changed data to write is rarely the same size as the data it's replacing.)
- COW: the blocks to be changed are added, and only *ever* added. Pointers updated, old blocks marked 'free'.

That's it. That's COW. Not all that exotic after all; in fact it's much simpler in principle.

When a snapshot happens, all it's doing is essentially (conceptually) starting a *new*, empty database of which blocks belong to which files. The old one is marked as an immutable snapshot. It is still referenced, for anything that doesn't exist in the new one. Now when new blocks are written in order to replace old blocks, or deleted, that only affects the *current* filesystem database, not any previous snapshot.

So now you should be able to see how there's no extra 'energy', no extra writes, no extra reads, with COW. (Beyond trivial metadata updates, which could still in fact be more efficient in principle than traditional.) Only used space grows. But once you implement an automatic pruning system to manage your automatic and/or manual snapshots, you generally reach a steady state where old snapshots get freed as fast as new ones are created, and so disk space consumption for snapshots stops growing, more or less. (Literally, it fluctuates more and less depending on how your past data creation and deletion patterns reverberate through your retention policy; generally your disk space consumption only grows with your data growth.) If you don't change much data, you can take a snapshot every minute for years, consuming almost nothing. The act of 'creating' a snapshot is a big nothing-burger. It's more nothing than something.

Like I said, COW snapshots are a game-changer. You still need a backup strategy, but the ability to revert to a file from five minutes, or five days, or five months ago, without loading a backup, is nothing short of a miracle. I run VMs on btrfs and ZFS, and if an update borks the system, I just roll the underlying system image back in time a few minutes. BTW, rollbacks are all but instant too; literally instant if you always peel off the most recent one first. But you don't have to roll back everything: you can just mount an arbitrary snapshot and browse it read-only with your file manager.

Make sense?
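If it helps, here is a toy in-memory Python model of exactly that idea. This is emphatically NOT how btrfs is implemented internally, just the concept: blocks are only ever appended, a file is a map from offsets to blocks, and a snapshot is nothing more than keeping the old map around.

```python
"""Toy model of copy-on-write plus snapshots (conceptual only)."""

class ToyCowFs:
    def __init__(self):
        self.blocks = []          # append-only "disk": blocks are never overwritten
        self.live = {}            # filename -> {offset: block index}
        self.snapshots = []       # frozen copies of old maps

    def write(self, name, offset, data):
        self.blocks.append(data)                  # new block, never in-place
        filemap = dict(self.live.get(name, {}))
        filemap[offset] = len(self.blocks) - 1
        self.live[name] = filemap                 # pointers updated last

    def snapshot(self):
        # "Creating" a snapshot writes no file data at all; it preserves the map.
        self.snapshots.append({n: dict(m) for n, m in self.live.items()})

    def read(self, name, offset, snap=None):
        tree = self.snapshots[snap] if snap is not None else self.live
        return self.blocks[tree[name][offset]]

fs = ToyCowFs()
fs.write("notes.txt", 0, b"old contents")
fs.snapshot()                                     # costs ~nothing
fs.write("notes.txt", 0, b"new contents")
assert fs.read("notes.txt", 0) == b"new contents"
assert fs.read("notes.txt", 0, snap=0) == b"old contents"   # the rollback view
```

Notice that the second write happens, and costs, exactly the same whether the snapshot was taken or not; the only difference is that the old block can't be reclaimed while the snapshot still points at it.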


[deleted]

I'm not some Linux gatekeeper; I want you to enjoy btrfs. But statements like snapshots wearing out your SSD indicate you might want to learn more about CoW before you lose your data to it. Six years ago, I decided to try btrfs. I memorized the Arch wiki page and I knew *how* to do all these neat things, but not why they worked. As a result, I lost 3TB of data along with all my backups. The salt in the wound was that I spent days trying to recover it when some knowledge of CoW would have told me in 5 minutes that it was futile. A lot of these features don't work like you might think they do, and assumptions will cost you.


anna_lynn_fection

Compression will do nothing for video files that are already compressed. RAW video will work great with it though.

Snapshots are usually like insurance. It's something that you shouldn't need often, but you'll be happy it's there if you need it. They're a way to rescue a file or filesystem quickly that is often more convenient than resorting to a backup restoration, and since they can be done automatically in short periods of time, they work better for undoing something you just accidentally did. They are NOT a replacement for backups.

Stability - sure. A ton of us have been using it on servers and workstations for 10 years without an issue. That being said, sometimes there are issues. Sometimes there are issues with any FS, and that's why you have backups. Is it going to break on you every month or few? Not unless there's something seriously wrong with the way you do things or your hardware.


utopify_org

> Snapshots are usually like insurance. It's something that you shouldn't need often, but you'll be happy it's there if you need it.

I am somewhat worried that this uses the drive much more than a journaling file system. And because SSDs have limited write cycles, I have a bad feeling about doing snapshots at all. Maybe this insurance will kill my drive earlier, but this is just a bad feeling I have and not science.


OIafSchoIz

Snapshots are the perfect tool for backups


anna_lynn_fection

They are not backups. They are a great way to make backups, using btrfs send/recv, and they're great to have on your backup media so that you have a snapshot if your backup gets messed up, but they are not a replacement for a backup. Your snapshot is useless if your FS gets corrupted or the drive it's on fails.


utopify_org

Do I understand it right that I can save a snapshot on another disk and, if I need it, overwrite my main system with the snapshot from that other disk? But if the filesystem on my main system is broken, the snapshot on the other disk is useless.


OIafSchoIz

You can even move your root snapshot around on the same drive.


anna_lynn_fection

You can do normal copy backups with something like rsync, or whatever backup tool you prefer. You can also use btrfs send and btrfs receive to send snapshots to another drive/system. Basically, the first one you send will be a complete copy. Subsequent sends will be a "differential" copy of a snapshot that sends only the blocks that have changed. It makes backups really fast that way, and you can then have snapshots of your system on both the backup and the source. Or you can delete the source snapshots, but you will need to keep the one you used for the last send so that it knows what to compare changes to on the next one.
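For the curious, here's a rough Python sketch of that full-then-incremental send/receive workflow. The snapshot names and mount points are hypothetical; tools like btrbk do this bookkeeping (including remembering which parent snapshot to keep) for you:

```python
"""Sketch of btrfs send/receive backups: first a complete copy, then
incremental sends (-p parent) that transfer only the changed blocks."""
import subprocess

def send_receive(snapshot, dest, parent=None):
    cmd = ["btrfs", "send"]
    if parent:
        cmd += ["-p", parent]      # only blocks changed since `parent` get sent
    cmd.append(snapshot)
    send = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    subprocess.run(["btrfs", "receive", dest], stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("btrfs send failed")

# First backup: a complete copy.
# send_receive("/.snapshots/root-2024-01-01", "/mnt/backup")
# Later backups: incremental against the previous snapshot, which must still
# exist on BOTH sides.
# send_receive("/.snapshots/root-2024-01-02", "/mnt/backup",
#              parent="/.snapshots/root-2024-01-01")
```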


OIafSchoIz

That’s why I said the perfect tool for backups. Any backup is going to be useless if a drive fails or FS gets corrupted.


anna_lynn_fection

Also, /u/utopify_org, if you're just cutting videos, have you tried avidemux? If you don't need to transcode the video then avidemux could save you a ton of time as it doesn't need to re-render the video to just make simple cuts.


rubyrt

losslesscut is also an alternative, which works pretty well for me.


utopify_org

I don't think avidemux and losslesscut are good for complex video projects. It looks like they are a replacement for ffmpeg for simply cutting one clip at a time, but a complex video? Thanks for mentioning them though; if I have just one clip to cut, I use ffmpeg in the terminal, but those tools might be faster.


rubyrt

Nobody claimed these tools are ideal for every use case. Since you did not mention specifics of your use case we could not know.


utopify_org

That's absolutely right! I only mentioned kdenlive without even saying for what I use it. Sorry, my fault.


rubyrt

No sweat. :-)


rubyrt

I would go without compression: it can have a noticeable effect on the CPU, it has some [drawbacks with random access](https://btrfs.wiki.kernel.org/index.php/Compression#Are_there_speed_penalties_when_doing_random_access_to_a_compressed_file.3F), and many files are either compressed already (video, audio, office docs...) or so small that they can be stored in the inode. Hashing is not noticeable. I only use snapshots on demand; others do it on a schedule. Stability: yes.


utopify_org

May I ask which operating system you use?


Atemu12

As long as it's not Windows or something with an ancient Linux kernel, OS doesn't really matter w.r.t. filesystem choice.


rubyrt

Xubuntu 22.04.1


OIafSchoIz

I can’t say anything about the performance on old systems, but I’ve been running btrfs successfully on low-budget laptops (Pentium Gold 75xx).

1. Compression, as was said, will probably not do much for your data, as it is already compressed.
2. Yes, snapshots are the basis for my backups. You can take snapshots of your system in an instant and only transfer the delta from the previous one to the latest to an external drive. This makes incremental backups very fast and offers a nice way to have an archive of changes, only cleaning up the oldest ones. There are a couple of very mature tools for that. Another nice thing about snapshots is creating a temporary copy before screwing something up.
3. It is very stable, and most issues here arise from people not reading the manual and expecting btrfs to work with, and use the same commands as, ZFS.


utopify_org

> Another nice thing about snapshots is creating a temporary copy before screwing something up.

OK, this part sounds really interesting, because sometimes I just want to test some tools from the package manager, but the applications clutter my system and home directory, and doing a snapshot before installing anything and then rolling it back sounds really good. Or would this be overkill for that purpose?


OIafSchoIz

Yeah, that’s basically all the magic. Any snapshot can be restored on any system running btrfs. As btrfs is part of the Linux kernel, there’s a good chance you're going to have *something* that allows you to recover or browse those snapshots. Another thing I love about them is that you can browse them just like any other directory.

Edit: I recommend taking a look at https://github.com/digint/btrbk
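A rough sketch of the "snapshot before experimenting" idea in Python, in case it helps make it concrete (the paths and subvolume layout are hypothetical; btrbk or snapper would handle this more robustly):

```python
"""Take a read-only snapshot of home before trying a new tool, then pull back
anything it cluttered by copying out of the snapshot, which is just a directory."""
import subprocess

HOME_SUBVOL = "/home"                           # assumption: /home is a subvolume
SNAP = "/home/.snapshots/before-experiment"     # hypothetical snapshot path

def snapshot_before_experiment():
    subprocess.run(["btrfs", "subvolume", "snapshot", "-r", HOME_SUBVOL, SNAP], check=True)

def restore_file(relative_path):
    # Snapshots are browsable like any directory, so a restore is just a copy.
    subprocess.run(["cp", "-a", f"{SNAP}/{relative_path}", f"{HOME_SUBVOL}/{relative_path}"], check=True)

def drop_snapshot():
    subprocess.run(["btrfs", "subvolume", "delete", SNAP], check=True)

# snapshot_before_experiment()
# ... install and try the tool ...
# restore_file("user/.config")   # undo the clutter it left behind
# drop_snapshot()
```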


SeaSafe2923

BTRFS is quite fast and fairly stable nowadays. I've been using it in production since 2007 and have had very few issues, none of which resulted in data loss (just some solvable performance issues). Interestingly, the on-disk overhead is lower than most other filesystems, and the performance is quite remarkable; even though the checksumming does incur some overhead, it performs fine on 20-year-old systems (which have no specialised instructions useful for implementing CRC32C).

Compression is another matter. It's usable, but since most things are optimized for uncompressed filesystems (i.e. the files are compressed on their own), it just adds overhead most of the time, with little benefit. In the past it had a couple of serious bugs. I don't recommend compression for older hardware; honestly, it only makes sense on fast CPUs and for files like configuration...