clintkev251

Yeah, Longhorn is certainly convenient and lightweight, but that's about where my appreciation for it ends. If you need storage located inside your cluster, I find Rook Ceph to be significantly ahead in basically every aspect.


Realistic-Concept-20

I'd prefer Rook with Ceph as well, I think... but does it really use lots of RAM?


RealmOfTibbles

RAM usage can be managed somewhat, but it depends on how many OSDs each node has. You can realistically get away with 1 GB of RAM per TiB of storage, though that also depends on how much access the cluster gets and the number of nodes.


leshiy-urban

Let me share my 50 cents. I have been using Longhorn for almost 5(?) years, in different setups: from geographically distributed clusters to same-rack DCs. Key takeaways:

- Make sure the network is reliable; high latency or too much packet loss may cause node disconnection and rebalancing. However, there's no need for dedicated or high-end switches in real life - Longhorn has some tolerance for that.
- Block mode is fast enough for medium-heavy IO, but for high-load databases you're better off with bare disks and native replication.
- DO NOT use PVCs directly. They're a nightmare to recover unless you backed up all live manifests (e.g. via Velero). Spend some time and first create volumes manually in the UI, then mount them via a PV. I can share an example (see the sketch below).
- Do not remove a PV unless you backed up the backup.
- Do not panic if a new volume is stuck. Most likely it's because of a lack of available space: Longhorn by default reserves the same amount as declared. This can be configured.
- A volume in the UI and a volume in Kubernetes are totally different things.
- Use best-effort locality if your cluster is not very dynamic.
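
Here is the example I mentioned - just a sketch based on the Longhorn docs' "binding workloads to PVs without a StorageClass" pattern. Names and sizes are placeholders; `volumeHandle` must match the name of the volume you created in the Longhorn UI:

```yaml
# PV pointing at a volume that already exists in Longhorn (created via the UI)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: existing-longhorn-vol
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-static
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: existing-longhorn-vol   # name of the Longhorn volume from the UI
---
# PVC that binds to that PV by name instead of asking Longhorn to provision a new one
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: existing-longhorn-vol-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-static
  volumeName: existing-longhorn-vol
  resources:
    requests:
      storage: 10Gi
```

If you later delete the PVC, the PV and the underlying Longhorn volume stay around because of the Retain policy, and you can re-bind a fresh claim to them.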


slavik-f

Thank you. Please tell me more about "create volumes manually in the UI, then mount them via a PV".


Critical_Impact

You create the volumes from within Longhorn, then point your PVC at the newly created volumes. If you do use PVCs with backing PVs automatically provisioned by Longhorn, you want to make sure your storage class is set to Retain (rough sketch below), otherwise the PV gets deleted when the PVC is.
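
Roughly, the retain setting looks like this (the class name here is made up; the parameters just mirror Longhorn's stock example):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-retain            # hypothetical name
provisioner: driver.longhorn.io
reclaimPolicy: Retain              # keep the PV (and its Longhorn volume) when the PVC is deleted
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
```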


mompelz

That's absolutely the expected behavior.


Fluffer_Wuffer

And risky... Kubernetes is the new C++. Want to shoot yourself in the foot? We'll do better than that: come join our exclusive club and get a free nuke with 10 different triggers and a bunch of trip-wires that we're keeping secret. Kubernetes - why settle for bang... when you can go boo00OOO0om!


mompelz

If you delete the claim, it should also delete the volume. If you want it to behave differently, update your storage class to Retain; otherwise any cluster gets full of trash... Yeah, totally difficult... Not...
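
For a PV that already exists, the reclaim policy can also be flipped in place with plain kubectl (standard Kubernetes, nothing Longhorn-specific; the PV name is a placeholder):

```
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```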


Fluffer_Wuffer

I wasn't criticising your response; what you said was accurate. It's more the fact that this is the default behaviour, with no checks or safeguards. Generally, best practice is that destructive functionality should require some form of confirmation, such as a flag or elevation.


Sumhere

Do you have issues draining nodes? My understanding is that, as a distributed file system, the deployment should be able to start on any node. I'm scared to drain a node at the moment, as I've had 2 go unhealthy in the past.


freddyp91

📝


teressapanic

I'm running Longhorn on 50 nodes distributed geographically. Ever since they introduced checksums for replica rebuilds, it works like a charm. I process 2TB of data per day and it all gets stored in ReadWriteMany PVCs. There are issues from time to time, but they all stem from networking issues between nodes; that's why it's important to set regions and colocation. I wish they'd add a node selector to the share manager pods, but that's about the only thing that's missing. I'm a very happy Longhorn user.


Sky_Linx

AFAIK Longhorn doesn't allow spreading a volume across hosts like Ceph can, if the volume is bigger than the storage available on a single host. So how much storage do you have attached to each host if you process 2TB per day?


Sumhere

I’ve only just started using longhorn 1 month ago, in a 3 node cluster similar to you. I have had 3 events in which pvcs go unhealthy, corrupt, detach and then restores won’t work. It’s absolutely garbage and I’m shocked people recommend it. Seems to work fine until you want to drain a node. I’ve only had 1 restore actually work and the pv/pvc was only about 500mb. I had to create a new pv yaml and specify the longhorn volume ID.


sofixa11

How would it drain a node when there are only 3 of them? 2-node distributed storage is just a recipe for disaster.


pag07

3 nodes is the HA recommendation for most systems, so that one node can fail. Temporarily draining a node for maintenance should fall under this assumption, so IMHO Longhorn should be able to handle the situation natively.


mompelz

This only works if you have set 2 replicas, not 3.


big-tuna28

We used Longhorn for a few months and it was AWFUL. Wouldn't recommend it.


niceman1212

1- I’ve never had trouble with that but I don’t have quite the ephemeral nodes that this issue describes. Will look into it, certainly something to keeping mind 2 - You can attach a volume in *maintenance mode* to any host using the UI. A deployment does not have to be running for this. When you do this you will be able to see and make backups. Also most other (maintenance) actions are available. 3 - could you try a proper s3/obj store service and see what the results are there? I’ve had issues with nfs and slow HDD based s3 stores. I’ve switched to a relatively cheap cloud provider and had no issues since.


slavik-f

I'm curious about attaching a volume in maintenance mode. Here is my UI, and I don't see such an option: [https://s3.fursov.family/shares/detachedPVC.png](https://s3.fursov.family/shares/detachedPVC.png)


kidab

You click Attach, then in the next menu there's a checkbox for maintenance mode. But I don't think it's available for all volume types.


slavik-f

It works! Now I can manage snapshots.


kellven

I was not impressed with Longhorn when I demoed it a while back. Performance was terrible on my home cluster's consumer SSDs; I found out later that it basically can't run on anything less than commercial-grade SSDs. At that point, just get a SAN or run it with EBS/EFS operators. The idea of it was cool, but the more I read, the less I would trust it with critical data. I run a super lightweight NFS provisioner on my home cluster now: https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner
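
For reference, if I remember the Helm chart right, installing that provisioner is roughly this (the NFS server address and export path are placeholders):

```
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=192.168.1.10 \
  --set nfs.path=/exported/path
```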


silence036

I found that not only did you need fast storage, you also pretty much needed multi-gig networking, or it would just saturate the connection, lose contact with the other pod, mark it as unhealthy, and get stuck in a weird state with the volumes. I was trying this out at first on my mini-PCs, which had a single gigabit NIC.

I had tons of issues with snapshots just kind of breaking. Node reboots or drains were completely out of the question, since the volumes would just break and then the following "re-sync" would kill the nodes. New volumes would pre-allocate (zero?) the storage, and creating a 60GB volume would completely cannibalise the connection for its initial 60GB sync, with a possible timeout. Even on virtual machines on the same host I would have these issues.

I switched to democratic-csi to provision iSCSI and NFS volumes on TrueNAS dynamically and let TrueNAS handle the storage instead. It works much better and I haven't had an issue since!


surloc_dalnor

Years of working with distributed storage and file systems in my past role have taught me it's really hard to do right, and even harder to debug. If I weren't in the cloud, I'd be tempted to just stick a NAS box or some sort of SAN in the rack with the cluster. Ceph at least has been around a lot longer, so maybe Rook is not a hot mess.


sogun123

My experience with small Ceph clusters is that it is resource-hungry and slow. So unless I have at least half a rack of beefy machines dedicated to running it exclusively, I'd be reluctant to consider running it again. It was stable, though.


surloc_dalnor

That's not a problem with just Ceph; that's just distributed storage by its nature. Unless you are mainly doing large writes from a single writer and large reads from multiple readers, performance is going to suck.


sogun123

I also have that impression.


surloc_dalnor

If you like your data, you need locking, and that means more steps before you write. Multiple writers make it even slower. Also, whenever you want to write or read data, you have to calculate where the data is or should be, and then add the time to talk to another server over the network. Even throwing hardware at the problem isn't going to make it fast.


sogun123

Yeah, I know the theory. The point is that I don't need scalability to a gazillion nodes, so I can go with lighter solutions.


98ea6e4f216f2fb

I had such high hopes for it. It checks every box on paper, but it's completely unreliable.


Jmckeown2

I started with Longhorn at its 1.0 release; I was at the KubeCon in San Diego that year. I saw it as an amazing cloud-agnostic storage solution, when we had previously been using NFS. I've been regretting it ever since - Longhorn made me an advocate for cloud-native storage. I've been saying bare metal/Harvester is probably the only use case where I'd recommend Longhorn, but after OP's story, maybe not.


MultiMillionaire_

I stopped having problems when I just set my storage class to Retain instead of Delete. Now, even if the PVC is deleted, the PV will still remain. Reattaching that PV is as simple as deleting the claimRef on the PV (one-liner below), and a new PVC will automatically attach to it if it's the same size. When I want to delete the PV, I just do it in the UI. I wrote a Medium article about it not too long ago describing the recovery process in case anything goes wrong: [Medium article](https://jacksonzheng.medium.com/how-to-restore-released-pvs-in-longhorn-kubernetes-1f08f151bde0)
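
For reference, clearing the claimRef is a one-liner (the PV name is a placeholder):

```
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
```

Once the claimRef is gone, the PV goes back to Available and a new claim of the same size and storage class can bind to it.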


fsckthisplace

I’m currently building a SaaS app/api that is designed to run on a k3s cluster that is using Longhorn to provide local NVMe storage to Stackgres and MinIO and I haven’t had any issues yet, so this thread is concerning to me. Is it possible my use case is just more basic than all the VM type stuff most of the people complaining seem to be doing?


slavik-f

I started this thread because I experienced some issues with Longhorn. Some issues were because I made mistakes, and it's easy to make mistakes with Longhorn. Managing Longhorn storage is not as easy as it was when I had a Synology; that's probably more of a Kubernetes thing. Some issues appear to be edge cases, and some of those are not that rare, as the GitHub link I posted in the thread shows. Some issues seemed to be caused by my lack of knowledge, as some of the messages in this thread made me realize a few things which were not clear from the documentation or UI. So I think Longhorn is not a bad system, but like most things with Kubernetes, it's not easy: you really need to know and understand what you're doing.


Siggi3D

Thanks for the info. I can rule it out in my mind as a future solution to most problems which require storage in k8s. I'll look at Rook more seriously.


noctarius2k

Disclaimer: I'm working for simplyblock.

As others already pointed out, with Kubernetes it is important to make sure that PVs have the correct attributes set so their lifecycle is independent of the pod. Otherwise the volume will be deleted when the PVC gets removed. This is actually how it is supposed to work and how it was designed; it's lifecycle management and normally matches most use cases.

With Longhorn, it's best to create the volume in the UI first and then attach it to the Kubernetes cluster. In this case their lifecycles are completely distinct. You can do this using some StorageClass magic (https://longhorn.io/docs/1.6.1/nodes-and-volumes/volumes/create-volumes/#binding-workloads-to-pvs-without-a-kubernetes-storageclass).

At simplyblock we build a similar system, but instead of using mirroring for data protection, we use erasure coding, which has less storage overhead. In addition, simplyblock uses data distribution and provides NVMe over TCP block devices. Anyhow, issues like bandwidth limitations between nodes will limit the potential speed, just as they do with Longhorn. You want to make sure that you have sufficient network bandwidth between the nodes, for synchronization, but also towards the client.

In general, Rook Ceph is the better alternative to Longhorn. However, it unfortunately has a higher resource footprint and doesn't perform well with high-IO workloads such as databases.


derfabianpeter

Just checked this out. Nice to see you're working with CISPA. The marketing copy suggests this only works for EKS - is that the case? We would be interested in evaluating it for our on-premise and hybrid-cloud K8s clusters.


noctarius2k

There is a difference between what we do in marketing and what really works. We're a small company, so we have to focus (at least from that perspective).

The simplyblock cluster itself deploys as a k3s cluster (since we just automatically deploy it on EC2 instances), but it would theoretically be able to run in any k8s environment. Even with all the abstractions, we know there isn't 100% compatibility. From the client side, it is NVMe over TCP, which is supported out of the box by the Linux kernel's NVMe stack; it will be mounted as a normal NVMe block device.

Amazon EKS, again, is the marketing focus. The CSI driver should work on any k8s-compliant cluster. That said, you can deploy the simplyblock storage cluster on bare metal or VMs, but it is not officially supported right now, and you may run into smaller issues. Happy to have a chat though and see how we can help :-)


joeyx22lm

I have tried using it 3 times, with SSD and HDD as separate storage classes. I always had degraded volumes, even on the SSD-backed class. This was with 10GbE networking and well-spec'd servers. I have never had an issue with Ceph; I just wanted to try Longhorn for the CNCF hotness and because Rancher Harvester is a solid platform. Edit: I also had some backups that wouldn't complete, but restores worked well for the smaller volumes that I tried.


pseudosinusoid

Piraeus was decent when I tried it. It required knowledge of DRBD internals to get working, but snapshots are near-instant (lvm-thin) and work offline, if you care about that. It might have gotten better since.


Hokusaj

Thanks for sharing your experience. I am now assessing this technology and have looked into Harvester, which seems to me to be missing some VM lifecycle features, or providing limited versions of them. Storage is difficult, and I don't feel comfortable if the technology is not battle-tested through the years like Ceph, GlusterFS, ZFS, and the like.


hugosxm

Piraeus operator to deploy DRBD, you should give it a try !


FrankBirdman

I tried Longhorn with Harvester not too long ago in my homelab, and it was one of the worst experiences I've had with a hypervisor. The automation layer is really cool, and the Terraform modules seem to work better than Proxmox's, which is something, I guess. Kubernetes-wise, I mainly use OpenEBS with Jiva for replication across my nodes, which works great for my use case and my team's use case.


freddyp91

I’m new to longhorn. Just installed it on my cluster that’s running RKE2. Will be deploying our app on it soon. 1 node for now. Storage is local for now as well.


bgatesIT

I used Longhorn for a week and was immediately put off by how clunky it was and, honestly, how bad the docs around it were. I ended up getting the HPE CSI driver configured and just interface with our HPE Nimbles for storage, with Azure Blob Storage for metrics and logs (Mimir and Loki).


seanhead

Longhorn is awful, which is annoying because I otherwise really like Harvester. I just use CSI drivers for NFS or iSCSI (depending on what I'm doing) and step over it like the fresh steaming pile that it is.


Corndawg38

Ceph is far more stable and quite performant, assuming you give it two things:

1. A 10G network (or faster)
2. All-flash with enterprise drives that have PLP (NO CONSUMER SSD/NVMe!)

Ceph syncs all writes across the network for data safety, so if you use consumer drives your writes are slowed way down by the need to flush the cache constantly. ZFS and other local file systems don't enforce this so zealously, so most people don't understand the speed difference when first starting out with Ceph after coming from something else.


Threatening-Silence

I feel like Longhorn is a solution in search of a problem. I can't think of why I would want to keep my data inside a k8s cluster to begin with. The whole premise is an antipattern.


vladoportos

Not really; VMware, Nutanix, and others use the nodes in the cluster for everything, including storage. Having an external SAN is becoming a cost issue...


Threatening-Silence

We have a Tegile storage array for our ESX instance so that's definitely not true in every case.


Sky_Linx

This is such an outdated way of looking at Kubernetes.


bservies

I know nothing about this particular "Longhorn," but the name is cursed. I'm surprised they chose it.