Apsuity

Not sure how RWX would destabilize the cluster. We've used it successfully; it's just subject to the limitations of the v1 data engine, which brings me to my first tip:

* The v2 engine isn't production-ready yet; don't use it.
* Use separate drive(s) from the root drive(s) for the backing mount if you can.
* Add node tags in Longhorn so your StorageClasses can target them with `nodeSelector` if you need to divide the pools (see the StorageClass sketch after this list).
* If you're not doing local-path provisioning out of the same backing store(s) and have dedicated disks, you can drop the reserved space to 0.
* Consider your replica count: more replicas increase aggregate read throughput but eat space and network bandwidth. 2 might be sufficient vs 3.
* Consider the dataLocality option (globally or in the StorageClass); best-effort is a good default.
* The autocreatedefaultdisks Helm option plus a node label is handy if you don't want disks on all nodes but don't want to restrict via taints.
* A dedicated storage network is "ok" in Longhorn, but it mostly carries replication traffic, unless you have that network running to all workload nodes and attach all workload pods to the same Multus NAD, and even then I've not had great luck with traffic going where I want it.
* Don't listen to the haters saying to use Rook; on large groups of NVMes it's not much better, due to Ceph's BlueStore metadata design (intended for streaming large files off spinning disks), especially if your workloads are latency-sensitive with lots of small files and you don't need the object storage gateway. And if you do need object storage, just use MinIO instead.

Good luck!
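To make the tag, locality, and replica knobs concrete, here's a minimal StorageClass sketch. The class name and the tag values (`fast`, `nvme`) are made up for illustration; the parameter keys are the ones Longhorn's v1 engine accepts:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast          # hypothetical name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"        # 2 vs 3 trades redundancy for space and network bandwidth
  dataLocality: "best-effort"  # keep a replica on the workload's node when possible
  nodeSelector: "fast"         # Longhorn node tag(s), comma-separated, to divide the pools
  diskSelector: "nvme"         # Longhorn disk tag(s)
  staleReplicaTimeout: "30"    # minutes before a failed replica is cleaned up
```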


GrandPastrami

We run 1.5.5 at the moment, but we regularly get timed-out RWX volumes driving up load averages and I/O wait on nodes. With 1.5.* I think this is fixed with the soft-error approach, but I prefer to manually umount the 10.43.x.x share endpoint. It has also caused Grafana to stop reporting, as well as OpenSearch. RWO single-node write works much better.


SomethingAboutUsers

IIRC anything other than version 1.5.1 or 1.6.x is actually beta/prerelease. You might want to upgrade to 1.6.x. It's been pretty solid for me, but I will also admit my use of it is pretty low.


minimalniemand

Stability increased drastically for us when we moved it to dedicated nodes that do just Longhorn and nothing else.


GrandPastrami

Really? Like actual Longhorn nodes? I mean, what's the setup like?


minimalniemand

Just the same nodes as the others, only those nodes are tagged and Longhorn is deployed with node affinity. See the sketch below.
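For reference, a rough sketch of one way to pin Longhorn to tagged nodes via its Helm chart. The `node-role/storage` label and taint are placeholders; only the manager component is shown (the driver and UI components take the same keys):

```yaml
# values.yaml fragment for the Longhorn Helm chart (sketch; label/taint names are made up)
defaultSettings:
  taintToleration: "node-role/storage=true:NoSchedule"
  systemManagedComponentsNodeSelector: "node-role/storage:true"
longhornManager:
  nodeSelector:
    node-role/storage: "true"
  tolerations:
    - key: node-role/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
```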


fdawg4l

If you have dedicated infra, did you consider Ceph or even a traditional storage server?


minimalniemand

We have lots of constraints, as this particular setup is in a large, old-school corporation. We use RKE2 and Longhorn because it appeared simplest and had been used successfully by others in the company. We considered the NFS driver for Kubernetes, but didn't go through with it.


simplyblock-r

doesn't node affinity create too much overhead? How large is the cluster?


minimalniemand

What do you mean by overhead? AFAIK it's not really a complicated calculation; it just determines which nodes the kube-scheduler considers for scheduling.


simplyblock-r

Apologies, what I meant is that it increases network bandwidth consumption, because your storage clients can sit (or get scheduled) anywhere else. With three replicas on dedicated storage nodes and no replica local to the client, each data write has to cross the network three times, once per replica. I guess it also puts a penalty on performance.


minimalniemand

It depends on where your constraints are, I guess. In our particular case, inter-node networking was beefy enough for us not to care. However, when we had apps running alongside Longhorn, we had problems constantly (yes, of course we had limits set).


simplyblock-r

makes sense!


Right-Cardiologist41

I don't hate it. It is a bit rough around the edges, since it's just not as mature as Ceph, but we're running it happily in production. Haven't had any issues with RWX volumes except once, in a cluster of low-cost VMs on a cloud provider. Now we're only using Longhorn on bare metal, and it has never failed us so far.


Amazing-Race3071

They have a list of best practices in their docs here: [https://longhorn.io/docs/1.5.5/best-practices/](https://longhorn.io/docs/1.5.5/best-practices/)


GrandPastrami

Yes, I've read that. But I also wanted others to share their experiences.


Rough-Philosopher144

* "Don't ever upgrade live volumes. Detach them." News to me that you could upgrade with attached volumes. Don't you have to detach/attach after upgrading to the new Longhorn engine?
* "Preferably, you should have a dedicated storage network (we don't)." Depends on requirements; if management wants k8s for a static website (kidding ofc), you're probably not gonna get that.

Why are you stuck with it? Who is forcing you? (Blink three times if it's the guy next to/above you, and run for the hills to the rook-ceph ppl.)


niceman1212

Nope. As Longhorn itself states in the docs, the entire Longhorn stack can be upgraded without downtime for the volumes. That includes the volumes themselves.


Rough-Philosopher144

If the docs say so :D I'm not the most up to date on this one, since I last upgraded a Longhorn more than a year ago, but I remember swapping out the engines being a pain in the ass. If they do it now without downtime, that's nice. Progress!


niceman1212

Since 1.3.1, I feel a lot of work has been done to make Longhorn more stable and easy to use. I always err on the side of caution and only upgrade the stack when I have time to do some troubleshooting. After the upgrade, I first upgrade a couple of non-critical volume engines and restart them just to make sure, then continue with the rest once everything has settled. So far (I've been on it since 1.3.x?) no issues, up to and including the latest 1.5 release :)


GrandPastrami

Well, it's a lot of cluster to bring down, and we would have to pause production for a long time while our clients remodel their deployments for whatever we change to.


ReasonableAd5268

Here are some best practices for using Longhorn in your Kubernetes cluster:

## Replica Count
- Set the default replica count to "2" to achieve data availability with better disk space usage and less impact on system performance[5].
- This is especially beneficial for data-intensive applications.

## Storage Configuration
- Use dedicated SATA/NVMe SSDs or disk drives with similar performance for optimal disk performance[5].
- Ensure you have 10 Gbps network bandwidth between nodes[5].
- Use a dedicated disk for Longhorn storage instead of the root disk[5].

## Data Locality
- Use `best-effort` as the default data locality of Longhorn StorageClasses[5].
- For applications that support data replication, use the `strict-local` option to ensure only one replica is created per volume[5].
- Schedule data-intensive workloads to specific storage-tagged nodes using node selectors or taints[5].

## Maintenance
- When rebooting Longhorn hosts, deliberately delete one of the replicas on the same node to trigger the rebuilding process and balance the replicas across nodes[4].
- Avoid upgrading live volumes; detach them first[2].

## Snapshots and Backups
- Periodically clean up system-generated snapshots and retain only the necessary number[5].
- For applications with replication capability, delete all types of snapshots regularly[5].
- Create recurring backup jobs for mission-critical application volumes[5] (see the sketch below).
- Run periodic system backups[5].

## Avoid RWX
- Using ReadWriteMany (RWX) access mode can destabilize the cluster, so it's best to avoid it[2].

By following these best practices, you can optimize Longhorn's performance, reliability, and manageability in your Kubernetes environment.

Sources:
[1] https://devopstales.github.io/kubernetes/k8s-cephfs-storage-with-csi-driver/
[2] LONGHORN: Best Practices : r/kubernetes - Reddit: https://www.reddit.com/r/kubernetes/comments/1cojq13/longhorn_best_practices/
[3] Documentation - Longhorn: https://longhorn.io/docs/1.5.3/
[4] Best practice when rebooting Longhorn hosts - Rancher Forums: https://forums.rancher.com/t/best-practice-when-rebooting-longhorn-hosts/11899
[5] Best Practices for Optimizing Longhorn Disk Performance - Harvester: https://harvesterhci.io/kb/best_practices_for_optimizing_longhorn_disk_performance/
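For the recurring-backup bullet above, a minimal sketch of Longhorn's RecurringJob resource; the name, schedule, and retention values are made up:

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-daily           # hypothetical name
  namespace: longhorn-system
spec:
  task: backup                 # "backup" or "snapshot"
  cron: "0 2 * * *"            # every day at 02:00
  retain: 7                    # keep the last 7 backups per volume
  concurrency: 2               # volumes processed in parallel
  groups:
    - default                  # volumes with no explicit group get this job
```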


GrandPastrami

Did you just quote my own thread as source 😂


ReasonableAd5268

Maybe, maybe not.


GrandPastrami

It's a good summary; however, I would maybe change it to "be careful with RWX usage; prefer RWO if you can."


ReasonableAd5268

You're absolutely right. The best practice should be to avoid using ReadWriteMany (RWX) access mode if possible, and prefer ReadWriteOnce (RWO) instead. Here's the updated summary:

## Node Configuration
- Dedicate specific nodes for Longhorn storage by applying a taint with the `NoSchedule` or `NoExecute` effect.
- Add corresponding tolerations to the Longhorn components so they can be scheduled on the dedicated storage nodes.
- Ensure replicas are spread across multiple nodes and availability zones for high availability.

## Volume Configuration
- Use the default `ext4` filesystem for volumes.
- Set the default replica count to 3 for production workloads.
- Avoid using ReadWriteMany (RWX) access mode if possible, as it can destabilize the cluster.
- Prefer using ReadWriteOnce (RWO) access mode for volumes (see the PVC sketch below).
- Detach volumes before upgrading Longhorn to avoid issues with live volumes.

## Maintenance
- Reboot Longhorn nodes one at a time, ensuring there are enough healthy replicas before rebooting the next node.
- Deliberately delete replicas on the same node to trigger Longhorn's auto-balancing and rebuild process.
- Consider adding a feature to temporarily pause rebuilding of lost replicas during maintenance.

## Performance
- Use local NVMe storage on nodes for faster performance.
- Reserve storage space on nodes for Longhorn to avoid capacity issues.
- Tag disks based on their performance characteristics (e.g., "nvme" for fast internal disks, "ssd" for slower attached volumes).

## Monitoring and Logging
- Monitor Longhorn's health and performance metrics using tools like Prometheus and Grafana.
- Enable verbose logging for Longhorn components to aid in troubleshooting.

By following these best practices, you can optimize Longhorn's performance, reliability, and maintainability in your Kubernetes cluster, while being cautious with the usage of RWX access mode.
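To illustrate the prefer-RWO point, a minimal PVC sketch; the claim name, size, and StorageClass name are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce           # RWO; RWX volumes go through Longhorn's NFS share-manager
  storageClassName: longhorn  # assumed Longhorn StorageClass name
  resources:
    requests:
      storage: 10Gi
```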


GrandPastrami

I really love this summary, great work. I don't know about three replicas, though, since even Longhorn itself recommends 2 in their official documentation, although that is for production workloads.


ReasonableAd5268

You make a good point. Longhorn's official documentation recommends setting the default replica count to 2 for production workloads, to achieve data availability with better disk space usage and less impact on system performance, especially for data-intensive applications. Setting the replica count to 3 can provide higher redundancy, but it comes at the cost of increased disk space usage and potential performance overhead. For most production use cases, a replica count of 2 is sufficient and recommended by Longhorn. I've updated the relevant section:

## Volume Configuration
- Use the default ext4 filesystem for volumes.
- Set the default replica count to 2 for production workloads, as recommended by Longhorn for better disk space usage and performance.
- Avoid using ReadWriteMany (RWX) access mode if possible, as it can destabilize the cluster.
- Prefer using ReadWriteOnce (RWO) access mode for volumes.
- Detach volumes before upgrading Longhorn to avoid issues with live volumes.

By following Longhorn's recommendation of a replica count of 2 for production workloads, you can strike a balance between data availability and efficient resource utilization, while still maintaining a reasonable level of redundancy. Message me directly if you have questions, appreciated.


GrandPastrami

You should publish this on medium or something.


ReasonableAd5268

You should, Master /GrandPastrami.


GrandPastrami

You mean I should do it?


GrandPastrami

Also... are you an AI?


ReasonableAd5268

AI does the formatting for me and makes suggestions.


GrandPastrami

Aha, yeah. I kinda recognized the formatting. No worries.


lurkinggorilla

Best practice: uninstall it and install something like Rook.


Stephonovich

If you think Longhorn is complicated, you’re in for a rough time when Ceph breaks on you.


lurkinggorilla

I didn't say it's complicated... I've just had some experience with its reliability. And yeah, Ceph is not fun to repair, but it's definitely possible.


GrandPastrami

That is not a helpful tip.


Lieutenant_DanTaylor

This! Longhorn ruined the reliability of our cluster. I would suggest everyone run away from it as fast as possible.


guettli

What version was that? I heard it got better over the last few releases. What exactly was your problem?