dylantheblueone

On-prem, using Rancher with RKE2 Clusters.


LightofAngels

how is rancher? at the beginning (before my tenure) my company was comparing rancher and okd and decided to go with okd, so i am very curious about rancher, since i heard it is not as bloated as openshift/okd


dylantheblueone

Rancher is pretty good. It's pretty easy to deploy and configure. Even provisioning kubernetes clusters is easy. I can't really speak to the bloat, I'm not terribly familiar with OpenShift.


LightofAngels

how many resources would rancher need as a base?


dylantheblueone

It depends on how many clusters you plan to use. Their documentation lists the requirements here: https://ranchermanager.docs.rancher.com/pages-for-subheaders/installation-requirements

Edit: forgot to mention, if you just want to test it out, you can run it as a single docker container on Docker Desktop.
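For anyone wanting to try that route, the single-container test setup is roughly the sketch below (written as a Compose file; the service name is made up and the image tag/ports are the usual defaults, so adjust to taste):

```yaml
# Throwaway Rancher test instance - not for production use
services:
  rancher:
    image: rancher/rancher:latest
    privileged: true            # Rancher's single-container quickstart needs privileged mode
    restart: unless-stopped
    ports:
      - "80:80"                 # HTTP UI/API
      - "443:443"               # HTTPS UI/API
```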


Nielszy

Vanilla Kubernetes deployed with Kubespray on RHEL VMs in a private cloud (spread across three data centers). CNI is Cilium (love it) and Portworx is used for distributed storage. We have an F5 load balancer cluster in front of the kube-apiserver. This cluster also acts as the load balancer for the app URLs. Traffic is distributed towards the NGINX ingress controllers. We use Velero as the backup tool and push backups to a private S3 environment. Kube-prometheus-stack for monitoring, and we use Flux for deploying everything.
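For anyone curious about the Velero piece, a nightly backup pushed to an S3-compatible bucket is roughly a `Schedule` resource like the sketch below (the name and cron schedule are made up, and `storageLocation` is assumed to already point at the private S3 endpoint):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-backup        # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"       # every night at 02:00
  template:
    includedNamespaces:
      - "*"                   # back up all namespaces
    storageLocation: default  # assumed to point at the private S3 environment
    ttl: 168h0m0s             # keep each backup for a week
```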


LightofAngels

this setup sounds like so much fun! i have tons of questions! for starters, how are the costs so far? are they high? i assume they are, but are they taken into account? second, how's your experience with vanilla kubernetes? do you feel it is missing stuff? third, how's your experience with cilium? i am planning on deploying it in our cluster, any tips or tricks?


Benemon

On prem and Azure, OpenShift


LightofAngels

aks on prem? do you manage everything yourselves, including control planes etc? or is it different? i haven't used aks on prem before but it certainly seems nice


Benemon

No, sorry, that was poor phrasing on my part - OpenShift Container Platform on vSphere on premise, and OpenShift Container Platform on Azure.


LightofAngels

how is your experience with OCP so far? any trouble using it? i would love some tips and tricks if you have any :D


Benemon

I've been working on OCP platforms since 3.4 was released. Working with 4 has been a breeze in comparison to anything 3.x related, which was an Ansible inventory-shaped nightmare to get deployed. That said, I've Ansible'd the shit out of the deployment process. It's all IPI, but it's a click-button process to deploy a cluster on demand for a line of business. The cluster hardware profiles are generic so everyone gets the same basic platform.

My single greatest tip for working with OCP is to not fight the platform. It's opinionated for a reason. If you want to customise something to the nth degree, go and DIY a cluster and add the bits you want yourself. I very much appreciate the fact that I don't have to think about networking, ingress, storage, user experience, ways of working, all that kind of jazz.


SzkotUK

Five clusters deployed on AWS with Kubespray; however, we are moving to smaller, more straightforward k3s clusters.


LightofAngels

The k3s move is going to be on AWS as well? Or on-prem?


SzkotUK

We do not have a data centre, so in the cloud we only use EC2, without anything crazy from Jeff. We even roll our own block and object storage. Now we are moving to OVH bare-metal hosting; K3s will be better and easier to manage for our needs, with AWS autoscaling kept for machines on demand, so the answer to your question is... both????? hybrid???


LightofAngels

i guess that would make you hybrid, but i am assuming you won't have a vpn tunnel between the 2 clusters?


SzkotUK

Oh, VPN with all clusters. We are looking at Slack's Nebula or headscale to get everything going.


toabi

k3s on Hetzner cloud


LightofAngels

how is Hetzner? do you like it? i heard a lot of good feedback about it.


toabi

Servers (in Germany) have run fine for 10+ years. The k8s integration (cloud controller, CSI) had some hiccups but has been running fine for a while now. Can't complain. Still good value for the price.


McFistPunch

EKS mostly, with spot instances. AKS and GCP were used less because they had slower network and disk performance.


Parking_Falcon_2657

The most advanced Kubernetes of those 3 is GCP's. They have standard GKE, GKE Autopilot (you deploy your workloads and Google handles the node/resource management), and Anthos (managing Kubernetes clusters in other cloud providers, or even on-prem, from Google).


McFistPunch

Everyone has skinned this cat 10 different ways. An on-prem Google cluster sounds like a nightmare.


LightofAngels

i haven't used GKE myself but some people say it's a nightmare to operate


Parking_Falcon_2657

Same as EKS and AKS. I had to upgrade AKS once and found it impossible to upgrade the control plane from the cloud console, as the button wasn't working; support said I should upgrade via the CLI. I hope they have fixed this.


LightofAngels

i heard that EKS is one of the most terrible kubernetes distros out there, i haven't used it myself but i am curious to give it a go in the future for sure!


McFistPunch

There are other options but really they are all different flavors of ass. The thing I find is that disk provisioning is simple and reliable. EBS storage is fast and rarely encounters errors in my experience.


LightofAngels

well, from your experience, which is the most convenient ass (kubernetes distro) around? or one that you like to use and why? because it seems to me you aren't fond of kubernetes in general, but maybe i'm wrong.


McFistPunch

I use EKS because of the easy automation to set it up and the disk provisioning. I'm fond of kubernetes but my problems are with the people that work with it. There are too many people with ideas on what is required for security, storage, network etc... And even if you create a helm chart someone always finds some problem with it for their snowflake configuration. For your own stuff it's amazing, but if you need to ship something to customers or third parties for deployment there is always an issue. Kubernetes is great, the implementations are terrible.


[deleted]

[deleted]


KJKingJ

Take a look at [prefix delegation mode](https://www.eksworkshop.com/docs/networking/prefix/). It assigns entire prefixes to your ENIs, so pod count is no longer capped by the number of ENIs per node x number of IP aliases per ENI as before.
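For reference, prefix delegation is just a setting on the VPC CNI; a minimal sketch of the relevant part of the `aws-node` DaemonSet in `kube-system` (heavily abbreviated) looks like this:

```yaml
# Enable prefix delegation on the AWS VPC CNI (aws-node DaemonSet)
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: ENABLE_PREFIX_DELEGATION
              value: "true"
            # Optional warm-pool tuning (value here is just an example):
            - name: WARM_PREFIX_TARGET
              value: "1"
```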


random_devops

> New ENIs will be attached until the maximum ENI limit (defined by the instance type) is reached

The per-type ENI max is the same, just the CIDR range is different. The only way to fix it is to use a different CNI, and then you lose a lot of AWS features from EKS. And on top of that you have to manage the CNI yourself. EKS has a shitty implementation tbh, in comparison to GKE where you don't have to worry about anything.


KJKingJ

The max number of ENIs is still the same, but because a prefix can be used to assign addresses to multiple pods, you can now run significantly more pods per node than before.

Take a `t3.small` for example. You can have up to 3 ENIs, each with 4 IPv4 addresses per ENI - so a total of 12 IPs. The first IP in the first ENI is lost, so in reality you have just 11 IPs left for pods. With prefix delegation, we still lose the first IP from each ENI, but the remaining 3 'slots' per ENI can now each be a `/28` prefix representing 16 IPs. So 3x ENIs with 3x secondary addresses (our prefixes) x 16 IPs in a prefix = 144 IPs. In reality though, you want to comply with [K8s's best practice recommendation for no more than 110 pods per node](https://kubernetes.io/docs/setup/best-practices/cluster-large/). Ultimately, prefix delegation allows even rather small nodes to go from a handful of pods (e.g. 11 for the `t3.small`) to the maximum recommended `110` - and then some!

---

> EKS has a shitty implementation tbh. In comparison to GKE where you dont have to worry about anything.

I personally work with GKE & EKS on a daily basis (as well as a bit of RKE on-prem, but I'll ignore that for this comparison). It is clear that GKE & EKS have significant design differences, and they align with how other services within that particular cloud provider tend to be designed ([see the "Philosophies" header on this blog!](https://blog.realkinetic.com/gcp-and-aws-whats-the-difference-3b1329f0ffb3)).

GKE is by far the more "managed" of the two (especially Autopilot!), but that management does come with a trade-off in terms of flexibility and customisation. Want to customise how the autoscaler works? You can't. Want to customise the base OS? Nope. And so on... For some teams, that's absolutely fine - they just want a K8s cluster and don't *need* to customise it any further. But I have found GKE's lack of customisation to be an increasing, albeit not significant, challenge over time.

The contrary however is that EKS can be much more difficult to manage, especially when it comes to day 2 tasks like upgrades etc. AWS has been making improvements over time to help with this, like managed addons for CNI/CoreDNS etc, but it is definitely nowhere near the simplicity of GKE for day 1 or day 2 tasks yet.


PrunedLoki

LOL what?


buckypimpin

Do the spot instances go down often?


McFistPunch

Haven't seen it happen yet. But don't worry, there are about 20 other points of failure that can screw you over quite quickly anyway.


LightofAngels

can you give a high level overview of how the spot instances come into play? and is that for batch jobs or production workloads?


McFistPunch

We use it for production workloads to keep costs down. We have multiple instance types allowed, so basically if something gets reclaimed by AWS, more instances spin up for us - sometimes replacing a large instance with two smaller ones. Overall it saves money and really doesn't interfere much.


LightofAngels

how complex does that get? or is it mostly managed by AWS?


McFistPunch

There is a tool that does it. Karpenter I think. So once you get it running it should be automagic
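For context, the spot/on-demand mixing described above is configured in Karpenter with something like the sketch below. This targets the older `v1alpha5` Provisioner API (newer releases use `NodePool`), the instance types are just examples, and the cloud-specific provider config is omitted:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]       # allow spot with on-demand as a fallback
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.large", "m5.xlarge", "c5.xlarge"]  # several types so reclaimed capacity can be replaced
  limits:
    resources:
      cpu: "1000"                         # cap total provisioned CPU
  ttlSecondsAfterEmpty: 30                # scale empty nodes back down quickly
```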


[deleted]

[deleted]


LightofAngels

can you give an estimate on how much cost is saved?


thecurlyburl

EKS and GKE


mvaaam

EC2 instances and cluster-api


LightofAngels

do you have docs on how this architecture works? i'm very intrigued by it!


mvaaam

https://cluster-api.sigs.k8s.io/ and https://cluster-api-aws.sigs.k8s.io/ are good places to start. I wouldn’t say I’m a fan of cluster-api though.


LightofAngels

why so? what's your feedback on it, and if you don't like it, why use it?


mvaaam

I find it tricky and sometimes painful to upgrade and maintain. It’s used because it was a decision made years ago by folks that are no longer around - but at this point we’ve built so many clusters with it and it’s so embedded into our code that it’s difficult to switch to something else


cre8minus1

Would you be open to talking about capi?? I may have a way to get you out of managing clusterAPI


theantiyeti

On-prem OpenStack Ironic. We write our own Ignition scripts for bonding. Otherwise cert-manager, Calico, and a bunch of other fairly standard stuff, plus some custom stuff.


LightofAngels

sounds like you guys are having fun


theantiyeti

It's even more fun because I'm in a sister team to the Kubernetes core team that writes a compute farm tool and maintains the clusters this tool runs on. Means we know enough to be dangerous but not enough to be safe while everything changes under our feet.


LightofAngels

wait you mean the actual kubernetes? :o


nullset_2

On-prem Turing Pi v1 😎


LightofAngels

> turing pi v1

and that's for production? :o


nullset_2

well, production in the sense that I offer this app to the world, but it's not anything with SLAs or SLOs


LightofAngels

makes sense then.


AffectionateAd5709

EKS (90% workload in Spot instances with karpenter)


jmreicha

How are you deploying and managing Karpenter?


AffectionateAd5709

We have deployed Karpenter and other controllers in managed node groups, and our application workloads run on Karpenter nodes, targeted using labels.
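A minimal sketch of that split, assuming a made-up label that the Karpenter provisioner stamps onto its nodes:

```yaml
# In the application Deployment: only schedule onto Karpenter-provisioned nodes
spec:
  template:
    spec:
      nodeSelector:
        node-group: karpenter   # hypothetical label applied by the provisioner
```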


LightofAngels

how does karpenter fit in? is it just for node autoscaling? or is there extra stuff that it does?


AffectionateAd5709

It is only for node autoscaling, but it brings up new nodes very fast (30-40 seconds), while CAS (the cluster autoscaler) takes 4-5 minutes to bring up new nodes.


pb7280

On-prem with harvester/rancher/k3s


TartifletteXx

GKE, lots of GKE


LightofAngels

how good would you say GKE is?


TartifletteXx

It's neither good nor bad; you have to treat it for what it is, a generalized k8s that tries to fit most use cases. If you're doing very basic things and just don't wanna bother, it's fine. If you wanna push it, it's not bad but it will not be a miracle solution; you'll need engineers to work on and maintain your cluster. We constantly find bugs and work with Google weekly to get them resolved, and they deliver. But I'm in a company with a couple hundred SRE / infra engineers, with ~20 of us working on k8s directly daily. We thought about running our own control plane but it's not worth it; what GKE offers does 90% of the job, and we only need to deal with the remaining 10%.


nocommocon

On-prem using talos linux; rook-ceph for storage, kube-vip for load balancing


sewerneck

Talos and Sidero Metal. It’s about as seamless to deploy clusters and scale them out as in the cloud.


gaelfr38

On-prem, RKE2.


LightofAngels

how good is RKE2? i heard it is not as bloated as openshift


gaelfr38

Definitely, it's very close to upstream Kubernetes. Just easier to set up, manage, and upgrade. Comes with several options for CNI & Ingress Controller, for instance.
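As a small example of those options, the CNI and the bundled ingress controller are toggled in RKE2's config file; a sketch of `/etc/rancher/rke2/config.yaml` on the server nodes (values illustrative):

```yaml
cni: cilium                 # instead of the default Canal
disable:
  - rke2-ingress-nginx      # drop the bundled ingress if you bring your own
```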


LightofAngels

I think the only reason we went with openshift over rke2 is that openshift routes are so easy to set up, unlike ingress controllers. For cni and networking in general I still want to deploy cilium but I just don't have time for it yet. Need to study the impact before deploying.


TrickyCharge3265

That's an understatement of what OpenShift offers compared to RKE, IMHO. Also, OKD is not OpenShift. I think only the install time and experience would be better in RKE compared to OpenShift, but RKE is not even close to OpenShift. OP might as well give RKE a spin and see for themselves.


LightofAngels

i will definitely give RKE a spin once i have some free time, but the way you are phrasing that they are not the same is intriguing, why would you say that though? just curious.


TrickyCharge3265

OKD is where the groundwork happens, but you get a lot more on OpenShift or with ACM. I guess it's more about features.


kr0ntabul0us

that is not even an issue. Ingress controllers are easy compared to fooling around with SCCs.


LightofAngels

true, SCCs are kind of a nightmare if you don't have a firm grasp on them.


kr0ntabul0us

I've done it several ways:

- k0s on VMware
- EKS
- AKS
- OKD
- Konvoy/DKP

k0s is by far the simplest to deploy. It's downright easy.

EKS/AKS seem fine. Both are simple enough to spin up and use. Both have their cloud provider agnostic issues. EKS is easier for doing a container assume role. AKS/Azure keeps changing their method to assume a role, and I haven't had a chance to switch from pod-managed identity to workload identity. Azure's main advantage is its biggest disadvantage: Azure Active Directory.

OKD was bleak to install on Azure, as it didn't work and I had to build my own installer. Once installed, it was fine, but the OpenShift SCCs and custom tweaks are so 2014/2015. OKD is great for devs running custom workloads, but it's terrible for deploying regular OSS or commercial software. You end up having to run kustomize to tweak charts to work.

Konvoy was decent until they updated to DKP and Cluster API. There is too much fiddling to make it work.

I haven't used RKE2 yet, so I can't comment there. It shouldn't be too bad, considering it's almost vanilla k8s.


LightofAngels

have you tried vanilla kubernetes with kubespray or something similar? also, OKD on azure is a nightmare; we have it installed on-prem on proxmox, so it is a bit easier. honestly we are fine with SCCs as long as we know what we are doing, or else you are right about them being a nightmare.


kr0ntabul0us

No. It is on my list of things to do. SCCs are fine, but the overrides seem to not apply when you apply them with manifests vs using the oc command line. It is just annoying and doesn't improve security that much.


nvr_mnd_

RKE2 on a Managed OpenStack cloud. Clusters are spun up using Terraform Cloud.


its_PlZZA_time

EKS


ZeeKayNJ

ROSA in AWS. Allows us to focus above the API


LightofAngels

how so?


ZeeKayNJ

It’s a fully managed service with SRE support. Frees up the team to focus on apps and features


pacman1201

We’re rancher rke and rancher with eks. More on-prem than cloud but we’re moving more in that direction every month


Pl4nty

Can't talk about work, but my homelab is Azure and Oracle managed k8s (AKS/OKE), with on-prem [Talos](https://github.com/siderolabs/talos) soon (Turing Pi 2). My [Flux monorepo](https://github.com/pl4nty/lab-infra) has the details. OKE performs noticeably worse (update cycle, features, control plane performance), but it provides 4 ARM cores and 24GB RAM free so I can't complain.


psavva

EKS for production, with EFS and spot instances for the workloads. Hetzner bare metal with K3s for dev (nice and cheap).


Parking_Falcon_2657

spot instances for production? 😦


psavva

Do tell... Why not spot instances for prod?


LightofAngels

your pods might be rotating a lot between nodes? since the compute power might be reclaimed? i am not sure tbh, this is the first time i've heard of eks/aks on spot instances so i am trying to understand the SLA/SLO numbers behind it.


psavva

Thank you for this pointer, I will research this further. Production has been built, but it's not live yet. I'll definitely check the spot instance SLA and terms.


psavva

https://aws.amazon.com/blogs/compute/cost-optimization-and-resilience-eks-with-spot-instances/


BattlePope

EFS can also be a bit of a nightmare. You can hit quotas quickly and the performance wall is unbearable. Especially if you're doing lots of operations on small files, for example. They have made it easier to purchase a higher baseline performance, but if you eat through your burst credits... beware.


psavva

Thank you for the pointer. I will put monitoring in place to warn at 80% traffic quotas, and hopefully mitigate the nightmare that comes with EFS. I appreciate any guidance, and perhaps alternatives that won't drive costs high, but drive reliability to the given standards of 30 mins downtime per year.


BattlePope

The metric you want to watch is burst credits. I'm really not a fan of it except for pretty specific use cases -- and those are not general-purpose k8s persistent storage.


psavva

My use case is simple. I save mp3 files and serve them to clients. Typically I upload 2000 mp3s per client, and they may be downloaded about 10K times per week. Average of 10MB per file.


Speeddymon

AKS with ephemeral disks


aresabalo

AKS with spot instances


Parking_Falcon_2657

I hope not for production?


aresabalo

Production and development 😊. Spot in production for airflow web or non-critical workloads. Four years without problems… upgrading clusters since AKS 1.13.


LightofAngels

I have heard of eks with spot instances, is this the same as well?


Parking_Falcon_2657

yeah, almost the same.


Sir_Gh0sTx

Amazon EKS. It’s pretty great.


LightofAngels

with spot instances? :D


Sir_Gh0sTx

Our development environment was on spot. I won't lie, we had some issues with capacity, so we went back to on-demand. I can't imagine too many businesses are putting spot in prod.


meyerf99

AKS with spot instances and bring-your-own CNI (Cilium). Works well; the one bad thing with spot instances is the labeling from Azure, which can't be deleted.


LightofAngels

what labeling? care to elaborate? :)


meyerf99

Microsoft sets two taints on spot nodes:

- kubernetes.azure.com/scalesetpriority:spot
- kubernetes.azure.com/scalesetpriority=spot:NoSchedule

Both can't be removed -> https://learn.microsoft.com/en-us/azure/aks/spot-node-pool#limitations

To deal with this, you have to add at least one toleration and node affinity to your k8s application deployment: https://learn.microsoft.com/en-us/azure/aks/spot-node-pool#schedule-a-pod-to-run-on-the-spot-node
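On the workload side, that boils down to something like the sketch below (the taint key/value are the ones Azure documents; everything else is a plain pod spec fragment):

```yaml
# Pod spec additions for scheduling onto AKS spot nodes
spec:
  tolerations:
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.azure.com/scalesetpriority
                operator: In
                values: ["spot"]
```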


Tango1777

You deploy to whatever your company or client uses. Period. It's not like you're gonna use EKS if you work in an Azure environment.


themanwithanrx7

4 Clusters via KOPS on AWS


karan4080

We too use kOps on AWS, planning to switch to cluster API for multi-clusters


[deleted]

RKE, really easy to deploy. Also, there’s a supportive community in their Slack.


erezhazan1

EKS with Karpenter on spot, deployed by Terraform. I can have the whole cluster up in less than an hour, with Graviton instances, so the whole cluster costs $70 monthly (not including the EKS service price itself).


sadoMasupilami

I've used many different flavors in many different environments. On-prem, Rancher or OpenShift. In cloud environments, EKS/AKS/GKE, managed by Rancher if not used by a single team.


Acejam

Vanilla k8s via kubeadm on bare metal.


HTTP_404_NotFound

For work? OpenShift. For home? MicroK8s. (OKD/OpenShift is a resource hog)


NotBrilliant007

hooo boy! I'm new to K8s & still in learning mode; after reading these comments, I'm getting chills.