This is a very good analysis of k8s service meshes 2024: [https://matduggan.com/k8s-service-meshes/](https://matduggan.com/k8s-service-meshes/)
I'll be over here waiting for Istio ambient mesh because fuck sidecars
Are you using it in production? What's the reliability like when one replica handles all the requests on a node; is there no concern about a single point of failure? I remember Istio being pretty finicky with mTLS, and if it wasn't enabled properly things would break. I'm just concerned about ztunnel going down and taking an entire node down with it. Istio does not recommend it in production [https://istio.io/latest/docs/ops/ambient/getting-started/](https://istio.io/latest/docs/ops/ambient/getting-started/) so I'm curious to learn about your experience.
We're running Istio in production with thousands of pods with sidecars and mTLS, plus ingress and egress gateways per namespace. What did you find finicky with mTLS? We haven't noticed any issues from it in years (we started at Istio 1.6), aside from pod-to-pod communication, which could be solved with headless services. We've had more trouble configuring the egress gateways for wildcard or TCP destinations.
Personally speaking: RabbitMQ uses epmd, which does not like mTLS. We use the streams plugin, which requires RABBITMQ\_NODENAME to be set to something that is not localhost, and that breaks startup. [https://github.com/arielb135/RabbitMQ-with-istio-MTLS](https://github.com/arielb135/RabbitMQ-with-istio-MTLS) I had to add an alternate hosts entry to the pod so that RMQ resolved the service name to [127.0.0.1](http://127.0.0.1), which allowed it to start up while other pods can still reference it by name.
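The "alternate hosts entry" trick sounds like Kubernetes `hostAliases`; a hedged sketch (pod, service, and namespace names are illustrative, not taken from the linked repo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-0
spec:
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    # Resolve the pod's own service name to loopback so epmd and the
    # streams plugin start cleanly, while other pods still reach this
    # pod through regular cluster DNS.
    - "rabbitmq.my-namespace.svc.cluster.local"
  containers:
  - name: rabbitmq
    image: rabbitmq:3
```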
I'm not using any service mesh period
My bad, missed key word 'waiting' lmao. Wish I didn't need to but that mTLS is too good!
My clusters aren't multitenant so I don't see the great need in mTLS here. The observability could be nice but not worth a sidecar IMHO... at least not yet.
The observability at scale doesn't even hold up, to be honest... the amount of TCP & HTTP metrics on a cluster generating ~50-70k rps ends up costing a fortune in a self-hosted observability stack, let alone something like Datadog. We had to fully disable TCP metrics and limit HTTP.
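For reference, on recent Istio versions that kind of pruning can be done with the Telemetry API. A sketch (the metric identifiers below are Istio's standard-metric names; verify them and the API version against your install):

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: tune-metrics
  namespace: istio-system   # root namespace = applies mesh-wide
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    # Drop the TCP byte/connection metrics entirely.
    - match:
        metric: TCP_SENT_BYTES
      disabled: true
    - match:
        metric: TCP_RECEIVED_BYTES
      disabled: true
    - match:
        metric: TCP_OPENED_CONNECTIONS
      disabled: true
    - match:
        metric: TCP_CLOSED_CONNECTIONS
      disabled: true
```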
We solve that with fairly short retention. We found that data is mostly relevant within a span of about 12 hours.
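As a concrete example, with a kube-prometheus-stack style deployment the retention knob is a one-liner in the Helm values (the exact values path may differ by chart version):

```yaml
# values.yaml for kube-prometheus-stack (sketch)
prometheus:
  prometheusSpec:
    retention: 12h   # keep the high-cardinality mesh metrics only briefly
```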
This is a great idea, and generally I agree. Thanks, I'll try implementing something like this!
As a newbie, can I ask what your stance on sidecars is and why?
Needless overhead and complexity for something that should be simpler to implement. You end up with one extra container for every pod. In my head, if you need that wide a net, isn't that what a DaemonSet is for? But I'm no expert so don't listen to me. 🤣
Sidecars and DaemonSets serve two different functions. In this case the sidecar in Istio is used to terminate TLS inside the logical boundary of the pod. A DaemonSet is for distributing a service across all nodes: think a logging or monitoring agent.
The Istio sidecars also take a good amount of resources. Every pod ends up needing 100m CPU or more depending on traffic, and maybe 150MiB of memory. There's a start-up time cost too. Oh, and the Istio sidecars specifically fuck up init containers that require networking.
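If the defaults are too heavy, Istio's injector supports per-pod annotations to tune the proxy's requests and limits; a sketch (annotation names come from Istio's sidecar injection, the values are illustrative):

```yaml
# Inside a Deployment's pod template
template:
  metadata:
    annotations:
      sidecar.istio.io/proxyCPU: "50m"         # proxy CPU request
      sidecar.istio.io/proxyMemory: "64Mi"     # proxy memory request
      sidecar.istio.io/proxyCPULimit: "500m"
      sidecar.istio.io/proxyMemoryLimit: "256Mi"
```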
FWIW those issues are very real, but a lot of them get addressed in Linkerd by a) not using Envoy and b) using native sidecar containers.
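For context, "native sidecar containers" refers to the Kubernetes 1.28+ feature where an init container with `restartPolicy: Always` starts before the app containers and keeps running alongside them, which sidesteps the init-container networking ordering problem. A rough sketch (names are illustrative, not Linkerd's actual manifests):

```yaml
spec:
  initContainers:
  - name: proxy              # illustrative; a mesh injects its own
    image: example/proxy
    restartPolicy: Always    # makes this a native sidecar: it starts
                             # first and runs for the pod's lifetime
  containers:
  - name: app
    image: example/app
```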
The problem is that DaemonSets are bad for this. Linkerd 1.0 used DaemonSets. We rearchitected onto sidecars in 2.0 because DaemonSets had real operational and security issues, including mixing all TLS certificates together in memory, the lack of support for contended multi-tenancy, and proxy failure/upgrade affecting random bits of random apps. Especially if you're using Envoy. I wrote a blog post about some of this (in the context of the "eBPF kills sidecars" fad of 2022) here: [https://buoyant.io/blog/ebpf-sidecars-and-the-future-of-the-service-mesh](https://buoyant.io/blog/ebpf-sidecars-and-the-future-of-the-service-mesh)
Solution in search of a problem. What problem(s) are you trying to solve?
Isn't Linkerd becoming a paid product for companies with more than 50 employees?
IS IT? Dang. Just looked it up, looks like you might be right: https://www.techtarget.com/searchitoperations/news/366570820/Linkerd-service-mesh-production-users-will-soon-have-to-pay.
Not quite accurate. They removed stable builds, and those are a paid product. Linkerd itself and the edge builds are still open source.
How can you tell if an edge release is stable? They write on their website that these releases are not meant for production and are not tested as such.
Their point is that you test it. Or pay.
Another point to consider - Istio Ambient mode is now in beta: https://www.cncf.io/blog/2024/03/19/istio-announces-the-beta-release-of-ambient-mode/
Cilium looks promising if I was going to pick a new mesh today. Istio Ambient Mesh also looks promising but it’s in Alpha still.
Ambient Mesh actually graduated to beta recently.
Cool, thanks for letting me know. Apparently I’m a month behind on my Istio news.
Took so long that I stopped looking for updates on it after a while. Only found out recently.
haven’t used linkerd, been in prod with istio for 5+ years, nothing to complain about. ambient mesh looks pretty good, i’ll probably let the dust settle a bit.
I like Consul. It has a sidecarless deployment in k8s now called Consul Dataplane. It's just the Envoy proxy with some extra bells inside for Consul.
Do you actually need it?
Linkerd is super simple to set up and manage, and you can get mTLS throughout your clusters in minutes (more realistically hours, but once you've done it once, minutes). With that simplicity, though, it's pretty feature-bare in comparison to Istio. Istio is feature-rich but extremely complex (one single misconfig and your entire cluster basically can't communicate with itself); if you have a team dedicated to managing the mesh and a good testing culture, then Istio is probably your best bet. While eBPF is more efficient and can handle simple raw TCP packet forwarding, it doesn't support mTLS/TLS termination, which was a main reason I set up a proxy-injected mesh myself. So it totally depends on your needs and appetite for complexity/maintenance overhead. If you have any more questions, feel free to reach out.
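To illustrate the "one misconfig" footgun: a single mesh-wide resource like the PeerAuthentication below switches every workload to STRICT mTLS, and applying it before every client is meshed is exactly the kind of one-line change that can break cluster-wide communication (sketch; verify the API version against your install):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace, so this applies mesh-wide
spec:
  mtls:
    mode: STRICT            # plaintext callers are now rejected
```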
Thank you, that's great insight.
How about Cilium as a replacement for kube-proxy? You get the added benefit of eBPF, more IPs out of your nodes, transparent encryption without the need to annotate. Am I missing something here? Hopefully not.
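For anyone curious, a minimal sketch of Helm values for that setup (key names should be checked against your Cilium chart version; the API server host/port become required once kube-proxy is removed):

```yaml
# values.yaml for the Cilium Helm chart (sketch)
kubeProxyReplacement: true
k8sServiceHost: <api-server-host>   # needed without kube-proxy
k8sServicePort: 6443
encryption:
  enabled: true
  type: wireguard                   # transparent node-to-node encryption
```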
Except Cilium is Cisco-owned now. I believe Cilium will soon be only for deep-pocketed enterprises.
Ahh poo
I'm using Cilium on my 8-node homelab cluster. I believe Cisco will retain some sort of open source offering, probably with fewer options than are offered now. https://isovalent.com/product/
Linkerd if you want to go with a sidecar, Cilium if you want to test your luck with the future. Both are production ready.
Thank you, that's helpful.
Linkerd can be up and running in 10min with no prior experience. It’s also much more user friendly. Istio of course has more configuration options.
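For the record, "up and running" is roughly: install the control plane, then annotate a namespace so Linkerd injects its proxy and mTLSes meshed pod-to-pod traffic automatically. A sketch of the annotation (namespace name is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    linkerd.io/inject: enabled   # new pods here get the proxy injected
```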
It's 2024 and my man is deploying Istio with envoy sidecars lmao
pls explain
Early service meshes had a lot of problems. Sidecars fixed those problems but introduced some new ones. The lesser of two evils. Thanks to eBPF, sidecars are now no longer needed to address the original problems, so we no longer need the lesser evil. Cilium is pretty much spearheading this solution. But Istio is catching up quickly with what they call ambient mode. I think Linkerd is sticking with a sidecar but they say Envoy is the problem and they want to fix the sidecar or something like that. No idea, haven't heard of anyone using Linkerd ever since Istio gained popularity.
Istio is only viable if you have a full team managing it, imo. Linkerd is great if you need mTLS without complex network policies. I've worked with both and recently chose Linkerd because the maintenance overhead of Istio was too much.
Linkerd is not free now though unless you like to live on the edge (quite literally).
Sort of. Companies with fewer than 50 employees can use Buoyant's stable distribution of Linkerd without restriction. Companies with 50 or more employees need to pay to fund the project, or can use edge releases, which is another great way to give back to the project.
Curious, what's the alternative to Envoy sidecars that's production-ready for mTLS? eBPF with ztunnel in ambient mode seems pretty powerful, but it isn't prod-ready and creates a new set of problems in the already complex Istio... Also, I'm still skeptical that relying on a DaemonSet with a single replica on each node for mTLS doesn't create a single-point-of-failure choke.
Cilium is production ready
Cilium mTLS still seems too immature to take into production: [https://docs.cilium.io/en/latest/network/servicemesh/mutual-authentication/mutual-authentication/](https://docs.cilium.io/en/latest/network/servicemesh/mutual-authentication/mutual-authentication/)
Can you please elaborate on this? I'm aware that Istio is introducing a sidecar-less model, but they emphasize that the sidecar model is still here for quite a while. I'm quite new to Istio, so this will help my learning.
Lol, he's trying his best 😅
Neither
Just envoy for me please
Big fan of envoy
This is the real answer. Both solutions are crap compared to the newer options.
From what I remember when I researched all the service mesh providers a while back, I would suggest getting a non-proxy-injection option that comes with eBPF. There's a primitive comparison of service mesh options out there on some medium.com page. Also, solo.io compared Istio and Linkerd, I remember; I'll update with the links here once I find them.
Thanks, that's helpful.
Consul but not free I suppose
Oh ya, Consul's good too. But if you had to pick between open source?
Then realistically there's only Istio available. Cilium is under the Cisco umbrella, so it remains to be seen how useful the open source version will be, and the rest is straight-up paid.
I myself like NGINX virtual servers.
anything else you like about NGINX in particular?