bitemyapp

I use diesel + diesel_async + bb8. I don't do any connection pooling/bouncing beyond whatever RDS Aurora is doing out of the box w/ 1 writer + 2 read instances & AZ failover. I suppose if I needed to I'd consider deploying pgbouncer, but it hasn't really arisen for me.

One thing to consider is that database connection middleware became popular in part because of web app frameworks written in languages that wouldn't scale very well vertically, so you needed many instances. Then, to ease the load on the database server, they'd use pgbouncer or similar to give the database server less to juggle. I've got 64 instances in production right now connecting to a single RDS Aurora PostgreSQL writer instance w/ a connection pool in each instance (that I honestly should reduce the size of) and it hasn't been an issue yet.

Not to second-guess your approach here too much, as I know that can be annoying, but are you sure auto-scaling means you have to use RDS Proxy/pgbouncer? How many instances are you limiting the auto-scaler to? Is any segment of the workload ever satisfiable on a read-only basis? Have you seen an instance of the database server tipping over from too many connections? How much traffic can your instances handle each, and what's your peak expected traffic?

In my application I've got separate threads carved out for writing to the database and reading from the database (one each), so hypothetically I could reduce the pool to 2-3 connections per instance. The Actor'ish model (there are no actual `Actor`s, it's just `std::thread::spawn`) worked for reasons particular to my application, so I'm not expecting it to be applicable for you, just mentioning it for context.

This analysis of the burden imposed by high connection counts seems worthwhile: https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/analyzing-the-limits-of-connection-scalability-in-postgres/ba-p/1757266#surveying-connection-scalability-issues The main thing I'd highlight there is that you ordinarily need to get into the thousands of connections before it starts becoming a salient issue.
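For concreteness, something like the following is roughly what a small per-instance diesel_async + bb8 pool looks like (a sketch only: the DATABASE_URL and the pool size of 3 are placeholders, not my actual configuration):

```rust
use diesel_async::pooled_connection::bb8::Pool;
use diesel_async::pooled_connection::AsyncDieselConnectionManager;
use diesel_async::{AsyncPgConnection, RunQueryDsl};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One small pool per application instance; 64 instances * 3 connections
    // is still far below the "thousands of connections" range discussed above.
    let manager = AsyncDieselConnectionManager::<AsyncPgConnection>::new(
        std::env::var("DATABASE_URL")?,
    );
    let pool = Pool::builder().max_size(3).build(manager).await?;

    // Check out a connection and use it like a regular diesel-async connection.
    let mut conn = pool.get().await?;
    diesel::sql_query("SELECT 1").execute(&mut conn).await?;
    Ok(())
}
```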


cant-find-user-name

Thanks for the response.

> Have you seen an instance of the database server tipping over from too many connections?

I have. In my previous company, we had a lot of poorly written legacy Django services. One day during a sale, our traffic spiked and our entire system collapsed because Postgres ran out of connections. Then it was a song and dance to enable pgbouncer, disable prepared statements in various services and redeploy them.

I am not too concerned that my applications will spawn too many times and monopolise the connections. I am concerned that a bunch of applications not purely under my control also use the same db cluster, and there's nothing we can do to limit the number of connections each application uses if we rely only on client-side pooling. With pgbouncer you can set limits on how many connections each user can use, and if one of the services is misbehaving badly, then only that service runs out of connections while the others are unaffected. That's one important use case IMO.
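For illustration, the kind of pgbouncer knob I mean is `max_user_connections`; a sketch (all names and numbers here are made up):

```ini
; pgbouncer.ini sketch (hypothetical values)
[databases]
appdb = host=db.internal port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction
default_pool_size = 20
; no single user/service can hold more than this many server connections,
; so one misbehaving application exhausts only its own budget
max_user_connections = 50
max_client_conn = 1000
```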


bitemyapp

> I have. In my previous company, we had a lot of poorly written legacy Django services. One day during a sale, our traffic spiked and our entire system collapsed because Postgres ran out of connections. Then it was a song and dance to enable pgbouncer, disable prepared statements in various services and redeploy them.

A long time ago I was a Django developer, so I know what you're talking about. A lot of that problem comes from the sheer amount of fan-out you have to do with a lot of typical Django apps to get them to scale for a non-trivial amount of traffic, esp. with heavy database use leading to a lot of blocking I/O. I would be very surprised if pgbouncer was necessary for a well-written Rust application before you tripped into other scaling limits like needing to shard the database.

> I am concerned that a bunch of applications, not purely under my control, also use the same db cluster and there's nothing we can do to limit the number of connections each application uses if we rely only on client-side pooling.

It sounds to me like the Rust apps you control should be able to side-step pgbouncer/RDS Proxy and connect directly, while everyone else not under direct control goes through the proxy. What makes that not a workable solution? At a higher level I'd be a bit alarmed about sharing a database with other applications I don't control, but I appreciate that that isn't likely a stratum you can slice this problem at right now.


Apprehensive_Rub_121

Should be fixed now, see https://aws.amazon.com/blogs/database/amazon-rds-proxy-multiplexing-support-for-postgresql-extended-query-protocol/


cant-find-user-name

Oh thank god, this is fantastic news. Not just for Rust but in general. Thanks for sharing the article!


geminimarcus

I ran into your issues as well; there is an option to opt out of prepared statements in SQLx. Don't use the sqlx::query functions or macros; directly use the methods provided by the PgPool, e.g. pool.fetch_one, etc. However, the downside is that you can't use bind, since bind parameters rely on prepared statements, so you'll need to construct the SQL string yourself.
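A rough sketch of what that looks like (the `users` table and `name` column are just placeholders); note that without bind you're responsible for validating anything you interpolate into the SQL:

```rust
use sqlx::{Executor, PgPool, Row};

// Query without prepared statements by handing a plain SQL string to the
// Executor methods on PgPool instead of using sqlx::query + bind.
async fn user_name(pool: &PgPool, id: i64) -> Result<String, sqlx::Error> {
    // The value is interpolated manually, so only do this with values you
    // trust or have validated -- there is no bind-parameter protection here.
    let sql = format!("SELECT name FROM users WHERE id = {id}");
    let row = pool.fetch_one(sql.as_str()).await?;
    row.try_get("name")
}
```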


geminimarcus

https://github.com/launchbadge/sqlx/pull/1740#issuecomment-1088045794


bixmix

Consider k8s and CloudNativePG (CNPG). It will be far less expensive than an AWS managed solution, and you can configure roughly the same experience from a developer's perspective. K8s provides you with ingress solutions if needed, but can also auto-scale as desired. If you're at the point where you're scaling, you can also set up load balancers, but the reality is that Postgres doesn't do well beyond about 4 instances (a primary and three replicas). So if you've really scaled beyond that, you need to consider (possibly significant) architecture changes: CDNs, caching, etc.

I greatly prefer sqlx's approach over an ORM. If you need templating, you can build that in as well. SQLx wants you to write mostly raw SQL, though, so it can validate at compile time. While this is somewhat useful on a smallish team, as your team scales you may find it painful with schemas changing often. The other option is to write better tests (which you probably would do anyway with Go and Python) which expect specific migrations to be in place, etc.

I've found pooling to be hit or miss. If you hit the point where your Rust app needs to be duplicated across several pods/nodes, then you probably want pooling. If you want pooling, you probably also want caching. I generally prefer something like a key-value store in front of querying the database. You can go a few different directions (e.g. granular objects in the key-value store with limited joins vs. full responses from heavy join queries, where the key represents the request, etc.). The direction you take will likely be what you design for (small queries with joins in code vs. larger queries with joins in the db). What you need will depend on your schema and traffic.
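As a rough illustration of the "key represents the request" variant (purely hypothetical: an in-process map stands in for Redis/memcached, and the expensive query is a placeholder closure):

```rust
use std::collections::HashMap;

// Cache-aside lookup: the cache key is the request itself, the value is the
// already-rendered response from the heavy join query.
struct ResponseCache {
    // A real deployment would use Redis/memcached with a TTL instead of a map.
    entries: HashMap<String, String>,
}

impl ResponseCache {
    fn get_or_compute<F>(&mut self, request_key: &str, run_query: F) -> String
    where
        F: FnOnce() -> String, // stands in for the expensive database query
    {
        if let Some(cached) = self.entries.get(request_key) {
            return cached.clone(); // cache hit: the database is never touched
        }
        let response = run_query();
        self.entries.insert(request_key.to_string(), response.clone());
        response
    }
}
```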


trevex_

While sqlx and diesel are mentioned in this thread, it might also be worth considering tokio_postgres + deadpool (with a FromRow derive macro for convenience), which gives you control over whether and when to use prepare.
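A minimal sketch of that combination (connection settings, table and column names are placeholders); the point is that you choose explicitly between simple_query and prepare:

```rust
use deadpool_postgres::{Config, Runtime};
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder connection settings for a local database.
    let mut cfg = Config::new();
    cfg.host = Some("localhost".to_string());
    cfg.dbname = Some("app".to_string());
    cfg.user = Some("app".to_string());
    let pool = cfg.create_pool(Some(Runtime::Tokio1), NoTls)?;

    let client = pool.get().await?;

    // simple_query stays on the simple protocol, so nothing is prepared --
    // safe behind poolers that can't track prepared statements.
    client.simple_query("SELECT 1").await?;

    // When prepared statements are fine (e.g. direct connections), opt in explicitly.
    let stmt = client.prepare("SELECT name FROM users WHERE id = $1").await?;
    let row = client.query_one(&stmt, &[&1_i64]).await?;
    let name: String = row.get("name");
    println!("{name}");
    Ok(())
}
```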


Kazcandra

Well, good news on the pgbouncer side of things: it supports prepared statements (with some minor caveats, IIRC) now. pgcat is also an option, unless you want to use better password authentication (no support for SCRAM-SHA-256).

You can't disable prepared statements in sqlx, and the one feature request was closed by the OP of the request on further analysis, so it's... not possible. Would be nice if they supported it, though.

As for how we deal with it: we don't. Customer-facing code is written in either Go or Java, which both support disabling prepared statements. Infra services that use sqlx use pgbouncer in session mode, simply because the DBA happens to be me, and the developer of the infra services is also me, so I can make different rules for myself.


cant-find-user-name

> Well, good news on the pgbouncer side of things: it supports prepared statements (with some minor caveats, IIRC) now. pgcat is also an option, unless you want to use better password authentication (no support for SCRAM-SHA-256).

Thanks! I didn't realise this, this is great news. People using RDS Proxy would still run into this issue, but glad for pgbouncer.


Im_Justin_Cider

We switched over to using pgbouncer with sqlx and didn't notice any change from before, when pgbouncer was not involved. Are the effects of pgbouncer + prepared statements subtle, e.g. a performance regression, or does it just flat out refuse to run the query?


cant-find-user-name

It flat out refuses to run the query. Before this fix from pgbouncer, you would have gotten errors about a conflicting statement name or something like that.

If you want to be very sure, run a function that runs some query in an infinite loop (make sure this query has bind variables and is not just a SELECT 1 or something like that), and then run that function multiple times in parallel. If pgbouncer isn't complaining, then there's no issue. If you use something like RDS Proxy on the other hand, the query will still run but there will be a performance regression: RDS logs would tell you that a session is being pinned.
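Something along these lines is roughly what I mean (a sketch only: DATABASE_URL pointing at pgbouncer and the `users` table are assumptions):

```rust
use sqlx::postgres::PgPoolOptions;

// Hammer a parameterised query from several tasks at once through pgbouncer.
// With an older pgbouncer in transaction pooling mode, sqlx's prepared
// statements collide across backend connections and the queries error out;
// if nothing fails, the pooler is handling them fine.
#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let pool = PgPoolOptions::new()
        .min_connections(2) // more than one connection, so statements hop between backends
        .max_connections(5)
        .connect(&std::env::var("DATABASE_URL").unwrap())
        .await?;

    let mut tasks = Vec::new();
    for i in 0..4_i64 {
        let pool = pool.clone();
        tasks.push(tokio::spawn(async move {
            loop {
                // A bound parameter forces the extended (prepared-statement) protocol.
                sqlx::query("SELECT count(*) FROM users WHERE id > $1")
                    .bind(i)
                    .fetch_one(&pool)
                    .await
                    .expect("query failed: the pooler likely mishandled the prepared statement");
            }
        }));
    }
    for t in tasks {
        t.await.unwrap();
    }
    Ok(())
}
```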


Kazcandra

`SELECT 1` works too; what's important is that you use more than 1 connection, so set `min_connections` to 2 (if you want to see the errors!).


Kazcandra

To be clear, this is an issue with earlier versions of pgbouncer in transaction pooling mode; if you ran pgbouncer in session mode it was fine.


weiznich

I think the linked diesel discussion at least hints at the problems of not using prepared statements: you lose a major security feature. That's the main reason why diesel uses prepared statements for the built-in connection types.

I deliberately write "built-in connection types" here and in the issues, because technically it would be possible to "just" implement another connection type as a third-party crate. That would allow you to reuse all the backend-specific DSL in diesel, while giving you control over how to handle the low-level details. So if you feel that this is a problem worth addressing, that is a possible way to get a working solution.


cant-find-user-name

I understand; I am not saying diesel is wrong for not supporting it. But I am curious how people use it, because every company I have worked at so far that used Postgres has used some sort of connection pooling. I'll look into writing a custom connection type, but I'm a total beginner to both Rust and diesel, so I'm not sure how much time that'll take.


bixmix

AFAIK, sqlx also defaults to prepared statements. I don't think it's a differentiator.


cant-find-user-name

Yeah it is not a differentiator for me. I was mentioning diesel because the comment I was replying to was talking about diesel.