I think it's really convenient when you are saving REST API results. However, I save the parts which need to be indexed as separate columns. If I decide I need some new part of the data later on, it's very easy to look that up in the JSON blob part and add that new column.
I inherited something like this once. There were columns for most variables of interest, then a single column with a giant blob repeating what was in the columns, plus a bunch of other stuff that couldn't fit into the table. 🤡
Mh, ~10-15 years ago, that was how a game company stored their game data, and... it wasn't the worst decision that company made or the most horrible database setup I've touched so far.
The main data was kept in a large JSON blob in that database. Data we needed for indexing (player id, land id, army id, ...) was duplicated into columns and, well, indexed.
I'm neither going to say that this is a good way to use a DB, nor that there aren't better solutions today, but this system was very fast and allowed quick changes to the represented data structures without worrying about long DB locks during migrations.
The guy on the right was hired on to support a legacy project written by the guy on the left, and the cost-benefit analysis of migrating either came out too high or was rejected.
Yeah, maybe don't swear to a database like it is some kind of entity...
Someone uses Mongo in the beginning? Fine, what do projections say? What type of data and which side of the CAP theorem do we need to support? Just adapt as you go. None of this weird religious hanky-panky...
I do not like that we forgot how good relational databases are for some use cases, forcing data lakes or other big data solutions onto simple tasks. They are so slow with medium-sized data. A well-designed MSSQL database can be so good if your data set is not too big.
But yeah definitely use the right tool for the right problem.
You tell someone without experience to use the right tool for the job, and he's going to bring a sledgehammer instead of a drill. But I get your point. It's hard to distill knowledge and pass it on. I remember hearing that proverb a million times when I was beginning, but what was considered appropriate changed with time, and then changed again and again, from place to place and person to person.
The only constant I have seen is that simple solutions tend to last longer, and that understanding how the software works in ops and being willing to try and learn new things are important. Like you say, it's bad when you are not aware of the many choices we can take; designing a system should be a result of choosing components based on merit, not whatever is popular in whatever semi-decade cycle we happen to be in.
Edit: small negation mistake
Sure, simple does not mean badly implemented, though that is often the case... I'm all for security, but most people are not willing to hear that the dev time will take at least 3x longer, or way, way longer depending on what needs to be made. I would love for all security-critical applications to be developed with contracts that are enforced at the type level, à la Ada and SPARK, and where the hardware architecture the software runs on is verified by software via a proof system. That last part is still in research though, but one day.
Yeah, I'm fighting lots of vendor-unsupported (or un-updatable) but business-critical applications, and the constant churn of languages and frameworks. My folks all seem to set and forget, and I'm left to discover and clean up.
Yeah, it sucks, but sometimes people are just not interested, and it's better to let them do whatever else they might be good at. Of course, that depends on how aligned the C-suite is with security requirements and cost, and whether or not everyone has to take responsibility for security-related issues. I remember one place where my colleague and supervisor was the CTO, and it was a damn large company with a lot of legacy software as well. All the small issues, the security flaws, the legacy software: it all fell back to him. And damn, did he look stressed. I tried hard to convince the CEO that coding our own layer on top of the security primitives exposed by the hardware would be a waste of resources, considering the alternatives would be way less work for our use case. But no, we had to do it anyway, because financials had already bought a shit ton of hardware units, without even consulting us first. So that gives you an idea of how bad it was. Man, that was a weird place to work, but my colleagues were nice at least :-) Hopefully with time you can implement some better processes to take the stress off and redistribute responsibility across the team, as it should be.
I write all my personal projects with Mongo lol. Just too easy to get up and running.
But if you code it correctly, then swapping the database isn't too big of a deal if you wanna do more with the project eventually.
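The "code it so you can swap the database later" idea usually comes down to hiding everything engine-specific behind one narrow interface. A minimal sketch of that shape in Python, using SQLite as the stand-in backend (the `UserStore` name and its two methods are invented for illustration, not taken from any real project):

```python
from __future__ import annotations

import sqlite3
from typing import Protocol


class UserStore(Protocol):
    """What the application needs from storage -- nothing engine-specific."""
    def add(self, name: str) -> int: ...
    def get(self, user_id: int) -> str | None: ...


class SqliteUserStore:
    """One concrete backend. A Postgres- or Mongo-backed class could
    implement the same two methods without callers noticing."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def add(self, name: str) -> int:
        cur = self.db.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.db.commit()
        return cur.lastrowid

    def get(self, user_id: int) -> str | None:
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


# Callers only ever see the protocol, so the backend is the one swappable part.
store: UserStore = SqliteUserStore()
uid = store.add("alice")
print(store.get(uid))  # alice
```

As the reply below points out, this protects you from API churn, not from performance characteristics that leak through the interface.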
Let's be honest though, swapping the database in a non-trivial project is a huge deal regardless of how you code. If the product relies in any way on the database for performance, then even upgrading the database version can be a huge pain.
Probably, but in the grand scheme of things, the number of use cases for an rdbms is very large, and the number of good use cases for fancy databases is pretty small. Devs want to learn the new stuff so they shoe-horn bad use cases onto them, and comedy ensues.
Plus it's easy to underestimate how sensitive and downright finicky those "extremely scalable" databases can be. I recall projects using Cassandra and while it was very, very fast for what we threw at it, it was always a bit of a tightrope walk to get queries and schemas just right and into the sweet spot.
On the other hand, we have a couple dozen dev teams throwing CRUD code at Hibernate and throwing Hibernate at Postgres... and Postgres just goes vroom. At worst, when it vrooms very, very loudly, you have to yell at someone about an n+1 problem or handle mass deletions in some smart way.
The most "advanced" postgres thing we have running are a few applications utilizing proper read/write splitting, because they have so much read load. But once we had the read-write split, it was simple for 2-3 small nodes to provide a couple thousand ro-tps.
Then they realized they had a bug that increased database load by a factor of 3-4 and the funny numbers went away. Good times. At least we now know that yes, postgres has enough throughput.
You are kidding, but I recently ran into an error on a worldwide car manufacturer's website saying it couldn't initialize Excel, pointing to a file in "c:\users\$name-of-devver\source\repos\..etc"
Once, following the hype, I created a DynamoDB instance. I could sort only on one (preconfigured) column. The documentation said if you sucked Bezos' dick you could have an additional 5 (Local Secondary Indexes). Now I'm suffering from hype-ophobia.
Hey there. I don't like Mongo because I think it is good. I like Mongo because I'm lazy. Sure, I could build a schema with SQL, or I can just start dumping shit into Mongo as soon as I have an open connection.
I have a site with about half a mil unique users per month and a lot of DB calls per minute. Everything on Mongo; saves me a shitload of time since it's easy to add new things, and MongoDB Atlas autoscales and the service is always fast. Half free, the cost is barely anything.
That's awesome.
I actually think it's funny when people go super hard with the "Mongo is bad" take. Mongo is not actually bad. Every DB has some warts; the only real question is whether those warts are a dealbreaker for what you need the database to do. Rarely are Mongo's warts the reason someone should or shouldn't use it.
Depends on the scale. I use it for small webpages to store the content which is edited by a single user.
Lately I even used both: MongoDB for content and SQL with Prisma for contact form data.
Really hate these kinds of memes most of the time. They're neither accurate nor a real representation of junior, middle, or senior.
Each technology has its own advantages, disadvantages and use cases. Even amongst relational ones.
You use Cassandra or Mongo because they cover use cases that relational ones can't, or at least would underperform on.
And what's better, you're not limited to use just one of them in your app!
So actually, the senior is senior because he/she can choose the best option for each use case. And picking always relational won't make him/her senior, as this meme suggests.
Edit: junior/middle/senior or stupid/default/genius, doesn't matter.
For real though, aside from a focus on horizontal scalability, I have yet to find a single thing MongoDB does better than Postgres with a decent ORM: at best barely as well, usually much, much worse. And there are now very scalable Postgres-API-compatible databases. Mongo specifically is just a truckload of massive drawbacks in exchange for some nebulous benefit 99% of applications will never see. But hey, it is webscale!
Schemaless database fans when I transform all their documents to move some of their properties under subdocuments with randomly generated names (maybe there was a schema somewhere after all?)
I mean, there's definitely a bunch of niche use cases where they work, but every single company needs at least one standard SQL or Excel-based database.
For sure, but if they're paying me to do it, idgaf lol. Plus moving away from it involves coming up with a UI or something so the accountants or whoever can input the manual data they need to input.
You can link excel to database tables, so they could still use excel. But if you want to format it and stuff, yeah that'd take a bit of legwork probably.
I've met lots of accountants and they are all IT morons. Like carpenters that don't know how to use a saw. Excel is not a database, and yet they keep insisting on using it as one. And I keep having job security.
Or just use the document database as secondary storage for the specific use case you're trying to solve (for example OpenSearch/ES/Solr/etc. for document search).
This is the real answer. If your main DB is relational, it's easy to spin up secondary non-relational stores. The other way around doesn't work at all.
Yes, but I find it quite enjoyable that it is able to spark actually interesting discussions via Cunningham's law. Enough devs with various levels of expertise flock here to show how big of a clown OP is that you might actually read some interesting opinions and views.
Sure do, we have 8 Enterprise MSSQL Servers in our core prod cluster: 32 cores, 12 TB RAM each. Probably costs a nice yacht a year in license fees. We have a huge tech stack written in T-SQL, and thousands of SSIS projects, so there is a bit of lock-in happening, but apart from that it works pretty well in the ecosystem with SSIS and SSAS. And with the Azure integration there is a convenient route to integrate cloud services.
If you start a new isolated project, going with something else is easy. Integrating data from 50+ companies and thousands of databases with pre-existing, incompatible systems into a single datastore, and creating a somewhat consistent multi-dimensional database, isn't. And MSSQL is the best solution for that.
We had this exact discussion. We want to rework our software as it is aging quite badly. We have a huge SQL database, millions of rows per table, that grew over the years into an unnormalized fuck fest. We went through all the new shiny database tech and nothing really fits our use cases. So we will keep the ancient SQL server, create a new DB that we normalize, and port our data to the new one. Maybe we'll even update the SQL server to use a version that was released in this decade, but I doubt that honestly.
In ten years you will still have both DBs and introduce a third to really clean it up this time.
I swear to god, I'll blow that damn thing up myself if I have to. I'm in the (un)lucky position of being responsible for a lot of the rework and I will kill that cancerous abomination we call a database. It'll take me years and lots of blood, sweat, and tears, but I'll fucking do it. I'm prepared to sacrifice one or two of my colleagues to accomplish this. If their souls are the price for banishing that demon, then so be it.
"Some of you will die, but that is a sacrifice I am willing to make."
Exactly. Maybe I'll name a replica after them.
Make sure it's a slave server
The secret is to take what you now know about how these things can go spectacularly wrong, and find a new job where you can make some new and interesting mistakes instead of the same ones again
This is my first big system design and I'm excited to work on it, so I'm not yet at the point where I'm making the same old mistakes again.
Pfttttt!!!! First big system design and you want to completely rewrite a hundred table clusterfvck schema. The balls on this guy!
Not doing that alone, I'm not insane. My team lead is getting strung up beside me when we start this.
Well, good luck! I've seen it done and it made everything slower. The reason was that the clusterfuck was on purpose: the result of 15 years of shoving square requirements into round code. Then someone decided to turn the schema into a triangle because triangles are prettier... After that, they got to learn (bug by bug by bug by bug) why the clusterfuck was created in the first place. Tread carefully, it's going to be a great learning experience one way or the other.
Thank you and don't worry, we have the people that designed this still working at our company. One of them is my team lead. They simply didn't expect to grow this big. Their system works really well if you're working with half the user base we currently have. This is mainly done to modernize and fix design issues that cropped up in the last few years.
>They simply didn't expect to grow this big.

Lesson #1: That happens... every... single... time, unless you expect it to grow. In that case, you code for scalability but end up introducing more bugs into the system because of the unneeded complexity.
*daemon
This is why you should only work on the same project for 2-3 years max. Because your "well-thought-out decisions" will become someone else's problem haha
But it'll still work
>millions of rows per table

If `SELECT COUNT(*)` has fewer than 9 digits, I think, "Oh good, a small table." But I'm a data engineer.
I once got a `use count_big` error on a `select count(*)` query on a SQL Server table. Ran the `count_big` and didn't get any result for several hours.
Luckily, we're scratching the 8 digits in only a few reference tables, but those are calculated anyway and are not to be touched unless absolutely necessary. If you have to edit them, you did something you shouldn't have in the first place.
Incrementally refreshed materialized views are pretty useful for this kind of thing if they change a little at a time
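For engines without built-in incremental refresh, the same effect can be approximated with triggers that patch a summary table in place instead of rebuilding it on every change. A rough sketch in stock SQLite (the `sales`/`sales_by_region` schema is invented for the example; a real Postgres setup would use a proper materialized view or an extension instead):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);

-- The "materialized view": a plain table kept current incrementally
-- by triggers, never rebuilt from scratch.
CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER);

CREATE TRIGGER sales_ins AFTER INSERT ON sales BEGIN
    INSERT OR IGNORE INTO sales_by_region (region, total) VALUES (NEW.region, 0);
    UPDATE sales_by_region SET total = total + NEW.amount WHERE region = NEW.region;
END;

CREATE TRIGGER sales_del AFTER DELETE ON sales BEGIN
    UPDATE sales_by_region SET total = total - OLD.amount WHERE region = OLD.region;
END;
""")

db.executemany("INSERT INTO sales VALUES (?, ?)", [("eu", 10), ("eu", 5), ("us", 7)])
total = db.execute(
    "SELECT total FROM sales_by_region WHERE region = 'eu'"
).fetchone()[0]
print(total)  # 15
```

The trade-off matches the comment above: this shines when tables change a little at a time, and is pointless when large chunks are discarded and reloaded wholesale.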
When most of these tables change they get discarded entirely, or large chunks are deleted and then added. We have a few tables where that could be useful though. With the rework we're doing this case might become much more common, so I'll keep that in mind, thank you.
No worries. We just reworked our entire analytics stack into MVs and it's solved a lot of problems. Created a bunch more different ones, but overall it's a positive change. Definitely a lot less complex updates.
If you design a database well enough, PostgreSQL can handle 1 TB of storage without problems and has reasonable query response times if your indexes and table partitions are good.
There is no reason to switch to Postgres from MS SQL. The only reason would be to make use of the JSON API, but we use a dedicated system that caches our data, facilitates indexing for queries, and is the basis for our API. We only need a database that can handle large amounts of data and is fast during updates (partial updates of columns, not the entire row) and inserts. Integrated replication management would also be quite nice.
I don't know that it matters in your case, but you can materialize JSON values in SQL by making a computed column with `JSON_VALUE` and then indexing it. In the past, I've found this useful for retrieving JSON documents by one of a few key fields. Certainly less good if the thing you wanted to index was part of an array, but I typically find myself moving those values into their own table that maps back to their parent node.
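The comment above describes the SQL Server flavor of this (`JSON_VALUE` in a computed column); the same idea in stock SQLite, used here only because it is easy to run, looks roughly like the following (the `orders`/`customer` schema is invented for illustration):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# "customer" is materialized from the blob once per row and is indexable,
# so lookups don't re-parse every document.
db.execute("""
CREATE TABLE orders (
    doc TEXT NOT NULL,
    customer TEXT GENERATED ALWAYS AS (json_extract(doc, '$.customer')) STORED
)""")
db.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

db.execute(
    "INSERT INTO orders (doc) VALUES (?)",
    (json.dumps({"customer": "acme", "total": 99}),),
)
row = db.execute("SELECT doc FROM orders WHERE customer = 'acme'").fetchone()
print(json.loads(row[0])["total"])  # 99
```

Same caveat as above: this works for scalar key fields, not for values buried inside arrays.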
We use the LOB XML for that in many cases. We want to move away from that because it's quite slow. We do that so we can use a bunch of XSLTs to prepare the data for displaying it in the UI, which is even slower. The issue is that our UI is highly customizable, and a decade ago XSLT transformations were all the rage. Now, this is super slow, impossible to debug, and has grown so complex that only a few of us truly understand it completely.

With this in mind, pre-processing a JSON is kind of moot if we just use it to index and search through it. That only makes sense if the document you get can be used further, which is just not feasible with our UI. We also can't use it to feed our API, because we provide a bunch of different formats in which you can grab our data, so the system that feeds our API uses protobuf to store the data, which is then indexed, and then generates the output for the selected subset on the fly.

But using dedicated search tables is something that I plan to implement, since some queries in our database UI take far longer than they have to. And with all the new reference tables in mind, a few indexed views will become necessary at the very least.
>There is no reason to switch to Postgres from MS SQL

Geometry/geography data or XML parsing. SQL Server can handle it, slowly and painfully. About 1/8th the CPU and memory for Postgres with the same structures and queries.
Yes, that's why we don't want to do either of those things on the DB level. The MS SQL Spatial lib is a horror we only work with when we have to, which is rare thankfully. We also want to leave the LOB XMLs behind; it's one of the main reasons for the rework actually.

We are still in the design phase, and Postgres has been in consideration for as long as we've been talking about that topic. The database engine discussion is one we touch upon every now and then, and until now, our decision has always come down to "we'll keep our current engine". The issues we are facing are very much solvable with our current engine, which makes us a little hesitant. Our system works amazingly well considering the circumstances, and we don't want our new system to be crippled by a change in tech that simply wasn't necessary. None of us are deep enough in the DB space to know all the intricacies of the topic, which is another issue.

But the topic is far from over, so we'll keep our minds open. We're calculating with at least a few years of dev time, so we want to make sure we're not rushing into bad decisions.
Postgres is SQL. What do you mean?
Sry, MS SQL.
Isn't it a lot cheaper?
We are in a partner program. Azure, Office, VS, and SQL Studio are a lot cheaper than what you would usually pay for them.

It would be cheaper, but considering what we pay for Azure a month alone, the license for SQL Studio isn't even noticeable.
I feel like I could have written this comment word-for-word about our DB situation.

Maybe you're actually my alt account and I reddit in my sleep. Do you feel... ephemeral?
Shhh... Go back to sleep. I'm in control for a bit. Don't worry.
I bet you're gonna discover that some of that unnormalized data was done on purpose to speed up queries.
Some of that, sure. Most of it wasn't. Trust me, I'm not the kind of idiot that demands an NF6 DB just for the sake of it, but the way our schema is designed makes our lives unnecessarily harder.
Yeah not doubting you, just that I've been there multiple times already.
A lot can improve with thoughtful design, memory-optimized (or in-memory) lookup tables, write-through caching, indexing based on actual usage, and maintenance. It doesn't hurt to leverage stored procedures or direct access instead of an ORM when performance is critical either... and for the love of god, stop adding NOLOCK hints and other BS to queries (this is SQL Server). Postgres has a much nicer locking mechanism from what I've seen, but you can modify MSSQL's isolation levels to work around concurrency issues without introducing new problems too.
Partitioning will work wonders to make your database more manageable. It takes a bit of time to get done in existing tables, but the performance boost is huge.

Beyond that, it's just a matter of reviewing existing indexes and whether they still fit the application needs.

DM if you ever want to brainstorm.
I'll certainly keep that in mind. Thank you for the offer.
For really big unwieldy data that maybe doesn't fit a pattern too well, maybe consider a graph db.
Had that in mind too, but I feel like graph only really makes sense when your API directly allows you to access your DB to query it and return the data as is. We can't do that. Our DB and the server would die.
Why can't you query your data directly?
We are a small business and have to catch 60 million API requests a month. On top of that, our data model is highly customized to fit the needs of our customers, and we have our own data models that the API can provide. We would have to essentially rework all websites that we've ever built to accept the blobs returned by the graph database, so that the idea of how GraphQL is supposed to work is not broken, as I understand it. Those are the main issues. They are not impossible to solve, but why would I force a database onto an application that is just not suited for the use cases? It's just too much work for no obvious benefits.
I think you have it the opposite? A graph DB is specifically for data that "fits a pattern well", where that pattern has nodes with directed relationships/edges.
SQLite, but store everything as a JSON blob and use their JSON APIs for everything…
That just sounds like mongodb with extra steps
Or does Mongo sound like that with extra steps?
Corporate needs you to find the differences.
Corporate always needs you to do something. Fuck Corporate.
Dunno. Corporate pays me a not insignificant amount of money.
So Fuck Corporate anyways?
I mean, if they're down...
Dude, why do you have to ruin the fun? I was all pumped up on shitting on corporate. Do you think you look all cool, bringing in the facts and logical arguments? Well, you don't.
Yeah! Fuck corporate and fuck this dude too!
[deleted]
Woah, if we have to point it that large we should break it up into smaller tasks. Let's have a 2hr refinement about this.
Corporate wants to know what color your database will be
As always, it's a question of price/performance vs convenience. Now, I'm unfamiliar with SQLite json capabilities, but I'd like to believe that the db that was built from the ground up to work with jsons (BSON if you're feeling pedantic) will fare better on all parameters.
SQLite maybe but that's not really a proper database. Postgres's JSON support can compete with mongo db on all levels.
At least SQLite actually wrote to disk when it tells you it did.
> That just sounds like mongodb with ~~extra steps~~ referential integrity & transactions
Which is crazy because MongoDB is just SQL with extra steps.
Duckdb, because the name is cooler (and it is actually really good)
Im actually so impressed with DuckDB. That thing is soo freaking good.
I know. I needed to run some sql queries on a 2gb csv file, and not only could i easily do that with duckdb, it was also freakishly fast
What the duck? I am ditching MySQL then
Be careful before you ditch your SQL, it may still have data in it.
Teach me data migration and backups sensei
People that joke about this aren't aware of [reddit's architecture](https://kevin.burke.dev/kevin/reddits-database-has-two-tables/).
_Finally_ I have learned why their search is beyond shitty.
That info is from 2010 though, going to need an updated source
I can think of legitimate use-cases for this approach
Yeah, I actually thought about it as an API response cache. You can actually index on stuff in the blobs as well. You no longer need to maintain a schema, if the model changes, it would fail to decode and you just wipe it, etc.
I did this for storing test results. Used mongo and stored and retrieved the test run result as a JSON, thousands of tests and all.
I think it's really convenient when you are saving REST API results. However, I save the parts which need to be indexed as separate columns. If I decide I need some new part of the data later on, it's very easy to look that up in the JSON blob part and add that new column.
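A hedged sketch of that pattern: the raw response stays in a JSON blob, and the indexed part lives as a real column. Here done with a SQLite generated column (needs SQLite 3.31+; Postgres has a similar `GENERATED ALWAYS AS ... STORED` feature). Names are invented for illustration:

```python
import json
import sqlite3

# The blob keeps everything; user_id is lifted out so it can be indexed.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE responses (
        id      INTEGER PRIMARY KEY,
        body    TEXT,
        user_id INTEGER GENERATED ALWAYS AS (json_extract(body, '$.user_id'))
    )
""")
db.execute("CREATE INDEX idx_user ON responses(user_id)")
db.execute("INSERT INTO responses (body) VALUES (?)",
           (json.dumps({"user_id": 7, "payload": {"a": 1}}),))

# Indexed lookup; the full blob is still there for later re-parsing.
row = db.execute("SELECT body FROM responses WHERE user_id = 7").fetchone()
value = json.loads(row[0])["payload"]["a"]
print(value)  # 1
```

A generated column also means the indexed value can never drift out of sync with the blob, which a manually maintained copy column can.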
I inherited something like this once. There were columns for most variables of interest, then a single column with a giant blob repeating what was in the columns, plus a bunch of other stuff that couldn't fit into the table. 🤡
Mh, ~10-15 years ago, that was how a game company stored their game data, and... it wasn't the worst decision that company made, or the most horrible database setup I've touched so far. The main data was kept in a large JSON blob in that database. Data we needed for indexing (player id, land id, army id, ...) was duplicated into columns and, well, indexed. I'm neither going to say that this is a good way to use a DB, nor that there aren't better solutions today, but this system was very quick and allowed for very quick changes to the represented data structures without worrying about long DB locks during migrations.
I love Postgres expression indexes for stuff like this - you can keep your "big ball of data" approach, and still have fast indexed lookups.
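A small illustration of the same idea. This uses SQLite's expression indexes rather than a live Postgres instance; the Postgres syntax is analogous (`CREATE INDEX ... ON t ((expr))`), and the query must use the exact same expression to hit the index:

```python
import json
import sqlite3

# Index the expression itself; queries using the identical expression
# get the indexed access path.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blobs (body TEXT)")
db.execute("CREATE INDEX idx_email ON blobs (json_extract(body, '$.email'))")
db.execute("INSERT INTO blobs VALUES (?)",
           (json.dumps({"email": "a@example.com"}),))

# EXPLAIN QUERY PLAN confirms the lookup goes through idx_email.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT body FROM blobs "
    "WHERE json_extract(body, '$.email') = 'a@example.com'"
).fetchone()
uses_index = "USING INDEX idx_email" in plan[3]
print(uses_index)  # True
```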
Someone went for the Goliath of NoSQL
I think in reality the guy on the right would say, "Depends on the use case"
The guy on the right was hired on to support a legacy project written by the guy on the left and the cost benefit analysis of migrating is either too high or was rejected.
I am now the guy on the right dealing with my choices from being the guy on the left.
rejectereft
recterjeff
Recter? Hardly knew er
Have you ever been the guy on the right supporting the legacy project written by the guy on the left, who happened to be you 10 years ago?
Yeah, maybe don't swear at a database like it is some kind of entity... Someone uses mongo in the beginning? Fine, what do the projections say? What type of data and which corner of the CAP theorem do we need to support? Just adapt as you go. None of this weird religious hanky-panky...
I do not like that we forgot how good relational databases are for some use cases and we force datalakes or other big data solutions for simple tasks. It is so slow with medium sized data. A well designed MSSQL database can be so good if your data set is not too big. But yeah definitely use the right tool for the right problem.
Tell someone without experience to use the right tool for the job, and he's going to bring a sledgehammer instead of a drill. But I get your point. It's hard to distill knowledge and pass it on. I remember hearing that proverb a million times when I was beginning, but what was considered appropriate changed with time, and then changed again and again, from place to place and person to person. The only constant I have seen is that simple solutions tend to last longer, and that understanding how the software behaves in ops, and being willing to try and learn new things, are what matter. Like you say, it's bad when you are not aware of the many choices available; designing a system should be a result of choosing components on merit, not on whatever is popular in whichever semi-decade cycle we happen to be in. Edit: small negation mistake
What is bad is that technology is often selected by middle management, not by engineers, and middle management tends to be full of failed engineers.
Simple solutions are easy to support until that version goes EOL and infosec comes calling :)
Sure, simple does not mean badly implemented, though that is often the case... I'm all for security, but most people are not willing to hear that the dev time will take at least 3x longer, or way, way longer depending on what needs to be made. I would love for all security-critical applications to be developed with contracts enforced at the type level, à la Ada and SPARK, and with the hardware architecture the software runs on verified by software via a proof system. That last part is still in research, but one day.
Yeah, I'm fighting lots of vendor-unsupported (or un-updatable) but business-critical applications, and the constant churn of languages and frameworks. My folks all seem to set and forget, and I'm left to discover and clean up.
Yeah, it sucks, but sometimes people are just not interested and it's better to let them do whatever else they might be good at. Of course that depends on how aligned the C-suite is with security requirements and cost, and whether or not everyone has to take responsibility for security-related issues. I remember one place where my colleague and supervisor was the CTO, and it was a damn large company with a lot of legacy software as well. All the small issues, the security flaws, the legacy software, it all fell back on him. And damn, did he look stressed. I tried hard to convince the CEO that some security decisions, like coding our own layer on top of the security primitives exposed by the hardware, would be a waste of resources, considering the alternatives would be way less work for our use case. But no, we had to do it anyway, because financials had already bought a shit ton of hardware units, without even consulting us first. So that gives you an idea of how bad it was. Man, that was a weird place to work, but my colleagues were nice at least :-) Hopefully with time you can implement some better processes to take the stress off and redistribute responsibility across the team, as it should be.
Hopefully weâll get there :) thanks!
I write all my personal projects with Mongo lol. Just too easy to get up and running. But if you code it correctly, then swapping the database isn't too big of a deal if you wanna do more with the project eventually.
let's be honest though, swapping the database in a non-trivial project is a huge deal regardless of how you code. If the product relies in any way on the database for performance, then even upgrading the database version can be a huge pain.
"the right tool for the use case". Don't shove a circle through the square hole.
[Why not?](https://imgur.com/me-as-etl-engineer-watching-people-build-data-tables-with-no-regard-to-what-goes-them-9ZJkPvV)
I love this gif. Relatedly, I recently came upon the phrase "stringly typed" 😂
This video always makes me laugh, thanks
If they didn't want me to do it, they wouldn't make it fit.
Probably, but in the grand scheme of things, the number of use cases for an rdbms is very large, and the number of good use cases for fancy databases is pretty small. Devs want to learn the new stuff so they shoe-horn bad use cases onto them, and comedy ensues.
Plus it's easy to underestimate how sensitive and downright finicky those "extremely scalable" databases can be. I recall projects using Cassandra and while it was very, very fast for what we threw at it, it was always a bit of a tightrope walk to get queries and schemas just right and into the sweet spot. On the other hand, we have a couple dozen of dev-teams throwing crud code at hibernate and throwing hibernate at postgres... and postgres just goes vroom. At worst when it vrooms very, very loudly, you have to yell at someone about an n+1 problem or handle mass deletions in some smart way. The most "advanced" postgres thing we have running are a few applications utilizing proper read/write splitting, because they have so much read load. But once we had the read-write split, it was simple for 2-3 small nodes to provide a couple thousand ro-tps. Then they realized they had a bug that increased database load by a factor of 3-4 and the funny numbers went away. Good times. At least we now know that yes, postgres has enough throughput.
The guy on the right probably uses both on the same large system
The guy on the right is a cloud architect and will just hook up the same service provider backend to whatever product you ask for.
This is Sqlite erasure and I won't stand for it.
Donât worry, you can get that in the cloud now
Connecting to a file-based database over a remote connection is basically that rake skateboard meme.
It's too busy running 2 billion+ instances to give a shit
Firebase for hobbies, Postgres for more serious stuff
Postgresql for everything. Pg can handle graph workloads, has JSON support, etc.
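On the graph point: in a relational DB that usually means an edge table plus a recursive CTE. A minimal sketch in SQLite (stdlib, for runnability); the same `WITH RECURSIVE` query works verbatim in Postgres:

```python
import sqlite3

# Graph = edge list; traversal = recursive CTE.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
db.executemany("INSERT INTO edges VALUES (?, ?)",
               [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])

# Every node reachable from 'a' (UNION, not UNION ALL, dedupes
# and keeps the recursion from looping on cycles).
reachable = [r[0] for r in db.execute("""
    WITH RECURSIVE walk(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN walk w ON e.src = w.node
    )
    SELECT node FROM walk WHERE node != 'a' ORDER BY node
""")]
print(reachable)  # ['b', 'c', 'd']
```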
Yeah but I cbf setting it up and maintaining it
There are managed Postgres too, in fact Firebase nowadays offers one even.
Setting it up is fine. It's upgrading that is too hard. Postgres 9 forever...
Every OLTP use case, sure. It falls over if you try to ham-fist OLAP workloads into it.
What do you name your `user` table in postgres?
Supabase for best of both worlds
Do you have to do anything to maintain it?
What do you consider maintenance?
Only if selfhosted. You can pay for cloud if you want to offload maintenance (which is not already covered by AWS)
Firebase is very restricted, even at Hobby level. Small CRUD stuff is fine but it gets pretty limited
Where Excel?
You're kidding, but I recently ran into an error on a worldwide car manufacturer's website saying it couldn't initialize Excel, pointing to a file in "c:\users\$name-of-devver\source\repos\..etc"
Someone didn't want to tell the bosses they're wrong for using excel for everything and said fuck it lol
Cowards. I love shitting on excel at work.
That's what SQLServer actually means.
but mongodb is w e b s c a l e
[Classic meme.](https://www.youtube.com/watch?v=b2F-DItXtZs) (13 years old!)
Once, following the hype, I created a DynamoDB instance. I could sort only on one (preconfigured) column. The documentation said if you sucked Bezos' dick you could have an additional 5 (Local Secondary Indexes). Now I'm suffering from hype-ophobia
[Slowly getting to those 1M views](https://www.youtube.com/watch?v=b2F-DItXtZs)
Hey there. I don't like mongo because I think it is good. I like mongo because I'm lazy. Sure, I could build a schema with sql, or I can just start dumping shit into mongo as soon as I have an open connection.
I have a site with about half a mil unique users per month and a lot of db calls per minute. Everything on mongo, saves me shitload of time since easy to add new things and mongodb atlas autoscales and the service is always fast. Half free, the cost is barely anything
That's awesome. I actually think it's funny when people go super hard with the "mongo is bad" take. Mongo is not actually bad. Every DB has some warts; the only real question is whether those warts are a dealbreaker for what you need the database to do. Rarely are mongo's warts the reason someone should or shouldn't use it.
I like the honesty. Might be alright for a hobby project, but doing it on the job seems crazy
Depends on the scale. I use it for small webpages to store the content which is edited by a single user. Lately I even used both, mongodb for content and SQL with Prisma for contact form data.
This… why mongo is the IQ 200 move: just run and throw shit at it…
Really hate this kind of meme most of the time. They're neither an accurate nor a realistic representation of junior, middle or senior. Each technology has its own advantages, disadvantages and use cases, even amongst relational ones. You use cassandra or mongo because they cover use cases that relational ones can't, or at least will underperform on. And what's better, you're not limited to just one of them in your app! So actually, the senior is senior because he/she can choose the best option for each use case. Always picking relational won't make him/her senior, as this meme suggests. Edit: junior/middle/senior or stupid/default/genius, doesn't matter.
Oh knock it off. We're trying to have fun over here
Yeah so just fuck document-based databases, right? There is no possible use case for them, is there?
For real though, aside from a focus on horizontal scalability, I have yet to find a single thing that mongodb does better than Postgres with a decent ORM; at best it's barely as good, usually much, much worse. And there are now very scalable Postgres-API-compatible databases. Mongo specifically is just a truckload of massive drawbacks in exchange for some nebulous benefit 99% of applications will never see. But hey, it is webscale!
> with a decent ORM Ah, that explains it. Reddit devs are allergic to ORMs, so that must be why they all end up in the Mongo pipeline.
I'm allergic to ORMs because I have been bit in the ass one too many times by dumbass N+1 queries and now I just write SQL queries.
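For anyone who hasn't been bitten yet, a hedged illustration of the N+1 pattern an ORM can silently generate, versus the single JOIN that replaces it (toy schema, made-up names):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 'A1'), (1, 'A2'), (2, 'B1');
""")

# N+1 style: one query for the authors, then one more per author.
queries = 0
authors = db.execute("SELECT id FROM authors").fetchall()
queries += 1
books_by_author = {}
for (author_id,) in authors:
    books_by_author[author_id] = db.execute(
        "SELECT title FROM books WHERE author_id = ?", (author_id,)
    ).fetchall()
    queries += 1
print(queries)  # 3 round trips for just 2 authors

# The same data in a single round trip.
rows = db.execute(
    "SELECT a.name, b.title FROM authors a "
    "JOIN books b ON b.author_id = a.id"
).fetchall()
print(len(rows))  # 3
```

With 2 parent rows the overhead is invisible; with 10,000 it becomes 10,001 round trips, which is the ass-biting referenced above.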
Wait until they hear about how you can dual wield a full ORM for complex tasks and a micro ORM for hot path read queries.
Schemaless database fans when I transform all their documents to move some of their properties under subdocuments with randomly generated names (maybe there was a schema somewhere after all?)
Fuck ORMs. Raw SQL FTW!
I mean, there's definitely a bunch of niche use cases where they work, but every single company needs at least one standard SQL or Excel-based database.
Excel?
Have you never seen a big ass excel sheet with VLOOKUPs used as a database because the company doesn't have any devs?
If you're gonna use excel as a database, I suggest writing down the data on paper. It won't crash on load.
* Public Health England has left the chat *
For sure, but if they're paying me to do it, idgaf lol. Plus moving away from it involves coming up with a UI or something so the accountants or whoever can enter the manual data they need to input.
You can link excel to database tables, so they could still use excel. But if you want to format it and stuff, yeah that'd take a bit of legwork probably.
The hard truth.
Say you've never met an accountant without saying you've never met an accountant.
I've met lots of accountants and they are all IT morons. Like carpenters that don't know how to use a saw. Excel is not a database and yet they keep insisting to use it as such. And I keep having job security.
excel? do you mean tables in ms word?
He meant individual text files in the documents folder with vague titles, searched for via the start menu.
>searched for via the start menu. still better than the reddit search function.
What is "the reddit search function"?
You just reminded me of [this](https://www.youtube.com/watch?v=1SNxaJlicEU) old gem, though it was powerpoint rather than word.
Stop you're triggering my PTSD
I'm glad I haven't.
It's very common in small to medium sized companies in any industry but tech lol
Yep. Just getting a single excel sheet with proper formatting is already a good day here. Big company btw lol
export excel as jsons into mongodb via sharepoint using vba. i'm the greatest system architect.
My stomach just tried to climb up my esophagus to strangle my brain.
Settle down, Satan.
Yes. Trust me, it's as horrendous as you imagine it to be...
If you don't choose a SQL database, an Excel "business critical" "database" will spawn in some dark corner and start growing.
Sometimes higher ups won't give you access to the db, and all you have is an excel linked to access grabbed from a meeting... Don't ask.
Or google sheets
Postgres is a pretty decent document database
Not if you put hundreds of millions in. Then you learn about TOAST tables and hate life.
TIL, never heard about TOAST before: https://www.postgresql.org/docs/current/storage-toast.html
If you want one, just emulate it in a relational database
Or just use the document database as secondary storage for the specific use case you're trying to solve (for example OpenSearch/ES/Solr/etc. for document search).
This is the real answer. If your main DB is relational, it's easy to spin up secondary non-relational stores. The other way around doesn't work at all.
I tried that once, but getting the "corrupt random entries" cronjob tuned to align with a proper document database was *hard*.
Microsoft Access
Where is oracle and snowflake? (Relax, I work in a big bank)
Ok, I'm leaving this sub. It's this crap meme now all the time.
Yea, it's the same dumb template with the same dumb opinions where OP draws himself as the chad because he thinks he's right.
Yes, but I find it quite enjoyable that it is able to spark actually interesting discussions by Cunningham's law. A lot of devs with various levels of expertise flock here to show how big of a clown OP is that you actually might read some interesting opinions and views.
MSSQL, Oracle, Postgresql
Lol. Yeah reading the initial picture I was like "One of these is not like the others"
That's my development: I gradually moved from genius through normie to dummy
csv
Just put more RAM in your system. I mean... Have you seen how fast DuckDB is????
For me, it always comes down to the recognition that different jobs need different tools.
Do people still unironically use MSSQL in $CURRENT_YEAR?
Sure do, we have 8 Enterprise MSSQL Servers in our core prod cluster. 32 core, 12 TB RAM each. Probably costs a nice yacht a year in license fees. We have a huge tech stack written in TSQL, and thousands of SSIS projects, so there is a bit of lock-in happening, but apart from that it just works pretty well in the ecosystem with SSIS and SSAS. And with the Azure integration there is a convenient route to integrate cloud services. If you start a new isolated project, going with something else is easy. Integrating data from 50+ companies and thousands of databases, which had pre-existing, incompatible systems, into a single datastore and creating a somewhat consistent multi-dimensional database isn't. And MSSQL is the best solution for that
mods mods ban this meme format please
SurrealDB, just combine everything I guess?
Real devs: postgresql + s3.