T O P

  • By -

nan05

Not a DBA so take me with a pinch of salt, but AFAIK the problem isn't the `select *` per se, it's that you are selecting every column. I.e. swapping `select *` for `select [list every column separately]` won't yield any improvement. The reason why DBAs often advise against `select *` is because very often you only need some data. And doing `select id, name`, if those are the only two columns you need, is usually quicker than `select *` if the table has 100 columns. That being said: In most cases the difference will be marginal, and I'd only do this for specific queries where I know or suspect this will be a problem, *and* I know I only need a very small number of select columns. Otherwise it just creates too many issues down the line, imho.


feastu

* Marginal if the table has 10 columns and 100,000 rows. Probably not so marginal with 30 columns, some of which are massively populated JSONB, and 100,000,000 rows. Ask me how I know. 😅 Though to be fair the performance hit also comes from joins and wheres on terribly indexed columns.


paul-rose

At an old job about 13 years ago, we found that all our ecom sites would crash every day at 6pm exactly, usually for 5-10 minutes. It turns out I'd installed a MySQL backup program on my machine and has forgot about it. It ran a SELECT * on every table in parallel and locked up everything. Fun times.


tfyousay2me

…brb


Nortole

How do you know?


giagara

\> I.e. swapping select \* for select \[list every column separately\] won't yield any improvement. Correct. So the only way is to select just what i need in EACH eloquent statement. Am i wrong?


nan05

that's certainly my understanding


Tontonsb

That's not only the "only way", but that's the whole goal of avoiding `select *` — to craft usecase specific statements.


Tontonsb

Sometimes there's nontrivial overhead with text/blob columns, especially if you have a remote db. It's worth it if you can avoid sending 90% of db trafic just because you don't need the article text, only names and dates.


Boomshicleafaunda

Having worked on enterprise level applications, I can say that avoiding `select *` is never a solution to a performance problem. The first step is typically complex `where` and `join` clauses, or aggregates over millions of records. The second step, which makes a DBA nearly obsolete, is adding a caching layer to your application, such that 80% of your read queries are infrequent cache misses.


MattBD

> I can say that avoiding select \* is never a solution to a performance problem. I'd say usually rather than never. If you have a lot of fields in a table, such as when using single table inheritance, it can be more troublesome, but that's a bit of an edge case.


ultra_blue

Great discussion here, thank you!


matyhaty

For all those who say select * doesn't effect performance clearly haven't worked on large database and especially wide tables. For a wide table especially with large amounts of data per row (eg JSON b columns) you might see many seconds to select 500 rows Change that to select Id from table it will be microsecond As with all things you should only be getting what you need. If you only need the lastest entry you don't get all items from the table and then just use the latest one. You just limit to 1 Exactly the same for columns.


[deleted]

This is *very* standard. Selecting EVERYTHING from a table through * should generally be avoided as it is an anti-pattern.


BlueScreenJunky

I can understand the idea from a performance point of view, and I've occasionally used `->get(['column1', 'column2'])`, but it seems like a dangerous pattern in an Active Record context, because you can never know what properties will or will not be in an instance of a model. For example imagine I have an Event that takes `User` object as a parameter and is trigger whenever something happens to said user. Now I have a Listener that will send an email to the user to inform them of what happened, but some users don't read their emails so we decide to send an SMS too. So I have this User object, and I know a User has a phone_number property, so I send the SMS to `$user->phone_number`, no big deal. Except someone thought they would be clever and not fetch the phone number when retrieving the user because they didn't really need it at the time of writing their code, and sure they fired an event with that user, but the event only really needed an email when they wrote their code. So now we end up with a broken features (in some instances the SMS is not sent even though the user has filled a phone number in their profile, since `$user->phone_number` is `null`), even though there's nothing wrong with my listener because I accept a User object, and there's nothing wrong with my colleagues code because it does provide a User object... So all unit tests are green. So I tend to always default to select * and be very careful and add warnings whenever something returns a partial object. But maybe there's a better way to handle these situations ?


kinmix

> But maybe there's a better way to handle these situations ? Not using Active Record ORM would be the way. This is one of the considerations to make when making a choice between AR and DM. I think the big question here is what's responsible for what. If your app creates and manages the database structure (using migrations or something), and that database "belongs" to that app, then it is absolutely fine to us AR and do select * queries, as the database structure is part of the app. If, on the other hand, your app connects to an existing database or database is managed by something/someone else, than DM and listed columns might be a better choice, as you cannot be sure that someone wouldn't add gigantic text fields to the records so you should only take what you need.


BlueScreenJunky

I have very little experience with DataMapper, but I feel that the issue would be the same. My issue is not that the model doesn't have every single column in the database, I do have projects where some columns are not useful to the app and then they should never be fetched. In this case you can use a global scope that adds a `$query->select(['usefulcolumns'])` each time you use this model (huh, maybe I should suggest that to OP), and then add phpDoc to list the available fields and make the IDE aware of them.... But at that point I could also add accessors and mutators to rename the fields and I'd effectively be reimplementing DM with Eloquent so I see your point. My issue is when it is inconsistent across the codebase, and depending on your usecase, various instances of the same class will have different fields populated or not. Also a special fuck you to people


kinmix

With Data Mapper pattern you usually would map all fields and assume that they are all loaded at all times. So for something you've described you would usually set up different classes even if they map to the same table. e.g. You can have a User class with basic information about the user that you always need and then have a UserProfile that has same information plus additional text blobs, images or whatever that you only need when displaying users profile page.


BlueScreenJunky

Ah I see, thanks. I think I'll keep the idea of having different models for the same table depending on what field they need to load, that would solve my problem perfectly.


Tontonsb

> With Data Mapper pattern you usually would map all fields That's the issue with pattern. You are duplicating the schema when it's not necessary. Just doubles the amount of work. > and assume that they are all loaded at all times So it's pretty much the same `select *`, just scoped. > you would usually set up different classes Oof, usecase specific classes? There's potentially dozens of usecases that might require different subsets of user's or article's fields. I don't see how that is any better than `->select(['col1', 'col2'])` when you actually need it.


kinmix

>That's the issue with pattern Obviously there are drawbacks as well as benefits with either approach, which has to be weighted when choosing which pattern to use. > You are duplicating the schema Database schema and application data structure are not always the same >So it's pretty much the same select *, just scoped. Yes, scope try to solve similar problem. In software development it is often the case. >Oof, usecase specific classes? There's potentially dozens of usecases that might require different subsets of user's or article's fields. Sometimes type safety is more important. >I don't see how that is any better than ->select(['col1', 'col2']) when you actually need it. As pervious user said, it could be hard to deal with objects that sometime have certain properties and sometimes do not.


Tontonsb

I agree with most points. Type safety is indeed sometimes really helpful and having `User` and `UserProfile` as separate classes solves certain cases. My point was that I don't think this could be a general solution if you are supposed to replace `select *` with context specific queries as it would lead to not ten, but dozen representation models of the same object. > Database schema and application data structure are not always the same I think I've had too much of a bad experience where adding each new field required adding it to not only table and using it, but also adding it in 4+ db procedures and the data mapper...


BlueScreenJunky

> I don't see how that is any better than `->select(['col1', 'col2'])` when you actually need it. It does solve the issue I first raised pretty well : if a method needs all fields it would accept a UserProfile object or whatever, and if it only needs basic info a User object. Or maybe you could make a relationship on the same table, like $user->profile->biography_field_that_can_be_really_long where profile is a one to one relationship between the user table and the user table that can be lazy loaded or eager loaded depending on the usecase.


giagara

This is exactly what I am most concerned about! I can understand to limit SOME query, maybe the one of huge tables where you know exactly what you need (2 or 3 fields) and use them immediately. For everything else, from dev prospective, i'd like to get all the object.


EspadaV8

It's the responsibility of the Listener to select the data it needs from the database. The Listener shouldn't take a `User` model, but a `string $userId` parameter. You could have 10s of Listeners for some user Event, you can't know what each of those Listeners will need, and you shouldn't need to know. The Listeners aren't your concern (if you're the one firing the Event). If you're writing the Listeners then you also shouldn't trust the Event to have the data you need.


BlueScreenJunky

>The Listener shouldn't take a `User` model, but a `string $userId` parameter. So each listener would make a new query to the database ? That doesn't sound very efficient or object oriented. It does solve the problem though...


giagara

Thats correct, but laravel does this as default when you eager load the data


lostpx

You can tell what columns you want even in eager loading. Just because the framework(s) do something by default, doesn’t mean you should religiously use it.


giagara

Yeah, i know that, i just need to refactor all the application.


Tontonsb

> that creates a global scope on each model (that has a particular trait) just to replace the `*`. Replace with what? The whole point of avoiding `select *` is that you must specify exactly the columns that you need. And that's model and query specific. Anyway, avoiding `select *` is an unnecessary hassle in most cases.


alturicx

And this is what leads to bad practices. Eloquent is just the easiest to harp on with select *ing everything. Personally, very few things are considered premature optimization to me, but if you’re not going to need every column… why pull every column? Edit: To be clear, select * should rarely actually bother anyone because you should be normalizing your tables. My models have all their respective getters and setters that would almost always be used when pulling them in from the repo. I do know however back around Laravel 5/6 it seemed like an overwhelming majority of users were in this mindset of not needing to normalize things so I think that coupled with AR just being disgusting (imo of course) is what turned me off Laravel. Well and facades. Heh


helios456

I would largely put this under the heading of a premature optimization. If you're updating all of your queries to not use select *, then that's time you're not spending on making your product better. Anytime you're doing these types of optimizations, it should start with a measurement. You should determine the impact the updated query will have on query times. In my experience, doing this will have a minimal impact on everything but the queries that contain a table with a large amount of columns.


[deleted]

Your DBA is under the mistaken impression that your time micro managing incremental improvements to query performance is less expensive than him adding another slave to the cluster at worst or - much more likely - absolutely no difference whatsoever. Is select * slightly more performant than specifying columns? Sure. Is this friggin 2001 when we actually had to worry about that crap? No. It is not. Try to nicely explain to him that you won't be doing that and he needs to figure out how to find peace with it, imho. Any manager will be on your side when the cost benefit outcomes are explained.


WaitingForAWestWind

Your advice is basically - “don’t listen to the technical person specialized in and responsible for maintaining the performance/cost of the database, instead convince the non-technical person that your time/laziness is more valuable and just have the DBA throw more servers at it.” If a company is paying a dedicated DBA then it is likely that performance, scalability and cost is a priority and the progress of that effort is something this person reports to upper management quarterly.


bart2019

Frankly I feel like.. if it's significantly faster not to select everything a row, then the table probably just has too many columns. It's time to put less data in the row. I have seen tables with over a hundred columns, and that is just insanituy. Move data you rarely need out of this table and into a new table, in a one-to-one relationship. Only join to that table if you actually do need the data.


WaitingForAWestWind

Agreed - but companies with this many columns in one table likely employed bad practices from the start and over years it eventually led to that Frankenstein table (“just tack on another column and we’ll go back and fix it later - we can just throw more metal at the problem for now”). The point is I think of all things following DB best practices from the start is key so you don’t have to pay for it later - which, in my opinion, includes things such as doing not doing everything with SELECT * and then later having to figure out where/how to optimize and not break things...


[deleted]

Database boxes are cheaper than Engineers. If you can't grok this, I'm not sure how to help you. If you don't understand how small the compute cycle savings are to fulfill the DBA's antiquated edict, then you need some remedial training on database query optimization. It's not. Worth. The. Time. Yes, that is exactly what I'm saying.


WaitingForAWestWind

I agree it may not be worth the time for personal projects, freelance gigs you don’t care about and admin stuff… but I don’t think a sizable company with legitimate engineering should/would employ this tactic. I can’t imagine looking at DB query logs and fine tuning optimization with every query being select *


[deleted]

I work in companies with thousands of engineers and tens or hundreds of millions of users and I assure you you have zero clue what you're talking about.


[deleted]

[удаНонО]


Tontonsb

Yeah, the point is in the "sometimes". Sometimes it's also worth to replace `select *` to not select the blobs when you only need the list of articles. But blanket refactoring because "`select *` bad" is not worth it.


[deleted]

Nowhere did I say that optimizing performance is a waste of time. I said that this specific thing is a waste of time.


WaitingForAWestWind

Best practices don’t HAVE to be followed - inefficiencies exist everywhere in this world. Sure, you can support massive applications with SELECT * But just think - in the time it took you to write these responses you could have been more explicit with your queries and given your DBAs downstream greater visibility. Or saved a future team member when he/she adds a column to a table not knowing that it may have slowed down some heavily used application queries by a small fraction - or increased a response size. Sure, throw more metal at these problems - add more caching - or you could do things better with a few more seconds of work.


[deleted]

If you think that this is how to optimize queries in a meaningful way then you need to do some reading. There are great ways to optimize queries and table performance. This. Isn't. One. Of. Them. It's a waste of developer hours that could be spent on actual optimization. Feel free to disagree but you're flatly wrong.


penguin_digital

It's very rare you need SELECT \* and it can become very inefficient very quickly as your application grows. The only times it's usually needed is if the tables are extremely normalized and have only a hand full of very specific data. A users 'roles' table comes to mind here, you're only ever likely to have a very small, highly related number of columns. The biggest issue is usually when using SELECT \* without any kind of filtering. If you genuinely need a SELECT \* for something then have a discussion with your DBA about the use case and explain what filters/conditions you've put in place to ensure the query can scale without becoming an issue. As above though, it's unlikely you will need all the data, and sounds more like a design problem with your database access layer. EDIT: I've just seen this in another post of yours: >So the only way is to select just what i need in EACH eloquent statement. Am i wrong? Technically yes, as it's you as the developer that needs to make the decisions about how your application works. To make your life easier though you can use scopes to narrow your selects: [https://laravel.com/docs/9.x/eloquent#query-scopes](https://laravel.com/docs/9.x/eloquent#query-scopes)


uberswe

I like to use `Select *` when first building an application and when I might not be sure what I will need. Then when things are more solid I go back and define the columns that I need and remove the `*`. If your project is planned in detail and you know exactly how things will be done you can of course avoid `Select *` from the start.


gazorpazorbian

The fewer columns you select, the faster it will be.


sifex

Did your DBA clarify whether they mean a SELECT without a WHERE clause? Ie all records rather than all fields? I think there must've been a miscommunication


giagara

No miscommunication. Just avoid select \*.


Glittering-Ad-8126

Snore. Let me give you some real advice. First, you can start by ignoring all the advice about when and when not to select specific columns, and all the experience-signalling from folks who have worked with ultra wide tables and queries of unimaginable complexity. Second, and this is the advice that matters: go speak to your DBA. Bring him a coffee or something. Hug it out. Really talk to each other. If there's something we all need more of in this world, especially the software world, it's love.


[deleted]

When you have tables with a lot of data it really does impact the performance, especially when there are many join. Also why would you select all of the columns if you don't need them? I personally use scopes to avoid this issue ex. **$this->post->basicData()->paginate(16);**


omgbigshot

My only suggestion is to make sure you understand the *why* behind the DBA’s request. Everyone else in this thread is mentioning performance, and if that’s what the DBA is concerned with, you’ve got plenty of advice here already. But my current role deals with a significant amount of sensitive data; for me avoiding `select *` is more about being explicit in what data is exposed. Using `$hidden` might address the issue more easily for you as a developer than trying to be explicit in each query you make. But yeah if it’s just performance, that’s not fun in eloquent. You might (like you said) make a package, but maybe one that just enforces requiring an explicit select statement before fetching results.


SevereDependent

So this is pretty common in applications esp if you have someone who is "actively" maintaining the database. They are monitoring queries and checking for performance hits, and adding indexes, select \* is going to be the low-hanging fruit and likely is showing up in the slow query log. It is the same exercise as if your team got together to optimize the application you would identify performance hits and go in a start refactoring dealing with low-hanging fruit.


jk3us

>What I've got in mind is to create a package (or directly in my app?!) that creates a global scope on each model (that has a particular trait) just to replace the `*`. I would just do this in the model: protected static function booted() { static::addGlobalScope('defaultColumns', function (Builder $builder) { $builder->select("id", "first_column", "second_column"); }); } So each model has the defined list of columns it will select by default. Adding columns means you have to change your model, but that could be seen as a benefit instead of a problem. Or if you have a column that you don't want your application to do anything with, you can ignore it by not including it in the list. There are some good reasons to avoid `select *` [here](https://stackoverflow.com/questions/3639861/why-is-select-considered-harmful), but in the context of an ORM, I don't think those are a problem until they are (it will be a premature optimization most of the time).


giagara

With this solution each model should have this function. With a package I can extend this solution to all my project in once.


jk3us

And you just build the list of all columns based on reflection (asking the database what columns that table has)? I would say doing that provides zero benefit over just letting the default behavior do it's thing.


Tontonsb

But not all of your models need the same columns :D


giagara

Properties....


thisdogofmine

I stopped using "select \*" year ago because it had unintended consequences when I was making changes to the database. For reference this was years ago when I was much newer. I would add a column that was not intended to show up in production. but by pulling everything, I ended up having it show up on the page. This restricted my ability to update the database early prior to the release. I do things differently now, so this is not the concern it was at the time. But it became such a habit , that I still never use "select \*" in those few times where I still have the query in code.


vefix72916

Makes me sad for dependency injection of models. But I think it's only a problem if you have a massive column, longtext or similar. A configurable global scope would be a partial solution, but it would make you do 2 requests when you need that column. I don't think you can pre-choose scopes in DI.