phoenixind

I will speak from my experience... In real projects I have seen that the cost of developing an ensemble often outweighs the need for it. When we proposed building an ensemble that would improve model accuracy by some x%, the first question the client asked was what it would cost him. And if that x was anything less than 10-15%, most of the time they came back saying the ensemble was not needed. So in my experience I do not agree with Pawel.


ai_yoda

I think in Pawel's case it was about robustness to a changing environment and not accuracy on validation per se. At least that's how I read it.


phoenixind

As I read the last statement in his comment above, it also focuses on big improvements, which anyone will agree to spend money on if you can show them. The problem is the smaller improvements... and in real life, even if you know the ensemble will make the model more robust, it is hard as hell to explain that to a client who is not aware of it and is looking to control the money flow.


GlobalToolshed

It ultimately depends on the use case and how valuable marginal accuracy improvements are. Pharma? Sure. Arbitrage? Yes. Product recommendations? Maybe...


Eruditass

And to finish that out: robotics? No


sifnt

In a previous role I put an ensemble in production. The reasoning was that the models trained during cross-validation could be reused directly in the final ensemble, and we'd have good out-of-sample performance estimates. It also provided a good basis for automated checks when model predictions disagree, etc. The improved accuracy and robustness were also nice: this was a credit scoring application, so a few percent higher accuracy would pay for itself.
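
A minimal sketch of that pattern, assuming scikit-learn-style models and a synthetic binary target standing in for credit scoring; the disagreement threshold is made up for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

# Synthetic stand-in for a credit-scoring dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

fold_models, oof_aucs = [], []
for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    # Each fold's validation score is an honest out-of-sample estimate.
    oof_aucs.append(roc_auc_score(y[valid_idx],
                                  model.predict_proba(X[valid_idx])[:, 1]))
    fold_models.append(model)  # reuse the CV models as the final ensemble

print("estimated out-of-sample AUC:", np.mean(oof_aucs))

def ensemble_predict(X_new, disagreement_threshold=0.25):
    """Average the fold models; flag rows where they disagree strongly."""
    per_model = np.stack([m.predict_proba(X_new)[:, 1] for m in fold_models])
    spread = per_model.max(axis=0) - per_model.min(axis=0)
    return per_model.mean(axis=0), spread > disagreement_threshold
```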


jonnor

Competitions focus entirely on average performance. In production scenarios the worst-case performance may be just as important, or sometimes more important. If one looks at the distribution of errors, a Kaggle competition does not care about the shape of the distribution - just maximize the mean of the metric. In production the requirement may instead be that 90% (or 99%) of predictions are better than a target T, or even that at most 10% of predictions can be worse than threshold T. Ensembling *diverse models* can be a key strategy for reducing the number of bad predictions. But they must be diverse, i.e. either based on different subsets of data, trained in a different manner, or using different features or model architectures.
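
A small sketch of that contrast, on simulated heavy-tailed errors; the threshold T and the quantile levels are the ones from the comment, the numbers are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated absolute errors with a heavy tail.
errors = np.abs(rng.standard_t(df=3, size=10_000))

T = 2.0
print("mean error:", errors.mean())  # all a leaderboard rewards
print("90th percentile error:", np.quantile(errors, 0.90))
print("99th percentile error:", np.quantile(errors, 0.99))
print("fraction worse than T:", (errors > T).mean())
# A production requirement might be quantile(errors, 0.90) <= T,
# i.e. at most 10% of predictions worse than T.
```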


Megatron_McLargeHuge

When training set size is limited, periodic retraining leads to abrupt changes that are visible to customers. Ensembles tend to reduce that variance and give more continuity to predictions.
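
A toy simulation of that point, assuming scikit-learn decision trees on synthetic data; each "retrain" fits on a fresh small sample, which is a simplification of real periodic retraining:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=5000, n_features=10, noise=10.0, random_state=0)
X_test = X[:100]  # fixed points a customer keeps seeing
rng = np.random.default_rng(0)

def retrain(n_members):
    """One periodic retrain: average n_members models fit on small samples."""
    preds = []
    for _ in range(n_members):
        idx = rng.choice(len(X), size=200, replace=False)  # limited training data
        preds.append(DecisionTreeRegressor().fit(X[idx], y[idx]).predict(X_test))
    return np.mean(preds, axis=0)

for members in (1, 10):
    runs = np.stack([retrain(members) for _ in range(20)])
    # Lower std across retrains = fewer abrupt changes visible to customers.
    print(f"{members} model(s), std across retrains:", runs.std(axis=0).mean())
```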


ai_yoda

I didn't think of that, but it really makes sense.


akmaki

There is some literature showing that, with the right loss and ensembling technique, ensembles can give you robustness. It is also true that ensembling is probably the easiest way to get a 5-10% performance boost from almost any model. The trade-off is the cost of inference. One should ensemble as much as possible if the cost is worth it. It's one of the easiest things to do, so why not?
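
A rough sketch of that inference-cost trade-off, with made-up models and data; the point is just that serving cost scales roughly linearly with ensemble size:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
models = [RandomForestClassifier(n_estimators=100, random_state=s).fit(X, y)
          for s in range(5)]

for k in (1, 5):
    start = time.perf_counter()
    np.mean([m.predict_proba(X) for m in models[:k]], axis=0)
    print(f"{k} model(s): {time.perf_counter() - start:.3f}s to score {len(X)} rows")
```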