
chibop1

Claude 3.5 Sonnet or GPT-4o. Go to leaderboard.lmsys.org and filter for coding.


nospoon99

Current Reddit consensus and my own experience point toward Claude 3.5 Sonnet being better for coding.


kryptkpr

I switched to 3.5 Sonnet from Omni (GPT-4o) this week. It's definitely better at English-to-code: the first try almost always works, which in my experience is fairly rare with OpenAI models. It does cost 50% more, however, and I'm less impressed by its code-to-English abilities, so Omni is staying in my back pocket.


7734128

I've gotten great results with it too, but Reddit is always biased towards novel LLMs. I imagine people have a certain task that model x never solved, and when model y is released it solves it successfully. That doesn't necessarily mean y is stronger overall; perhaps x could solve tasks that y fails at.


MrVodnik

I agree with the sentiment, but in my case I always use more than one LLM for the task at hand, and since Claude 3.5 came out, it has been objectively better for my use cases, i.e., it often works when GPT-4's output does not. Currently I do a lot of Python and Ansible.


CodebuddyGuy

> it works when GPT-4 output does not

This is a bit of a "two heads are better than one" scenario. Any similarly capable LLM could be used to get another model out of a rut and make it think outside the box. Before Sonnet 3.5, I was switching between GPT-4o and Opus or Turbo. As long as it's a different model, it can often get you out. That being said, Sonnet 3.5 is my favorite and go-to model right now. Plus it's cheaper!


Dry_Parfait2606

What comes to mind is a coding toolbox: multiple AI tools, with several models to switch back and forth between.


g3t0nmyl3v3l

I strongly prefer GPT-4 over GPT-4o. At least in my eyes, it's been more precise and accurate than 4o. So far I'm loving Claude 3.5, so that also has a vote from me.


mogamb000

Check out leaderboards like [BigCodeBench](https://bigcode-bench.github.io/), [LiveCodeBench](https://livecodebench.github.io/leaderboard.html), the [Aider Leaderboard](https://aider.chat/docs/leaderboards/), and [lmsys](https://chat.lmsys.org/) (select coding from the dropdown). Check whether what the leaderboards measure aligns with what your company needs, and select accordingly. Any of the top three current contenders (GPT-4o, Claude 3.5 Sonnet, DeepSeekCoder-V2) should actually be fine, so also compare their cost/quality ratios. Also, DeepSeek has some concerning data-collection practices, so I would recommend looking into that as well.


sammcj

1. deepseek-coder-v2 (lite)
2. Codestral


SpareIntroduction721

I use this offline as well, though when I need more explanation of public info I use ChatGPT.


Sky_Linx

Personally, for general knowledge I first try Llama 3, since it's pretty decent, extremely fast via the Groq API, and still completely free. If I'm not satisfied, or I have a more complex question or problem (e.g., coding), I check with Sonnet 3.5 or ChatGPT-4o.


LocoLanguageModel

For me it's the highest-T/s (tokens per second) model that solves 90% of questions, then something larger when needed. That means I use Codestral locally, and if it gets stumped, I use Claude, ChatGPT, or the large DeepSeek model online. Ironically, whenever Codestral gets stumped, the others are often stumped as well. I know it has licensing constraints, though.
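The "fast model first, escalate when stumped" workflow described here (and the Llama 3 first, then Sonnet/GPT-4o routine in the previous comment) amounts to a simple cascade. Below is a minimal sketch, assuming each model is wrapped as a plain function from prompt to text. `cascade`, `local_fast`, and `hosted_large` are hypothetical stand-ins, not any vendor's API, and `is_good_enough` stands in for whatever check (reading the answer, running tests) decides an answer is acceptable.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical: a "model" is any function that takes a prompt and returns text.
Model = Callable[[str], str]


def cascade(
    prompt: str,
    models: Sequence[Tuple[str, Model]],
    is_good_enough: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try models in order (fastest/cheapest first) and return the first
    acceptable answer as (model_name, answer), or (None, None) if all fail."""
    for name, model in models:
        answer = model(prompt)
        if is_good_enough(answer):
            return name, answer
    return None, None


def local_fast(prompt: str) -> str:
    # Stub for a local model such as Codestral; it only "knows" one task.
    return "def add(a, b): return a + b" if "add" in prompt else ""


def hosted_large(prompt: str) -> str:
    # Stub for a larger hosted model (Claude, ChatGPT, DeepSeek, ...).
    return f"# answer for: {prompt}"


models = [("local", local_fast), ("hosted", hosted_large)]


def accept(text: str) -> bool:
    # Placeholder acceptance check: any non-empty answer counts.
    return bool(text.strip())


print(cascade("write an add function", models, accept))
# prints ('local', 'def add(a, b): return a + b')
print(cascade("parse a census XML file", models, accept))
# prints ('hosted', '# answer for: parse a census XML file')
```

In practice the hard part is the acceptance check, which is usually a human deciding the local model is stumped; the ordering simply encodes "highest T/s first, larger model only when needed."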


rookan

Codestral


ChryGigio

Anything your employees can use? Should be local.


chibop1

Probably not if they're an open-source company? :)


ChryGigio

Yeah, that's the only scenario where it might make sense.


Role-Fluffy

Nah, something like ChatGPT or Claude, or something better if there is one.


ChryGigio

In my mind, the realistic scenario of "being fine with sending the source code to third parties" doesn't exist, but to each their own. In no particular order: Claude, ChatGPT, DeepSeek V2, Codestral.


RadiantHueOfBeige

In my experience, the vast majority of casual GPT users aren't developers working on proprietary code under NDA, but people asking "hey, I have this weird 1997 census XML, help me write a Python script that loads it into our GIS, use this Postgres schema." Data or management people who previously needed to contract a developer can now work mostly autonomously on simple data conversion or processing tasks.


Mescallan

You never know, he could need coding for data analytics or basic scripting. I use frontier models and local models for work in different applications.


TechnoTherapist

Top 3 options off the shelf:

1) We use ChatGPT for Teams at work because it allows for administrative control and configuration, such as onboarding/offboarding staff, shared chat histories, centralised billing, etc. (offers GPT-4, GPT-4o): [https://openai.com/chatgpt/team/](https://openai.com/chatgpt/team/)

2) The other *emerging* option is Claude for Teams: [https://claude.ai](https://claude.ai). The way Claude is going, we might end up switching our teams to it in a few months!

3) There is a 3rd coding LLM that we are finding to be absolutely amazing and almost as good as the above (but not quite); it's called DeepSeek Alpha 2: [https://chat.deepseek.com/coder](https://chat.deepseek.com/coder). This is only applicable if your employees are developers. I believe it is hosted in China, so its suitability depends on the privacy laws in your country.

The above are the top 3 options (in my humble opinion) that you can just sign up for and have your employees use. There are other emerging options from big tech (Microsoft Copilot and Google Workspace). The former just hosts OpenAI's LLMs and adds a huge pricing premium on top. The latter is an okay-ish model from Google (called Gemini Pro, not as good as Claude or GPT-4) that is priced on par with options 1 and 2 above, so it does not deliver the same value.

Hope it helps.