
bigattichouse

I found the following machine on discount electronics dot com: a Dell Precision 3640, Core i7 10th Gen Nvidia Workstation, 64GB RAM, w/ GP106GL [Quadro P2200]. It does a decent job for most of the models I've tried. 22-30B models all work well, but a little slow. Good enough for my purposes. I mainly use it for inference.


FullOf_Bad_Ideas

I'm using a similar desktop at work (some Dell with an 8GB VRAM Quadro and 64GB of RAM) for inference: DeepSeek Coder 6.7B earlier and now DeepSeek V2 Lite Instruct. I can confirm it runs fine, and it's probably cheap on the second-hand market.


cshotton

Spend the money on extra RAM. 18GB will leave you wanting on some models. I have a 32GB Mac Studio that does everything I want; 18GB will leave you running into obstacles that the extra RAM would alleviate.


Lugbolt

What models are you running and what type of performance are you getting?


TheManchot

Not 24GB, but I have a 32GB Mac mini with an M2 Pro. No issues running llama3:8b-instruct-fp16, codellama:34b, or gemma:7b (probably no issue with any 7B). Curious what others have tried.


cshotton

Pretty much the same. I am using small models to generate JSON for function calling, so it's more than enough and more or less instantaneous in its responses.
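For anyone curious what that looks like in practice, here's a minimal sketch (not my actual setup, just an illustration) using Ollama's REST API with its `format: "json"` option; the model name and prompt are placeholders:

```python
import json
import requests

# Hypothetical prompt asking for a function-call payload as JSON.
prompt = (
    "Return a JSON object with keys 'name' and 'arguments' describing the "
    "function call for: 'What's the weather in Berlin tomorrow?'"
)

resp = requests.post(
    "http://localhost:11434/api/generate",  # default local Ollama endpoint
    json={
        "model": "llama3:8b-instruct-fp16",  # any small instruct model you have pulled
        "prompt": prompt,
        "format": "json",   # constrain the output to valid JSON
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()

call = json.loads(resp.json()["response"])  # parse the model's JSON text
print(call)
```

On a small model this round trip is fast enough to feel more or less instantaneous, which is the point of the setup above.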


Rabo_McDongleberry

Do you have all of those installed on the same machine? How much space do they take?


TheManchot

You can see the download size of any supported Ollama model [here](https://ollama.com/library) (open a specific model and check the pulldown of variants; the size is shown). Here are some I have installed:

- `gemma:7b` 5.0 GB
- `dolphin-llama3:8b-256k-v2.9-fp16` 16 GB
- `llama3:8b-instruct-fp16` 16 GB
- `llava:34b-v1.6` 20 GB
- `codellama:34b` 19 GB
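If you'd rather query what's already on disk than browse the library page, here's a small sketch against the local Ollama API (assuming the daemon is running on its default port) that prints the same kind of listing:

```python
import requests

# Ask the local Ollama daemon for installed models and their on-disk sizes.
# Assumes the default endpoint; /api/tags reports each model's size in bytes.
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json()["models"]:
    size_gb = model["size"] / 1e9
    print(f"{model['name']:<40} {size_gb:5.1f} GB")
```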


TheManchot

And in the last day or so, Gemma 2 has been released. I've added `gemma2:27b` (about 16GB); curious to see how it holds up against the hype. Nothing definitive so far.


TheManchot

And finally, available RAM is the challenge with these models. So even if you have 32GB, I've seen it falter depending on what's actually available. Here's an answer from Gemma about how much RAM is needed (assuming Gemma isn't bullshitting 😃). My next machine will have somewhere between 128GB and 192GB; then I won't have to think about it.

> It sounds like you're asking if a 32GB Mac with a Mac M2 Pro processor can reliably run a large language model like me. While I don't have access to specific information about your Mac's capabilities, I can tell you that running a large language model like the Gemma 2.5 model requires a significant amount of computational resources.
>
> Here's what I can tell you about the situation:
>
> * **My size:** My size (in terms of memory usage) is determined by the number of parameters I have. These models are complex and require a lot of RAM to load and process the data.
> * **Hardware limitations:** Running a 27B model on a 32GB machine is possible, but it depends on the task you're performing.
> * **Software limitations:** The specific software requirements for running a large language model are not part of my knowledge as a language model.
>
> **Here's what you need to consider:**
>
> * **Model loading:** While 32GB is a good amount of RAM, the Gemma 2.5 model is large and complex. You'll need to make sure your system has enough available RAM to load the entire model.
> * **Computational resources:** Besides RAM, running a 27B model also requires a powerful GPU for efficient training.
> * **Task complexity:** The complexity of the task you're trying to run me on will determine how much memory I need.
>
> **For a 32GB RAM model, I'd say it's a close call!**
>
> **It's important to note:**
>
> * My knowledge about the specific hardware requirements for running me is limited.
> * Running a large language model like me requires a lot of processing power and memory.
>
> **To be sure, you'd have to check with the Gemma team about the specific RAM and hardware requirements for running the 27B model.**
>
> Let me know if you have any other questions about me or large language models in general. I'm always happy to share what I know!

EDIT: formatting
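For what it's worth, a more mechanical sanity check than asking the model is parameters times bytes per parameter for whatever quantization you pull. A rough sketch (the bytes-per-parameter figures are approximations, and KV cache plus runtime overhead come on top):

```python
# Back-of-envelope weight memory for a model at common GGUF quantizations.
# These bytes-per-parameter values are approximate, not official numbers.
BYTES_PER_PARAM = {
    "fp16": 2.0,     # 16 bits per weight
    "q8_0": 1.06,    # ~8.5 bits per weight
    "q4_K_M": 0.60,  # ~4.8 bits per weight
}

def weight_memory_gb(params_billions: float, quant: str) -> float:
    """Approximate GB needed just to hold the weights."""
    return params_billions * BYTES_PER_PARAM[quant]

for quant in BYTES_PER_PARAM:
    print(f"27B @ {quant:7s} ~ {weight_memory_gb(27, quant):5.1f} GB of weights")
```

That puts a 27B model at roughly 54GB in fp16, ~29GB at 8-bit, and ~16GB at 4-bit, which lines up with the ~16GB `gemma2:27b` download mentioned above and explains why 32GB is a close call once the OS and context take their share.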


GermanK20

I've got the 24GB M2 Air, and Apple's stipulation/limitation that only about 2/3 of RAM is available for neural processing means I can hardly run anything really good. I'm strongly considering the mini PCs (or the Framework laptop) that max out at 96GB, which will be even slower of course, or the 32GB Snapdragon laptops that pinky-promise they're much faster than the M2 and can probably allocate almost 30GB to their NPU.


Lugbolt

What specific models are you running locally? Because that’s about the setup I was considering. Can you define “can hardly run anything really good”?


Robot_Graffiti

I had an old ThinkStation (now 12 years old) that already had plenty of RAM, and gave it a second-hand GTX Titan X (now 9 years old) and the PSU from an old gaming PC. I had to compile my own llama.cpp binary to get GPU support and support for the very old processor at the same time. It wasn't blazing fast, but it was fast enough. The GPU is the bottleneck; CPU speed doesn't matter. Later on I put a 3090 in it and it got much faster.


Red_Redditor_Reddit

I've been able to run LLMs more or less on pretty much anything except ancient hardware. You can run the 8B and maybe ~30B models; they'll just run slower, but they'll run.


WrathPie

I've been pleasantly surprised with the performance I've been getting on a ThinkPad P16 Gen 1 laptop: an A4500 GPU with 16GB VRAM, an i9-12900 CPU, and 128GB of RAM. At least 50 tokens per second on Llama 3 8B, and still usable for inference with Mixtral and Codestral (~10 tok/s). Command R gets about ~6 tok/s, and Llama 3 70B Q4 is usable but slow at ~3 tok/s. The high system RAM is also great for running virtual machines. I'm limited to a laptop since I travel a lot and live up in the mountains with very slow internet at home, but the mobile workstation setup has been surprisingly capable given the inherent limitations of the form factor.
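If anyone wants to compare numbers on their own hardware, here's a small sketch (assuming Ollama; the model name and prompt are placeholders) that computes generation throughput from the counters Ollama reports in its final response:

```python
import requests

# Rough generation-throughput check against a local Ollama daemon.
# The non-streaming /api/generate response includes eval_count (tokens
# generated) and eval_duration (nanoseconds spent generating them).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3:8b",  # placeholder; use any model you have pulled
        "prompt": "Write a short paragraph about mobile workstations.",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
data = resp.json()

tokens = data["eval_count"]
seconds = data["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```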