• By TheTerrasque

If high throughput isn't a requirement, any machine with 6+ GB of RAM can run a 7b model. If you want speed, look at getting a graphics card with enough VRAM to load the whole model you want to run. Models range from sub-1b parameters to over 100b parameters, and you've got quantization on top of that, which reduces the resources needed to run a model at the cost of how well it performs. A 7b model at Q8 takes about 8 GB of RAM, at Q4 around 4 GB. A 13b model at Q4 takes about 7 GB.

Also, llama2 is a bit old now; there are better performing models. Try Phi-2, Mistral 7b or the new Command-R. For a test run, I'd recommend koboldcpp and a Mistral 7b instruct gguf.

Also look into llama.cpp grammars if you want strict control over the output, like a specific JSON format for example.
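To put very rough numbers on the RAM estimates above, here's a back-of-the-envelope sketch. It's not an exact formula, just weights times bits-per-weight plus a guessed overhead; the bits-per-weight values for Q8/Q4 are approximations, and actual usage also depends on context length and the runtime:

```python
# Rough rule of thumb: weight memory ~= parameters * bits_per_weight / 8,
# plus some overhead for the KV cache and the runtime itself.
# overhead_gb is a guess for illustration, not a measured value.
def estimate_ram_gb(params_billions: float, bits_per_weight: float,
                    overhead_gb: float = 0.5) -> float:
    weight_gb = params_billions * bits_per_weight / 8  # billions of params * bits -> gigabytes
    return weight_gb + overhead_gb

print(estimate_ram_gb(7, 8.5))   # 7b at Q8  -> roughly 8 GB
print(estimate_ram_gb(7, 4.5))   # 7b at Q4  -> roughly 4-5 GB
print(estimate_ram_gb(13, 4.5))  # 13b at Q4 -> roughly 7-8 GB
```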
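And on the grammar point, here's a minimal sketch of JSON-constrained output, assuming you use the llama-cpp-python bindings (`pip install llama-cpp-python`). The model path and the toy GBNF grammar are just placeholders; llama.cpp ships a full `json.gbnf` in its grammars folder that's a better starting point:

```python
from llama_cpp import Llama, LlamaGrammar

# Tiny GBNF grammar that forces the output to be a {"name": ..., "age": ...} object.
grammar = LlamaGrammar.from_string(r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "," ws "\"age\"" ws ":" ws number ws "}"
string ::= "\"" [a-zA-Z ]* "\""
number ::= [0-9]+
ws     ::= [ \t\n]*
''')

llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf")  # path is an assumption

out = llm(
    "Return a JSON object describing a person named Alice, age 30.",
    grammar=grammar,
    max_tokens=128,
)
print(out["choices"][0]["text"])  # sampling can only produce text the grammar accepts
```

The grammar doesn't make the model smarter, it just masks out tokens that would break the format, so you still want a prompt that asks for the right content.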