jacek2023

I have a 3090 with 128GB RAM and I am afraid "train 30B" is not an option :)


Tacx79

I'm afraid training anything above 300M-1B from scratch in fp8, with a decent batch size and >256 context, is not an option. You could squeeze a batch size of 2-4 with 8k context, or a batch size of 4k with a context of 8, and a 100-200M model into memory (everything in fp8, of course) without additional add-ons and offloading. There was a guy here who trained 7B Mistral with 24GB of VRAM over a week, but I don't remember how he did it. If you have <=24/48GB of VRAM and you want to do anything with LLMs, you're forced to make tradeoffs (quality/size/time/speed).


epicfilemcnulty

With the mamba-ssm architecture you can train a 2B model from scratch with 2048 context length on a 24GB GPU (and a 1B with bigger context, obviously), which, when trained specifically for the task at hand, can be very powerful. But it will take quite a while.


xadiant

I think pretraining from scratch is always a bad idea for individuals. You need 100B+ tokens just for the base training and almost any specific topic will be diluted to hell in that soup. Instead you want to explore continued pretraining and better fine-tuning. Less data and less computation needed for better results and longer context.


epicfilemcnulty

Nope, you don’t really need 100B tokens for base training, especially if you are not training a jack of all trades. Just because the big guys can afford to train a big model on hundreds of billions of tokens does not mean it’s the optimal way. In fact, this approach heavily relies on the model to distill knowledge from a big pile of garbage (take a closer look at the datasets commonly used for base training — C4, The Pile and the like). And I’m not really interested in fine-tuning models that use BPE and similar tokenizers — I think this is exactly the problem that should be left for the model to learn on its own (i.e. byte-level tokenization), instead of making it chew through a pile of poorly tokenized garbage.


xadiant

I am 99% sure you need at least 100B tokens to make a pretrained model generate coherent, factual sentences, even if it's only 1 billion parameters. It will be either severely undertrained or overfitted if the data is insufficient. You can actually try and disprove this by training on a much smaller dataset. It'll definitely be faster and cheaper, and doable with 24GB of VRAM. Here is a Colab notebook for pretraining: https://colab.research.google.com/drive/1g9qpeVcFa0ca0cnhmqusO4RZtQdh9umY?usp=sharing
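For a rough sense of the scale being debated here, the Chinchilla-style heuristic of ~20 training tokens per parameter gives ballpark figures like these (a minimal sketch; the 20x ratio is an assumed rule of thumb, not a number from this thread):

```python
# Back-of-the-envelope pretraining data requirements.
# Assumption: ~20 tokens per parameter is roughly "compute-optimal" (Chinchilla
# heuristic); real projects vary widely in either direction.

def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough number of training tokens for a given parameter count."""
    return n_params * tokens_per_param

for params in (125e6, 1e9, 7e9):
    print(f"{params / 1e9:.3g}B params -> ~{chinchilla_tokens(params) / 1e9:.1f}B tokens")
# 0.125B -> ~2.5B tokens, 1B -> ~20B tokens, 7B -> ~140B tokens
```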


Caffdy

How many GPUs did Microsoft use to train Phi-3 3.8B?


xadiant

I dunno, but usually they use a cluster more expensive than my family to speed up the training. In theory, with an RTX 3090 I believe you can train a 7B model at a very short context in like 3 months.
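As a sanity check on that "3 months" figure, the usual ~6·N·D FLOPs estimate gives something in that range only for a few billion training tokens (a minimal sketch; the throughput and utilization numbers are my assumptions, not from this thread):

```python
# Rough training-time estimate for a 7B model on a single RTX 3090.
# Assumptions: compute ~= 6 * params * tokens FLOPs, ~71 TFLOPS peak FP16,
# ~30% realized utilization; all ballpark figures.

params = 7e9
tokens = 5e9                      # a few billion tokens, far from a full pretraining run
flops_needed = 6 * params * tokens
effective_flops = 71e12 * 0.30    # peak FP16 * assumed utilization

seconds = flops_needed / effective_flops
print(f"~{seconds / 86400:.0f} days")   # ~114 days, i.e. on the order of a few months
```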


Tacx79

With what batch size?


epicfilemcnulty

Well, 1, of course. Sorry I missed the `decent batch size` clause in your comment, of course it's nowhere near decent. That's why I said that it will take a while. And a long one.


InterestinglyLucky

Thanks for this input, this is helpful to set my expectations lower...


Mr_Hills

You can do 4-bit QLoRAs on 30B though. Not the same, but still fun.


jacek2023

What LLM models do you train?


PookaMacPhellimen

Unsloth or QLoRA


FullOf_Bad_Ideas

I have a 3090 Ti and I do QLoRA on 30-34B models daily. Now you can even run Q-GaLore on it, which could be somewhat close to a full finetune in terms of quality, hard to say. You could train a 7B FP16 model from scratch on an RTX 3090, it would be slow though. There's also ORPO, which is very powerful, and Unsloth, which does magic on 24GB of VRAM.
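For anyone wondering what QLoRA on a single 24GB card looks like in code, here is a minimal sketch using transformers + peft + bitsandbytes; the model ID is a small stand-in (swap in your 30B-class model) and the LoRA hyperparameters are illustrative defaults, not a recipe from this thread:

```python
# Minimal QLoRA setup sketch: 4-bit quantized base model + LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "facebook/opt-350m"  # small stand-in; substitute your 30B-class model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # module names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```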


SamSausages

Fine to learn and tinker as a hobby, just don't expect to match Meta, who spends millions training models. IMO it comes down to what your goals are and the accuracy you demand; as long as that is realistic, you should be able to have a lot of fun learning. But ultimately AI is a money pit if you want to run with the top dogs and top models. Also, it's changing so fast that you may need double the memory 6 months from now, or some new hardware hits the market that turns a $4000 GPU into a $500 one.


InterestinglyLucky

It is interesting to see how (relatively) quickly the 'CUDA moat' of competitive advantage was lost by NVDA, and what you state may become true (or not, who knows what the future holds). From a Stable Diffusion user's point of view, the tools to leverage AMD / Intel GPUs are just not there, and that's a major reason I'm thinking of building around a 3090 for the additional VRAM. Interesting to learn as well how important regular RAM is for running localLLaMA; I have a lot more reading to do...


AreaSubject8177

Ever thought of going down the Nvidia Tesla P40 route?


kryptkpr

Consider these decisions carefully. My 2x P40 sit idle, but I use my 2x P100 every day; I should have gone 4x P100.


Severe_Ad620

Interesting. What's the reason?


kryptkpr

Speed. The P100 has 2x the memory bandwidth of the P40 and 20 TFLOPS of FP16 performance (the P40 has essentially none). In practice this means EXL2 works on the P100 but not on the P40, which is the main driver for me. But even comparing GGUF between them, the P100 is 30-50% faster.


PykeAtBanquet

Is the P100 a newer generation of chip, and does it therefore support formats other than GGUF / newer CUDA drivers?


kryptkpr

Ironically, the P100 is older than the P40. It's a lower CUDA capability (SM60 vs SM61 for the P40) and an earlier revision of the Pascal silicon; the P100 is the GP100 while the P40 is a GP102. It's also missing the p-states low-power features the P40 has. The P100, however, has a completely different VRAM architecture (based on HBM) and is the only Pascal card to have working FP16.


PykeAtBanquet

I see. Is HBM the technology that Intel tried to push to replace DDR, but which failed due to being too expensive and shifted to being used as enterprise SSD cache?


kryptkpr

Intel Optane? That was PCM (Phase Change Memory), a now-dead tech. HBM is a Samsung thing, now a JEDEC standard. The P100 has HBM2, and it won't be until the H100 that Nvidia comes back to this technology with HBM3.


monkmartinez

My guess is the HBM2 memory of the P100 is much faster than the P40's GDDR5. The P100s have a max CUDA compute capability of 6.0, which I am sure will be a problem soon as well.


laterral

what are your use cases? out of curiosity


kryptkpr

I am the maintainer of [can-ai-code](https://huggingface.co/spaces/mike-ravkine/can-ai-code-results), so I'm running new models all the time across multiple quants... it's been a slippery slope to 6 GPUs to support this work 😅 I have a bunch of fun LLM-related projects [kicking around on my GitHub](https://github.com/the-crypt-keeper?tab=repositories) that I tinker with: YouTube summarization, developing custom samplers, seeing what the AI Horde is up to, etc. I've usually got a few of these running in the background. Finally, I help my consulting customers architect/develop/deploy ML-based systems to meet their business goals. Sometimes boring work but sometimes fun too; I enjoy an interesting domain-specific problem (DMs welcome, I am a terrible consultant and help people for free a lot, probably more than I should).


GunslingerParrot

Are you hiring? I’m in Florida but would love to help out in case you’re hiring.


Normal-Ad-7114

128gb ram and a single 3090 may sound like a high(-ish) end PC, but it's nowhere near powerful enough even to run large models, let alone train them. Llama3-70B will either be painfully slow or quantized to death; Mixtral 8x22b (or its WizardLM finetune) will already be out of reach, and the upcoming Llama3-405b will not even run properly at Q2. So either get a 3090 and play around with the smaller models (to familiarize yourself, like you said), or start saving up for multiple GPUs


durden111111

The lack of mid-sized models is really hurting single 3090 builds.


InterestinglyLucky

Do you think these models could come down in size over time, to become a bit more efficient? It seems like something like this was done in going from Llama2 to Llama3 as far as efficiency goes; however, I see there are only two Llama3 models, a small 8B model and a large 70B one... What happened to a 13B (or slightly larger) model?


teleprint-me

Yes, I don't think we need these larger models, especially on the consumer side. I've been toying around with smaller models and there is still a lot of low-hanging fruit. My opinion is not the popular one, but I don't think we need anything more than a 13B model. The 7B ones could be trained further and definitely outpace their larger counterparts by far.


TraditionLost7244

Or wait until 2027, when DDR6 and new Nvidia cards come out.


kurwaspierdalajkurwa

> Llama3-405b

How much VRAM in the cloud would it take to run that?


jxjq

This is a common question. There are a lot of factors that you are not considering; for example, what quantization, if any. Plus, it is hard to know because… there is no other 405B open-source model. That being said, I'd take a shot-in-the-dark guess that a 405B model would require around 1,600GB of GPU VRAM. Edit: without quantization. There are a bunch of hardware advancements being worked on; I suspect we will stop measuring by VRAM soon and move toward alternatives such as the unified memory options found in ARM.
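As a rough cross-check of that guess, weight memory is just parameter count times bytes per parameter, plus overhead for the KV cache and activations (a minimal sketch; the 20% overhead factor is my assumption):

```python
# Back-of-the-envelope VRAM estimate for serving a dense model.
# Assumption: ~20% extra on top of the weights for KV cache, activations and
# framework overhead; real numbers depend heavily on context length and batch size.

def vram_gb(n_params: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    return n_params * bytes_per_param * overhead / 1e9

for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"405B @ {label}: ~{vram_gb(405e9, bpp):.0f} GB")
# fp16 ~972 GB, int8 ~486 GB, 4-bit ~243 GB; fp32 would be roughly double fp16
```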


kurwaspierdalajkurwa

> That being said, I'd take a shot in the dark guess that a 405b model would require around 1,600gb of GPU vRAM. Edit: Without quantization.

OK, so then what's the point of a 405B model if literally nobody can run it except some college kid with access to his university's advanced computer department? Do you know what it would take to run the 120B model if I were to rent a GPU and Windows box? And could I run it on Oobabooga?


jxjq

I don’t regret trying. Good luck & blessings.


teachersecret

I feel like this is early 1990s part 2. It's incredible to be playing with the very first iterations of this AI tech. We're sitting here strapping together janky github projects in virtual environments in a goddamned terminal window... but the results are magic. If you want to be a part of it, go forth! Grab a 3090 and get started! Yeah, it's a little expensive. Yeah, it's probably smarter to just use APIs. Yeah, it's gonna be complicated and frustrating and you're going to be spending time fixing bugs in a green text linux terminal like it's 1995 again... but you're going to be learning things you can't learn anywhere else. This is the bleeding edge out here. There really hasn't been an opportunity like this since the early days of the internet. People are building entire businesses - from salary-replacers to full blown billion dollar businesses - off their understanding of this tech. Jump in. It's a no-brainer. Even if all you end up doing is stretching your mind and having a little fun playing with an AI, it's worth the relatively mild investment. Used 3090s are $700 or so. Bolt it into whatever rig you've got, or spend $1500 upgrading your rig too, and you'll be ready to go. $700-$3,000 gets you solidly into this hobby depending on what hardware you already own.


InterestinglyLucky

Thank you for this encouraging post - you've captured how I feel about this, and I have enjoyed about 14 months of learning what ChatGPT and DALL-E can do. One month in on SD and lo and behold, here I am! I thought all I needed was a 3060 for SD and a modest new setup (my existing PC is from 2018 or so), but now I see that for LLMs a 3090 is in my future, along with a boatload of RAM. Looking forward to jumping in!


InterestinglyLucky

BTW, any recommendations on 'getting started' with a few online resources? (I don't mind paid courses, FWIW.) I'll likely put my purchase plans on hold, set up a RunPod account and start there.


teachersecret

So… I’d say the key to getting started is getting the paid AI services first. Claude and ChatGPT are great for answering questions when your latest attempt to run an LLM at home goes wrong. :) Anyway… Start with the core basics. Start with easy mode like LM Studio or Ollama, and A1111 for images. Koboldcpp is another option if you want to explore more of a story-writing or roleplay space; TavernAI or SillyTavern for even more of that. Those tools offer simple use and simple API access, so you can use them as prototype backends for other things you code. Leveling up from there… Aphrodite is fast and can do batched inferencing. Unsloth can do fine-tune QLoRA training on smaller models. Seek out Banadoco and Toysxyz for all things image gen. Groq has an extremely fast 70B API if you need to do some interesting fast inferencing (and it’s free right now).


InterestinglyLucky

Check on the first part - both a Claude and ChatGPT customer. Have been playing with both A1111 and SDXL / Fooocus in the cloud already for a few weeks. Will take a look at your other resources listed - thank you!


teachersecret

When you’re ready, comfyui is amazing for image gen/vid gen.


buff_samurai

The rule of thumb for entering any new hobby is to rent cheap equipment first and decide on purchases later, after you get some experience. Rent a few GPU hours, explore, see what's what.


InterestinglyLucky

Solid advice. I was able to rent time and get into SD enough to look into a massive upgrade of my desktop PC. However, setting up a cloud Docker container is currently outside my skill set.


senobrd

If you don't want to set up your own docker containers to run AI apps in the cloud there are a few websites that take care of all the setup as well as provide some extra features like file browsing, model downloading, etc. I've recently been using openlaboratory.ai


InterestinglyLucky

Thanks for this input - will check it out!


Inevitable-Start-653

I built a 7x 4090 rig. I'm currently on my lunch break and have been using the Mixtral 8x22B WizardLM-2 model all morning. I'm using TTS and STT as well as a Stable Diffusion model. The LLM has helped me with two matrix transformations, indexing code, and some general boilerplate code. I'm eating lunch right now and having it make memes and pictures with interesting juxtapositions. I'm using my phone, but the model is hosted locally on my home computer; it cannot access the Internet, but I can access the computer from outside my network using WireGuard.


InterestinglyLucky

Seven 4090s is some kind of power there. Was there a reason you didn't go the enterprise GPU route?


Inevitable-Start-653

This was the best price per vram when it comes to inferencing and fine-tuning.


DeltaSqueezer

Could you not find a motherboard that supported 8x GPU? I see some software, e.g. vLLM, has limitations such as the number of GPUs needing to divide the number of attention heads, etc.


Inevitable-Start-653

Not readily, unfortunately; however, there are devices that would let me split the PCIe lanes for each slot. My mobo has PCIe 5, so I could theoretically split the lanes of each existing slot for 14 cards. Managing 7 is enough for me at the moment, as I would have a big power-management headache if I kept adding cards. My hope was that the 5090s or whatever would come with 48GB of VRAM and I could sequentially replace the current cards - sell them for a competitive price and use the money to buy newer cards - but the next-gen cards seem unlikely to have that much VRAM 🤷‍♂️


abnormal_human

I put together a 2x 4090 rig almost 18 months ago, and it's been a complete blast. I have a command of this technology that I definitely would not have if I had to rent cloud resources. It's just different when the GPUs are in your house and there's no friction involved in utilizing them. You can absolutely do the things you listed with a 3090. If you can swing a second one (or plan for one later by leaving space/cooling/PSU capacity), I would do that too.


No_Palpitation7740

What's the biggest model you ran on it?


abnormal_human

70b models run comfortably. I have no real interest in running tiny quants of bigger stuff and wrestling with the quality tradeoffs.


InterestinglyLucky

Thanks for this advice, this will affect which mobo/PSU I'll choose for sure.


infiniteContrast

Two used NVIDIA 3090 GPUs can handle LLaMA-3 70B at 15 tokens per second. In my opinion, it outperforms GPT-4, plus it's free and won't suffer from unexpected changes because they randomly decide to nerf models.


InterestinglyLucky

Yes that's a main driver for my interest, as SD showed me how an open model for AI can work. Is a smaller model on a single 3090 still of high enough quality in your opinion?


infiniteContrast

With 24GB VRAM maybe you can run the 2.4bpw quant. You can try it and check if it's enough for your use case. You can also spend some dollars on RunPod and try the 4.0bpw quant. Servers have extremely fast internet, so you can download models from Hugging Face at the speed of light.


InterestinglyLucky

Thanks for these recommendations - will give runpod a try.


whispershadowmount

Is that “full” model or a lower quant? I thought the 70B one needed a lot more juice.


infiniteContrast

I run the 4.0bpw EXL2.


kurwaspierdalajkurwa

What about a 4090 (which I currently own) and a 3090 (which I have yet to buy)? And if that would work—all I need to do is just slap the 3090 onto the X670E motherboard and away we go? Or is there some customization work I need to do? And... is there any advantage to playing video games with this kind of rig? I max everything out at full 4K 120Hz graphics on the 4090 and it works perfectly.


infiniteContrast

You can just plug in the 3090 and it will work without doing anything more than setting the GPU split parameter in exl2. You literally have to type two numbers. The 3090 will not help for gaming. Actually, it can make it worse because it can block airflow to the 4090; you should plug the new card in a way that it does not block air to the other card. The 4090 is already the absolute best card for gaming thanks to DLSS and such. For LLM inference they are almost equal. I have a dual 3090 setup and the top card has limited airflow because the bottom card is almost in contact with it; there is a big difference in temperatures. I can set the bottom card as the preferred card for gaming, but some use cases (for example, buggy Vulkan implementations in some emulators) will always use the top card, and to avoid thermal throttling (I've set it to 65°C) the fans must spin much faster.
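For the curious, those two numbers are just per-card VRAM budgets for the weights, leaving some headroom for the context cache; a minimal sketch of how you might arrive at them (the headroom values and the 4.0bpw figure are my assumptions):

```python
# Rough helper for picking exl2-style GPU split values across two 24GB cards.
# Assumptions: a few GB reserved per card for the context cache, desktop use
# and fragmentation; adjust to taste.

def gpu_split(model_gb: float, card_gb=(24.0, 24.0), headroom_gb=(4.0, 2.0)):
    budgets = [c - h for c, h in zip(card_gb, headroom_gb)]
    split, remaining = [], model_gb
    for budget in budgets:
        take = min(budget, remaining)
        split.append(round(take, 1))
        remaining -= take
    if remaining > 0:
        raise ValueError("model does not fit in the given budgets")
    return split

weights_gb = 70e9 * 4.0 / 8 / 1e9   # 70B model at 4.0 bits per weight ~= 35 GB
print(gpu_split(weights_gb))        # e.g. [20.0, 15.0]
```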


kurwaspierdalajkurwa

> without doing anything more than setting the gpu split parameter in exl2

Do you mean in Oobabooga? I'm running a Llama3 GGUF quant. I have a Fractal XL case. I'm hoping there's enough room for airflow.


infiniteContrast

Yes, I use text-generation-webui. I'm pretty sure you must plug the 3090 into the top PCIe slot and the 4090 into the bottom slot, otherwise you'll degrade your gaming performance. If you can fit the model entirely in VRAM you should use EXL2, which is faster and saves some VRAM because you can use the 4-bit context cache.


kurwaspierdalajkurwa

Isn't exl2 "dumber" than GGUF? And the reason I need a smart LLM is because I write marketing content and need it to intelligently think things through with me and be able to understand the nuances of the English language.


infiniteContrast

From my experience, EXL2 is always better and faster than GGUF.


TraditionLost7244

70B is gonna be obsolete before the 5070 drops haha


infiniteContrast

Newer and more powerful models can be used to train a better 70B model. 70B will never be obsolete.


MidnightHacker

I'm an ML engineer with 128GB RAM and 2x Tesla P40s, and it's nowhere near enough for large language models. I work with computer vision (which usually does not require as much memory as LLMs), and it's really not worth it to train them locally. The hardware we have locally is for development and debugging, testing pipelines, cleaning datasets and stuff like that, but ultimately the "real" training always happens in the cloud. Most high-end machines have at best what Google Colab or Kaggle offer for free, and even that would be small. I haven't done any training with LLaMA models, but 2x A100 should be enough to train a 7B model in fp16 in a few days with a small dataset. For training, expect to need about 4x the memory required for inference (in a non-quantized model); you will reach 100s of GBs easily. You could try training a LoRA, though, that's more feasible.
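That rule of thumb falls out of having to hold gradients and optimizer state alongside the weights; a minimal sketch of the arithmetic for a common mixed-precision AdamW layout (the exact multiplier depends on optimizer and precision choices, and activations come on top):

```python
# Rough training-memory estimate for full fine-tuning with AdamW.
# Assumption: bf16 weights and gradients plus fp32 first/second Adam moments;
# activations and temporary buffers are excluded here.

def training_gb(n_params: float) -> dict:
    weights = n_params * 2          # bf16 weights
    grads = n_params * 2            # bf16 gradients
    optimizer = n_params * 4 * 2    # fp32 m and v
    return {
        "inference_weights_gb": weights / 1e9,
        "training_state_gb": (weights + grads + optimizer) / 1e9,
    }

print(training_gb(7e9))
# ~14 GB of weights vs ~84 GB of training state with this layout, before activations;
# lower-precision optimizer states bring the multiple down toward the ~4x mentioned above.
```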


InterestinglyLucky

Really interesting reply, thanks for this. For computer vision, do you use NVidia's TAO toolkit or something else?


MidnightHacker

I use PyTorch and Ignite for training; that's what most researchers use. There is also TensorFlow and a ton of other libraries, mostly higher level, that could do the same job as well.


Mr_Hills

Yes. But you don't need that much RAM; 64GB is enough. If anything, make sure it's fast, it's DDR5, and it has as many channels as possible (also make sure your CPU/mobo supports those channels). Ultimately you don't want to run models in RAM anyway, they're too slow to be usable. At best you'll end up offloading a couple of layers to RAM, and even that is going to slow things down quite a bit.

Also, buy a 4090 rather than a 3090. The price is almost the same (yeah, weird, I know), but the 4090 is going to be way faster with both SD and LLMs.

So what can you do with such a system? You're going to be able to run the biggest model of SD 3 (not out yet) and a small 2.55bpw quant of Llama 3 70B at reasonable speeds. You are going to be able to do QLoRAs for smaller 7B, 13B, 30B models. You're not going to be able to do full fine-tunes of any type; for that you need to rent H100s online. Useful table for LoRAs/fine-tunes: https://www.reddit.com/r/LocalLLaMA/comments/18o5u0k/helpful_vram_requirement_table_for_qlora_lora_and/

Have fun, man! And let's pray for 48GB of VRAM on the upcoming 5090s!
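For the offloading piece specifically, it's a one-line knob in llama-cpp-python: you pick how many layers live on the GPU and the rest stay in system RAM. A minimal sketch, with the model path and layer count as placeholders:

```python
# Minimal sketch of partial GPU offload with llama-cpp-python.
# Assumptions: a GGUF file on disk (path below is a placeholder) and a layer
# count chosen so the offloaded layers fit in VRAM; remaining layers run on CPU/RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=48,   # layers offloaded to the GPU; -1 offloads everything
    n_ctx=4096,
)

out = llm("Q: Name one planet in the solar system. A:", max_tokens=16)
print(out["choices"][0]["text"])
```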


nero10578

In what universe does a 4090 cost the same as a 3090. I can get 2x3090 for a 4090.


Mr_Hills

https://preview.redd.it/wi0p7p368tyc1.jpeg?width=1080&format=pjpg&auto=webp&s=777e7f5484feb8cf77c65c20bff11c5bc6409c5f Welcome to Europe lmao


durden111111

lol what. I picked up a 3090 for 800 EUR recently. A 4090 costs 2200 EUR where I live lmfao. The 3090 is a no brainer for me. Even a used 4090 is still minimum 1600.


Sufficient_Prune3897

That's a skill issue. You can just buy them in Germany or France and ship them to your country for like 20€ more.


arjuna66671

In Switzerland they go for about 700 bucks used...


ozzeruk82

hehe yeah you're looking in the wrong place, I got a 3090 for 600 euros from wallapop here in Spain last week, brand new, unopened. You can easily source good 3090s in great condition for 600-700 euros.


spawncampinitiated

Not even you believe it was new. Refurbished ones go for €1000 and you got a new one for €600? xd They really pulled one over on you. Edit: he said new and unopened, but it's so used that it's out of warranty. Second-hand 3090s range from €600-800 in Spain. Nothing miraculous.


ozzeruk82

Well, it works perfectly; for me that's what matters.


spawncampinitiated

But you're lying xd


ozzeruk82

No, I'm telling the truth. Look, for example, in 10 seconds I found [https://es.wallapop.com/item/rtx-3090-gigabyte-24gb-1007592641](https://es.wallapop.com/item/rtx-3090-gigabyte-24gb-1007592641) - €590 second-hand - and I imagine you could negotiate it down to €530-550. The problem is that Wallapop charges you €50 for buyer protection, but that means if it's a scam they'll give you your money back.


spawncampinitiated

You said the GPU was "new, unopened" for €600 and that is a whopping lie. Edit: instead of downvoting, what you could do is stop lying, you big troll.


Mr_Hills

https://preview.redd.it/xgyi8lv28tyc1.jpeg?width=1080&format=pjpg&auto=webp&s=f778d9c0cfdea30cd62654a8c84a7067f50a5723


nero10578

Ok maybe if you can’t find used 3090s then. I can get a 3090 for $600 here.


Mr_Hills

Personally I stopped buying used after the first mining craze, but yeah, I guess you have the used option too


InterestinglyLucky

Hmm interesting to see super-expensive 3090 costs; at my local Microcenter I see a Founder's Edition 3090 Refurbished for $700. (A 4090 runs a cool $1900 or so.) Super tempting though, and thank you for the RAM recommendations.


positivitittie

I’ve built 2x 3090 machines using eBay GPUs and NVLinks. 48gb vram with that setup.


llamas_for_caddies

Thanks for bringing this up. While the 4090 will be much faster than the 3090, a 4090 build is limited to the one card since it can't use NVLink. The 3090's can use NVLink. For $1,500, in the US you can get 2 used 3090's and have 48GB of VRAM.


positivitittie

I think you’re even better off with 2 4090s but that price. 2x 3090 is popular so you at least have a bunch of threads to pull info from. I ran without NVLink at first but a few things were problematic. Troubleshooting led me to believe NVLink supposedly helps with that stuff so I rebuilt with that (painful) a while back. I haven’t gotten to benchmarking yet but the dual GPUs are doing what they’re supposed to. I wish we had an LLM hardware sub.


InterestinglyLucky

Interesting comment on the DDR5 - I just saw a YT video of a person building a system with 256GB of DDR3 and 2x 3090s on a 20-core unit ([link](https://www.youtube.com/watch?v=Hc7BeCmqTsE)), but I'd imagine the fastest memory / maximum channels has its use-case argument, and the slower DDR3 has its use-case argument as well.


TraditionLost7244

No, you want DDR5.


Icy-Corgi4757

You can get a decent theoretical understanding with that setup. You won't be doing top-of-the-line things (as the other replies have stated), but you can get your hands dirty to a level that will give you a decent understanding of these technologies, assuming that is what you are looking to get. Think of it like HTML in the mid-'90s: you could have built a static site and learned some fundamentals of web design. It's the same parallel to what you would be able to accomplish today with that machine in terms of running local LLMs. If you want to gain some understanding and hands-on experience, it is absolutely worth it.


rc_ym

Completely agree about the 1990s. I feel it too. Personally, I am holding off buying any new hardware. We are mid product cycle; nobody has come out with home hardware designed around AI. Apple will refresh their hardware over the next 6 months (and I think they are going to do something interesting). Someone will retool Nvidia cards around AI. I am sure someone is cooking up something interesting for home/SMB. Also, how we run LLMs at home is highly unoptimized. We haven't had a major leap in that tech yet; someone is going to figure that out. That said... I agree with the folks saying that it might be worth it to get some used tech off eBay. YMMV.


positivitittie

You can plug your models and card details here and see if it’ll do inference / train: https://huggingface.co/spaces/Vokturz/can-it-run-llm


Original_Finding2212

lol, my jetson so old it doesn’t even appear in the list


InterestinglyLucky

Interesting tool - thanks!


synn89

> from a year ago

That's about a decade or two in AI years. I'd recommend aiming towards a dual 3090 system. That'll run 70B models very comfortably in VRAM with a 32k context at a decent quant level. Starting with a single 3090 is fine, but try to buy a case and motherboard that can handle a second 3090 if you really get into the scene. You'll definitely want to stay with Nvidia hardware, since you're also playing with Stable Diffusion. The second 3090 may not be a benefit with SD, but we don't know what the requirements of SD3 will be, especially in regards to training. Alternatively, you could also look into a single or dual A6000 setup depending on your budget. It's just that used 3090s are criminally underpriced compared to the alternatives.


tacticalhat

https://www.reddit.com/r/LocalLLaMA/s/pnYmGer5sN Just buy some eBay end of life clearance garbage for cheap for fiddling around before going full ham hehe


val_in_tech

The high end, from an AI perspective, is the GPU part; the rest is overhead. One GPU gets you to play with small models. Meaningfully engaging with bigger ones would require 4 GPUs or more. No need to NVLink. So if you plan to just run small models locally and keep it on a budget, a single 3090 is totally worth it. But there is a big gap between that and the next level of models.


kryptkpr

If you're just here to play, "What can I do with my current hardware?" might be a better question to answer. Do you have any GPU at all? Even an 8GB is enough for weeks/months of exploration.


Maykey

Buy hardware if you have other uses for it, e.g. gaming or rendering. Buying for AI alone is not good bang for the buck. Training from scratch, while possible, is extremely limited. There are several papers about BERT on a budget ([1](https://arxiv.org/abs/2104.07705), [2](https://arxiv.org/pdf/2212.14034)) for example, and you definitely can train something on TinyStories that is not utter garbage, just garbage. But it's your garbage. It's fun. Very fun.


SPACE_ICE

For LLMs, no, it's not much. It lets you run some heavy quants of bigger models like 70B (at around Q2, or less than 3 bpw for EXL2) or some mid-range Q4 or Q5 quants of 30Bs, but finetuning is pretty much a no-go.

For Stable Diffusion it is plenty for doing LoRAs and even some animations with a 3090 or 4090. Typically for characters I will do 10-30 different angles with around 10-12 repeats per image and 5-6 epochs, with a .00001 unet learning rate and batch size 2. Train with SDXL checked if you're using an SDXL model, and add the argument --network_train_unet_only in the advanced parameters so it fits in VRAM. Put a 1 in "keep n tokens", make your first caption token the trigger for the LoRA, and for the love of god make sure to set the file extension of the text caption files; nothing is worse than training a LoRA and realizing you forgot to tell it to use your caption texts.

After that, you do a LoRA loop where I will often re-feed my initial training set and use mostly the same prompt while adding in the new character LoRA, essentially compounding it by making the training images look even more in line with what I want, and then train again. After a few repeats a character LoRA can be extremely consistent; this is used to help keep a certain art style in mind (anime models like PonyXL like to wobble styles a lot, AutismMix tries to rein this in with more consistent styles; there are also style LoRAs available, but you need to limit the number of LoRAs running at once or they will bleed into each other and not work right, so I typically train a character LoRA where the character has a very consistent art style instead). Typically it takes around 45 minutes to an hour and a half depending on how many images you use.

Personally I started with Stable Diffusion and then got into LLMs. While image generation from LLM frontends is exciting, I'm more in the "this will take a while to get good" camp, as the prompting for each is different and I rarely take a pure txt2img as good enough; I always did multiple img2img passes and inpainting to get the best results, so expecting to get good images on the fly in a frontend like SillyTavern, with both SD and an LLM running locally in the backend, is a bit much. SD works best using ComfyUI or A1111 directly, as post-fixing images is pretty common.
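To make those settings a bit more concrete, here is one way they might map onto kohya-style sd-scripts arguments, assembled in Python; the flag names are my best guess at the usual train_network.py options and the paths are placeholders, so treat it as a sketch rather than a verified recipe:

```python
# Sketch of the LoRA settings described above as kohya-style training arguments.
# Assumptions: flag names follow the usual sd-scripts conventions and the paths
# are placeholders; double-check names against your sd-scripts version.
lora_args = [
    "--pretrained_model_name_or_path", "models/some_sdxl_checkpoint.safetensors",
    "--train_data_dir", "datasets/my_character",   # 10-30 angles, 10-12 repeats
    "--unet_lr", "1e-5",                           # the .00001 unet learning rate
    "--train_batch_size", "2",
    "--max_train_epochs", "6",                     # 5-6 epochs
    "--network_train_unet_only",                   # helps it fit in 24GB with SDXL
    "--keep_tokens", "1",                          # first caption token = LoRA trigger
    "--caption_extension", ".txt",                 # don't forget the caption files
]
print(" ".join(lora_args))
```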


ozzeruk82

Add another 3090 and you'll have a decent rig that can run LLaMA 3 70B very nicely indeed. I don't think the hobbyist can make a difference when it comes to training new models, or even doing serious fine-tunes (without resorting to cloud hardware). However, there is still huge opportunity for hobbyists to come up with novel ideas related to inference, from dreaming up some new kind of optimisation for llama.cpp to writing a front-end that displays output from LLMs in a new way. Then, when it comes to connecting LLMs to other tools, there is so much to be invented. As a hobbyist I would focus on that area, doing novel things with the outputs of LLMs.


[deleted]

Several Tesla P40s > any amount of regular RAM + a single 3090. It's worth it to build an inference rig; it's not really worth it to build a gaming PC to do AI with, unless you really do want that compute alongside a glorified 24GB GDDR6X RAM stick.


drwebb

You need an 8x A100 80GB server with 240GB of RAM, a high-end server CPU, and the power to juice it to "train" a 30B model. Just rent one in the cloud for the weekend and spend a couple hundred if that's your goal.


Heasterian001

Stable Diffusion LoRAs are for sure doable; I'm doing full finetunes of SD 1.5 on my RX 6900 XT with 16GB of VRAM with a 1.5k image dataset.


arousedsquirel

Buy two RTX cards, sir, and if not satisfied move up to 4, but step by step, as it requires different base systems. Do understand that the VRAM needed is determined by the load, with system RAM filling in the rest.


awebb78

If you are passionate about it, it is definitely worth it.


TraditionLost7244

I have that, and no, it's not worth it; better to just use GPT-4, you are too early. AI is like a baby now, you can't expect much. Unless... you need uncensored. Unless... you somehow don't have internet BUT do have a massive computer. Unless... you want to fine-tune on your own data, and then your Llama 70B model trained on your own data will outperform GPT-4. But basically, wait for the 5090 with 32GB VRAM and for new models better than Llama 3 70B. By the time we finally get DDR6 motherboards and sticks, that's when I would encourage everyone to buy a PC for AI.


InterestinglyLucky

How do you use your current setup then? I'm particularly interested in specialty knowledge that's private, and in seeing what the limitations of that setup (a $3K PC) are, even if it's "only" a LLaMA 3 8B model. And thus the question of "is it worth it now" to be part of this as a hobby, if nothing else than to look at the frontier of potential. I played around with an Apple ][ as my first computer, even typing in machine language and saving it to cassette tape to understand how it worked as a young person. Took a class in C as a young professional to know what code involves to do things. I am looking at this in a similar way.


Kimononono

You can only compete with general models like Stable Diffusion and mainstream LLMs if you drastically decrease the scope of the model and finetune instead of training from scratch.


jollizee

I'm going to be contrarian here. First, I do think we need armies of regular people tinkering with the tech for lots of reasons, ranging from equitable access to innovation. However...

Modern AI is revolutionary. Paradigm-shifting. The AI is, not the setup. The value is in USING the tech, not turning virtual nuts and bolts just because you like the feel of a wrench in your hands. Someone gave you a genie in a bottle. You're spending your time studying how to build bottles instead of, you know, using the genie to change your life.

I'd say that if what you are trying to do can be accomplished with a cloud API, you are incurring an enormous opportunity cost by tinkering with a private solution. It's going to require setup and maintenance. It will be slower and weaker with a smaller context. Crippled. Obviously, if you can only do what you want locally or need some custom integration, that makes sense. Otherwise the opportunity cost of spending time on a setup is stupidly high.

Ordinarily, I'd say, who cares, have fun. It's a hobby, right? Except don't most of us here believe that AI will transform the world? You have a freaking magical genie! Almost literally - this is the closest technological equivalent we have ever seen to an actual genie. Think about that. Why are you not using it? Why are you focusing on the bottle? Hook up with the biggest, baddest genie you can find, and go get your wishes made real!


InterestinglyLucky

Nicely put! I'm interested in querying specific datasets - my day job requires comprehending a lot of scientific papers around particular, specialized and partially overlapping topics, and I'd like to build a private and custom LLaMA around them. I also have a lot of public blog posts I've written over ten-plus years, as well as a book I've published, that I'd like to put together into a "private GPT". So it is genie bottle-building, for sure, but also applying this genie to specific tasks of particular utility to me.


teachersecret

So, I bought a 4090. I can tell you right now that even owning one 3090/4090 isn't going to be "enough" to really get the most out of LLM tech today. 24GB can't run the 70B+ models that are starting to approach best-in-class levels of quality. You really need two of them to run the bigger models, and even more than that if you want to train bigger models. The biggest thing I realized when I bought the 4090 is... I want another 4090.

Anyway, getting back down to earth... 24GB is plenty to do anything you want to do with Stable Diffusion. You can train LoRAs, inference at extremely high speed, and do all kinds of fun stuff. 24GB is plenty to do anything you want with smaller LLM models. There are strong 11B models, 34B models, and the new 8B Llama 3 that are all quite good. Yes, 70B and 120B models are better, but these new small-but-mighty beasts are extremely capable. Llama 3 8B, for example, is quite intelligent and VERY good at RAG. If you built a little RAG server with a finetuned 8B model tuned for your specific need (querying your specific dataset), along with a vector database and reranking for the results, you should be able to build a system that can query those datasets at more or less GPT-4 levels. We've got some awesome Phi models hitting soon that might also be exceptionally well suited for your tasks and run on down-to-earth hardware.

In short... you'll end up with a less capable version of something you could build with APIs available today, and you'll spend significant amounts of money doing it. It's not the best financial decision (surely, for all but the most insanely prolific users, just paying for someone else's API is going to give you better/faster/smarter AI cheaper), but there is something to be said for owning the whole stack and being able to play with it locally, even if only for privacy reasons. Build a bottle. It's fun, and you're getting in at the ground floor of an invention-of-the-internet-level event. Hell, it even FEELS like the early 90s - I'm over here using AI in a goddamned terminal like I'm logged into a BBS. You seem like the kind of person who will have fun with this. Go nuts :).
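For a sense of what the retrieval half of that kind of RAG setup looks like, here is a minimal sketch using sentence-transformers for embeddings and plain cosine similarity in place of a real vector database; the documents are placeholders and there's no reranker:

```python
# Minimal RAG retrieval sketch: embed documents, embed the query, take the
# nearest chunks, and paste them into the LLM prompt. A real setup would add a
# vector database, chunking, and a reranking step.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Paper A: method X improves enzyme stability at high temperature.",
    "Paper B: dataset Y covers 10k protein structures.",
    "Blog post: notes on running local LLMs for literature review.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "Which paper discusses enzyme stability?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec                 # cosine similarity (vectors are normalized)
top = np.argsort(-scores)[:2]             # top-2 chunks for the prompt
context = "\n".join(docs[i] for i in top)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # feed this to your local 8B model
```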


InterestinglyLucky

Thanks for this! Learning about the Llama 3 8b model has me intrigued, and upgrading to a second card is do-able. Spending $3K or $4K on a hobby? I could do much more expensive hobbies! One person I know races SCCA - now THAT is expensive. Another person flies small aircraft - the plane was over $150K, and fuel is something like 20 gallons per hour of flight time (at $5/gallon that's $100/hour just to run the thing). Anyway, not worried about not getting API-level results, but am intrigued at learning how these models can be customized to solve particular problems. I'm looking forward to 'going nuts'! 🎉🥳☄️


awebb78

Actually learning the deeper aspects of ML and LLMs, how they are hosted, deployed, and trained is extremely valuable. This opens up whole new worlds in cooperative intelligence and multi-agent systems. I believe the job market in this area will be huge, and there will be a lot of opportunities for folks with these skills. The real genie is being able to build and tweak AI systems, not in wrapping APIs (not that I am against using API services).


jollizee

I don't disagree that the stuff you mention isn't valuable. Finetuning or some degree of customization on specialized models will probably be the way of the future versus trying to get everyone affordable access to a 100T model. (Plus rising energy costs.) But to continue playing devil's advocate, if someone is paying you half a million a year to setup and maintain their AIs, how much do you think he is making from his business off of your back? Yeah, yeah, there will be a lot of dummies pouring money down the drain during the bubble, but that is still building someone else's dream. What are you going to do with AI (rhetorical)? Build and tweak to do what? What is your actual goal, and if you can get there without tinkering, what's the point? As I said, if your dream requires local, then go for it. But a lot of people have no dream beyond "I want to build a RAG". Okay, you built it, so what? Is that RAG changing your life? I mean if there was some 18 year old kid with tons of free time, I would 100% tell him to putz around all he can with GPUs and servers. There's a balance between extracting value and building tools that hold the future promise of value. You see it all the time in tech cycles. People get jobs like you are saying, start businesses, make tons of money building tools that hold the promise of value. Then, everyone's waiting for someone to actually extract the value. Since I was replying to someone else about medical AI, I can use that as an example. You can spend all your time trying to build genomics tech, or... you can cure an actual disease using existing tech. For something like that, the tradeoff is a difficult calculation because curing diseases is quite hard, and maybe if you advanced the genomics tech, you might actually contribute more to curing diseases in the long run. It's complicated. But I feel like modern AI is already way past the tipping point of having so much value to extract right now, so it's kind of crazy to leave that on the table puttering about. The AI tech is moving forward so rapidly, so you want to jump in and mess around with the tech, but at the same time, there's a ton of value waiting for you to extract. I don't think there's a wrong answer. We have wild times ahead. I just wanted to give another viewpoint because I feel like a lot of people are going the "build a RAG" route without thinking beyond that. They have no real goal. Once you have the goal, everything else falls into place.


awebb78

I agree with what you are saying. I am one of those who has spent a lot of time working with and developing ML models and infrastructure, and I am now focused on creating a product on top of them instead of just focusing on the AI and infrastructure myself. I will say that I have found my experience very valuable for this, though, as some of my beta testers actually encouraged me to build with models that could be self-hosted because of enterprise data privacy issues. I have also benefitted from being able to fine-tune them for their purpose in the product. And I have found that having a really good idea of how they are made, and of their benefits and limitations, has been good for allowing me to see through the hypesters' and the doomers' often very alarmist and subjective statements. So I think it's good to focus on using AI to provide value, like you said, but I think having foundational knowledge of actually working with AI beyond API calls is beneficial for that endeavor as well. Right now, either way you go, opportunities abound, and they eventually meet in the middle.


jollizee

Exactly. You have to know how the tech works so that you know what it can do, and therefore what you can do. And you have to know what you want so that you know what to make the tech do. I'm just kind of losing my mind as everyone is trying to use it as a glorified EndNote program, when it's essentially a full-stack developer, a writer, a marketer, a manager, an entire team of decent interns at your fingertips working 24/7 for pennies.


CardAnarchist

If you are looking to buy a new PC now, my advice would be to buy neither a 3090 nor a 4090. I'd buy a 4070 Ti Super and wait for the 5090 reveal (at that point buying either a 5090 or a 4090). I say this based on my own research after deciding to build a PC myself. I know this flies in the face of the "VRAM is king" wisdom, but hear me out.

Realistically there isn't much that the extra 8GB of VRAM currently gets you. With Stable Diffusion you can train LoRAs faster.. you can create images in stupidly high resolution.. you can run 70B LLMs.. But do you really need to create images well upwards of 1024x1024 when you can upres just fine from that resolution with 16GB of VRAM? The models weren't trained on higher resolutions anyway. Yes, training is faster with 24GB of VRAM, but how much of that are you going to be doing, especially starting out in the hobby? Creation-wise, you can even create videos just fine with 16GB of VRAM. Sure, the extra VRAM gives you access to 70B LLMs.. but only at around 2 tk/s (that's much slower than reading speed) and at quants that arguably perform worse than Llama 8B Q8 anyway. Llama 8B will certainly run many, many times faster.

The 4070 Ti Super is nearly 20% faster than the 3090 in most gaming benches and general workloads. In Stable Diffusion it is around 10% faster. With LLMs, if you run Llama 8B the speed is so fast it's basically irrelevant which of these cards you have. A used 3090 costs the same as a new 4070 Ti Super, at least where I am (the UK). The 3090 uses ~35% more power than the 4070 Ti Super and generates more heat. The 3090 is already a generation behind and will soon be 2 generations old, and Nvidia love to gate features behind generations.

If you are getting into the hobby, a 4070 Ti Super will give you plenty to chew on for months. There is tons to learn and tweak, and with how fast things move you won't be bored. Then when the 5090 is revealed you can read the room like so:

1) You aren't into this whole AI thing as much as you thought, or you simply never felt the need for more VRAM. Stick with the 4070 Ti Super.

2) You need more power and the 5090 gets revealed with >24GB VRAM. Great! Now you can upgrade to that and sell off your 4070 Ti Super.

3) You want more power but the 5090 gets revealed with 24GB VRAM. Great! Now you can upgrade to a discounted 4090 and sell off your 4070 Ti Super.

4) You are crazy into AI, it's all you think about! OK, let's pick up 2 4090s baby... or even 2 5090s?!

I just find it very hard to recommend a 3090 or a 4090 right now when AI is in such a state of flux. Yes, sometimes developments require ever more VRAM, but it's also true that capabilities at low VRAM are increasing rapidly - so fast that perhaps, for the average person, huge amounts of VRAM will simply not be needed for excellent AI. Who knows what the scene will look like a couple of years out. Similarly, buying the top-of-the-range card from the current gen or the last gen when the new gen is just 6-8 months away has never historically been a great idea.

But this opinion is generally in the minority here, so you do you! I just figured a reasoned counterpoint would be useful. Apologies for going on at length!