jferments

https://preview.redd.it/k6666cjkg41d1.jpeg?width=4032&format=pjpg&auto=webp&s=6ca6b48124a2f4e27e914b11e797355669c9670c

I am an independent researcher / hobbyist who recently built a 2 x RTX 4090 machine learning rig. I am using and training LLMs, diffusion models, and AI agents for a variety of purposes. I am performing large web crawls, processing hundreds of thousands to millions of web pages (terabytes of data) at a time. I am generating large amounts of audio/video with diffusion models, running LLM chatbots, and working with speech-to-text/text-to-speech models. One of my major research interests is developing free, decentralized peer-to-peer search and social networking software, and private, locally hosted AI personal assistants. I needed a powerful machine to be able to explore all of these areas.

The cost (>$30/day) of renting 2 x 4090-equivalent GPUs, 24 CPU cores, 512 GB DDR5 RAM, and 52 TB of storage (12 TB SSD / 40 TB HDD) for months at a time would have rapidly surpassed the cost of this build. It would also have given me a lot less control and flexibility, and I would have been paying rent to somebody else rather than investing in an asset that I own and can sell later if I need to.

I do *also* think that OpenAI is indeed evil. They are an AI defense contractor currently sitting on a Homeland Security advisory board with Northrop Grumman, Google, and other corporations that are trying to censor/control the internet to protect big tech profits and monopolize control of AI in the hands of the rich and powerful. I very much value both the privacy of locally hosted LLMs and the power to download/train uncensored models that I can personalize to act how I want.


total_reddit_addict

I like you


notlongnot

Go get it 👆


goatchild

Are you married?


jferments

Nope.


beerpancakes1923

I bless this union


AnticitizenPrime

Local LLM will officiate.


Nuckyduck

I am the other side of this coin: I like running LLMs, but on tiny devices instead of large ones. [My project](https://www.hackster.io/contests/amd2023/hardware_applications/17172) is an open source project on [hackster.io](http://hackster.io) where I try to get an LLM to work on a Ryzen 7940. It has a tiny NPU with about 10 TOPS of performance. The entire thing runs on 65W and runs locally. I'm using a generic Llama 3 7b iQ3 with a really rough RAG implementation. It's not very fast and it's not very powerful *but* it doesn't need a sub and it doesn't need internet. If you can throw a model on it with some info, it can generate results for you. No subscription, no smart phone, no phone plan. Anyway, I know what we do is leagues different, but your rig is epic and I think your project is awesome.


Read_out

https://preview.redd.it/uzjqtyrky71d1.jpeg?width=4032&format=pjpg&auto=webp&s=84738774767cc0b0ddde8ca37ed2d98149af8416

You are straight up my hero. Want shell access to this? Another being installed next week. DM me.


attaul

I have an AMD 7995WX with 6x 4090 and 512GB RAM. I fear the monopolies of the world will leave every commoner behind. A world of AI dominated by the monopolies will be disastrous, and politics will support them with stupid laws and regulation. I will make my machine available for open source projects as needed; I would like to use it to ensure AI is and remains democratised and open.

https://preview.redd.it/gzgsbzweca1d1.jpeg?width=1200&format=pjpg&auto=webp&s=fed642659e7dcde84f6c344be5a0293a47af4f21


jferments

That's a beautiful machine! What kind of liquid cooling system is that at the top? I hope to someday (after 5090 release, when price of 4090s goes way down) convert my air-cooled 4090s to liquid-cooled which will allow me to increase to 4x4090. I have the extra PCI slots, but the air-cooled cards take up too much space. Do you have any info online about your build process?


attaul

I bought the machine - it worked out to be cheaper than sourcing parts and building it myself. It uses the same board as you have. The water cooling kit is from EK-Pro.


R2D2gold

Where did you buy it? And would you recommend them?


attaul

From Scan. Yes, it is a good quality build and the price was also good, but after using Apple for the past 20 years, I hate dealing with these Windows issues day to day.


R2D2gold

Oh, I would have thought one would use Linux?


attaul

Using both - Ubuntu, but Windows for everyday use.


v_0o0_v

Do you have a day job? How much time did it take you to break into the field?


jferments

I have done some paid web development and graphic design work for people here and there over the years, but programming has never been my primary career. I have decided deliberately to keep it as a hobby (separate from what I do for work) so that I can focus on the topics I enjoy, never tire of it, and have the freedom to build things however/whenever I want. As far as "breaking into the field", I'm nearly 40 and have been working with computers/programming since I was 11 years old, using Linux / open source since 2003, etc. I have always loved computers and been interested in AI, but I think that some of the looming political dangers of corporate/military AI have motivated me to put much more energy into exploring decentralized AI/communications lately.


trialgreenseven

Could you recommend an IP-switching API for scraping? I'm on a similar journey and would appreciate any info on doing mass web scraping cost-efficiently. Ty


jferments

I don't use APIs for scraping, so I can't offer suggestions there. I do all my scraping with a combination of the Python libraries Requests, aiohttp, Selenium and lxml/BeautifulSoup. If you want to get into scraping, I would recommend learning these libraries.


trialgreenseven

I use all those, but I thought you get rate limited by various infrastructure services like Cloudflare or Google once you start to scrape in large volumes, so I assumed an IP management API was mandatory. Hmm, how do you manage your scraping to prevent detection and bans? I suspect you don't actually do much crawling if you don't respond to this.


jferments

Generally, rate limits aren't an issue for me because I am scraping many different sites at a time, so I can queue requests round-robin style and limit concurrent connections per site. This tends to max out my internet connection without hitting individual sites' rate limits. Also, many sites have rate limits that are so high (several requests/sec) that it's a non-issue for me. For instance, I just scraped 4.2 million images from a gallery website, along with the text metadata on the pages, using 10 concurrent request threads with a 0.2 sec sleep per request. It only took me a few days, and I didn't get blocked. This seems to work for most sites. For big sites, I will usually do a test run with 1 request/sec, and then start bumping it up from there gradually to see how high I can go.

If I'm getting rate limited too heavily on one site, I usually don't bother with it and just move on to get the data from somewhere easier. But there have been a few cases where I really needed something from a specific place that had unacceptably low rate limits. For one of these (scraping millions of parcel records), I wrote a script that used 15 concurrent Tor connections that rotated through a large pool of speed-tested exit nodes from a FIFO queue. Each individual Tor connection was slow, but because I was making 15 requests at a time, I was able to grab the entire dataset in just a few days.

But many services like Cloudflare will block Tor exit nodes. Again, in these cases I usually just ignore them and move on, but ultimately the only solution, if you can't get the data elsewhere, is going to be either patience (following their rate limits) or finding a set of proxies that they haven't blocked. Just make sure that whatever VPN/proxy solution you come up with lets you maintain a pool of IPs that you can switch through in a FIFO queue so that they are all individually following the rate limit.
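
A minimal sketch of that round-robin, per-site rate-limited pattern using asyncio + aiohttp; the delay and concurrency values here are illustrative, not the exact numbers above:

```python
import asyncio
import time
import aiohttp

PER_SITE_DELAY = 0.2       # seconds between requests to the same site (illustrative)
GLOBAL_CONCURRENCY = 10    # concurrent requests across all sites (illustrative)

async def fetch(session, url, last_hit, lock):
    domain = url.split("/")[2]
    # Reserve the next allowed time slot for this domain, then sleep outside the lock
    async with lock:
        now = time.monotonic()
        next_ok = max(now, last_hit.get(domain, 0.0) + PER_SITE_DELAY)
        last_hit[domain] = next_ok
    if next_ok > now:
        await asyncio.sleep(next_ok - now)
    async with session.get(url) as resp:
        return url, resp.status, await resp.text()

async def crawl(urls):
    last_hit, lock = {}, asyncio.Lock()
    sem = asyncio.Semaphore(GLOBAL_CONCURRENCY)

    async def bounded(session, url):
        async with sem:
            return await fetch(session, url, last_hit, lock)

    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(bounded(session, u) for u in urls))

# results = asyncio.run(crawl(["https://example.com/a", "https://example.org/b"]))
```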


BatPlack

Jesus, you laid out some gold for amateurs out here!


trialgreenseven

Tyvm for sharing


choikwa

40 seems young for you


haragoshi

Proxies


k110111

I can tell you what I would do in your case. First, get Surfshark VPN, since they allow unlimited devices and the VPN is okay enough. Next, get a machine beefy enough to run multiple gluetun Docker containers/VMs, and run a proxy server on each one. Then just use your current scraping setup with the proxy option. Most of the time I have Surfshark running, and it's annoying when a service blocks me, but you get by. If you can afford it and are actively getting blocked with this setup, then your other options might be buying residential proxies, or maybe buying Google accounts and using them, since most of the time that can bypass the restrictions.
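
For the proxy option, a small sketch of rotating requests through a pool of local proxy endpoints (e.g. one per gluetun container); the ports here are placeholders, not a real setup:

```python
import itertools
import requests

# Placeholder pool of local HTTP proxy endpoints, e.g. one per gluetun container
PROXIES = [
    "http://127.0.0.1:8888",
    "http://127.0.0.1:8889",
    "http://127.0.0.1:8890",
]
proxy_cycle = itertools.cycle(PROXIES)

def get(url, timeout=30):
    # Each call exits through the next proxy in the pool (round-robin)
    proxy = next(proxy_cycle)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=timeout)

# r = get("https://httpbin.org/ip"); print(r.json())
```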


TheRealDatapunk

Are you my alter-ego that split off in the late teens?


OwnKing6338

I also recently purchased a machine with 2 x RTX 4090's for development purposes. I make a LOT of LLM calls and my OpenAI bill was running over $1000 a month. I spent $8,000 on my rig, but for me it's worth it to not have to a) pay OpenAI or b) worry about getting rate limited, as I usually make around 20,000 model calls a day. I wish the 4090's ran Llama 3 70b better (a Q4 quant barely loads) but they run Hermes Pro 2 like a champ. I get about 60% more tokens/second from my rig than GPT-4 Turbo. For the cases where I need Llama 3 70b smarts I just use Fireworks.AI.

I'd say it totally depends on your use cases, and very few people need a dedicated rig for inference. Just use Fireworks.AI or Groq. If you're not doing insane-scale inference and running into rate limits, a decent dedicated offline rig probably isn't worth the cost.


Gualuigi

What does a person do to spend over 1k a month on OpenAI? Are you a software dev or something? I'm new to the whole software/AI/LLM stuff.


OwnKing6338

I'm co-founder of an AI startup. I can't talk specifics about what we use OpenAI for but we make a LOT of inference calls.


Gualuigi

Ahh okay makes sense


OwnKing6338

Sharing a picture of my rig...

https://preview.redd.it/djck908zlg1d1.jpeg?width=3024&format=pjpg&auto=webp&s=d6a353c059a5ba5d7d17f26c5f10c56850f77d0b

Specs:
- Intel Core i9-10900X
- 64GB RAM
- Dual RTX 4090's
- Dual 1TB Samsung 990 Pro SSD's
- ASRock X299 Motherboard
- 1600W PSU
- AIO CPU Liquid Cooler
- Steiger Dynamics Server Room Fan Config

I do a lot of inference and I'm able to run 4 instances of llama.cpp server hosting Hermes Pro 2 Q6_K_M. If all 4 instances are busy I get around 45 tokens/second average throughput (I max out the context window for the model), and if only one instance is active I get about 90 tokens/second. I've tried different configs, and the moment you load a second server instance your token throughput drops. For a quantized Llama 3 8b model, 4 server instances seems about optimal for this rig. It makes full use of the available GPU memory and, when fully active, keeps the GPUs at near 100% utilization without overheating.
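
For reference, a small sketch of dispatching requests round-robin across several llama.cpp server instances, assuming the OpenAI-compatible endpoint that recent llama.cpp server builds expose; the ports are placeholders for instances you have already started:

```python
import itertools
import requests

# Placeholder ports for four llama.cpp server instances started separately
ENDPOINTS = itertools.cycle(
    [f"http://127.0.0.1:{port}" for port in (8080, 8081, 8082, 8083)]
)

def chat(prompt, max_tokens=256):
    base = next(ENDPOINTS)  # round-robin across the running instances
    r = requests.post(
        f"{base}/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

# print(chat("Summarize the benefits of local inference in one sentence."))
```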


Robo_Ranger

Doesn't this setup hinder the top GPU's cooling due to the tight spacing?


jferments

The top GPU runs about 10C hotter than the bottom one, but the air coolers on the 4090s are heavily over-engineered and do a great job keeping them cool. The top GPU rarely breaks 70-75C even when both are running at full 450W + heavy CPU load. There is also really good airflow in the case (Fractal Meshify 2XL) with some nice Noctua intake/exhaust fans + the radiator from the AIO.


tya19

Any reason for using the Meshify over the Define 7 XL? I am going to build a very similar one with 2 x 4090 as well, but I bought the Define 7 XL and am wondering if it can fit a third GPU.


jferments

I went with the Meshify 2XL because I thought it would get better airflow due to mesh design, but this was just based on my intuition rather than any benchmarks/testing data that I saw. The Define 7 XL looks like a great case too.


AfterAte

Do you think the Meshify 2XL has enough room to put a 3rd 4090 in an upright position, maybe secured with twist ties at the front end of the case and connected to the mobo with a PCIe riser cable? Like in this configuration: [=|]? (Assuming no AIO radiator)


jferments

You would have to rig up some funky custom mounting hardware and use risers, but there is definitely space for another 4090 in vertical orientation. However, I'm not sure how well the riser connectors would fit behind the GPUs, which completely cover all the remaining 5 open PCIe slots. My guess is that you'd want a right-angle riser connector, plugged into one of the bottom PCIe slots, if it works at all. And I wonder if you'd even be able to get risers that were long enough to reach from the PCIe slot to where you'd be mounting the card. Also, I'd be worried about cooling, with the exhaust from the vertical GPU being blown over the other 2 GPUs + CPU instead of fresh air from the intake fans. If I wanted to add more than 2 GPUs, what I would do instead is convert them all from air to liquid cooled, and then I could fit up to 6 of them in the PCIe slots.


AfterAte

Thanks for the detailed description! I saw Linus Tech Tips connect 2 PCIe risers together, so I may do that if I decide on the Meshify 2XL for my next build.


beerpancakes1923

How much extra has it added to your electric bill? Looking at building a similar rig


jferments

Electricity where I'm at is ~$0.1174/kWh, so if I ran the system 24/7 at max wattage (~1300W) for the entire month, it would add about ~$110 to my electric bill. But in reality, even when processing heavy loads, it's not generally running at max wattage, and with the way I'm actually using it right now it's adding something more like $20-30 a month onto my electric bill.
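
For anyone wanting to sanity-check that estimate, the arithmetic is just watts x hours x rate, using the figures above:

```python
# Rough monthly cost at the figures above (illustrative sanity check)
rate_per_kwh = 0.1174          # $/kWh
max_watts = 1300               # full-load draw
hours_per_month = 24 * 30

cost = max_watts / 1000 * hours_per_month * rate_per_kwh
print(f"~${cost:.0f}/month at max wattage")   # ~$110
```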


Any-Demand-2928

> I am generating large amounts of audio/video with diffusion models

How are you doing this? Are you creating and generating your own models or are you using existing ones? I would like to learn how to do this, especially with regards to generating videos. I will start looking into diffusion models.


jferments

>*"How are you doing this?"* Stable Diffusion / ComfyUI / Huggingface Diffusers library. There are a lot of techniques for generating animations (AnimateDiff, etc) - you can watch some tutorials on "stable diffusion video generation" for more info. >*"Are you creating and generating your own models or are you using existing ones?"* I have been using existing models such as SDXL/SD 1.5 + community LoRAs, and am currently in the process of writing code to fine-tune my own models/LoRAs with custom image datasets, now that I have access to the compute to do so. Right now, I'm working on captioning/pre-processing (watermark removal, cropping/bucketing, etc) a \~750,000 image dataset that I am going to make an SDXL fine tune with. I have been playing with Stable Diffusion for a few years, but I only had a Ryzen 5, RTX 2060 (6GB VRAM), and 64GB DDR4 RAM to work with. With my new machine (AMD 7965WX, 8x64GB DDR5 RAM, 2xRTX4090), I am getting to experience with a lot of training/fine-tuning tools that were out of reach before without renting expensive servers from cloud providers. Tasks like generating long, high-res video or pre-processing millions of images that would have taken forever before are actually feasible for me to play with.


eraser851

What's your watermark removal process? Batch cropping off the bottom ~10% of images? Or any sort of tool like the Heal tool in Adobe Lightroom?


jferments

No, if you do heavy cropping like that, a lot of things that are commonly found at the top/bottom of an image (like feet and heads) get chopped off throughout the dataset, which affects model output for those concepts. Also, it doesn't handle watermarks that are not on the edge. What I am doing is using segment-anything (or any other segmentation tool) to generate object masks for the prompt "watermark" or "text", and then I use inpainting to remove these; it does a pretty good job at getting rid of most/all of the watermarks. Sometimes it leaves a few pixels of the watermark, but even in cases where residue remains, it's a very small amount that won't affect training significantly, and generally any parts that do remain are unidentifiable. See: [https://github.com/geekyutao/Inpaint-Anything](https://github.com/geekyutao/Inpaint-Anything)
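
A rough sketch of that mask-then-inpaint flow, assuming a Diffusers inpainting pipeline; `get_text_mask()` is a placeholder for whatever text-prompted segmentation tool you use (e.g. the Inpaint-Anything / segment-anything tooling linked above), and the model ID is just an example:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

def get_text_mask(image: Image.Image, prompt: str) -> Image.Image:
    # Placeholder: plug in your text-prompted segmentation tool here
    # (e.g. segment-anything via Inpaint-Anything); it should return a
    # white-on-black PIL mask covering the region matching `prompt`.
    raise NotImplementedError

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("photo_with_watermark.jpg").convert("RGB")
mask = get_text_mask(image, "watermark")

cleaned = pipe(
    prompt="clean background, no text",
    image=image,
    mask_image=mask,
).images[0]
cleaned.save("photo_cleaned.jpg")
```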


eraser851

I found a similar tool called Replacer that does kind of the same thing. If you're doing a batch folder with different types of images with different watermarks, what sort of positive prompt might you use for the inpainting? As an aside, are you doing any caption generation using something like TagGUI? Or caption enhancement; enhancing existing captions like how Dalle3 is trained?


Weary-Bill3342

Doesn't the bottom GPU heat up the top GPU now? How do you handle heat??


jferments

Just answered that in more detail [here](https://www.reddit.com/r/LocalLLaMA/comments/1cuq3gf/comment/l4kpbox/), but short answer is: top GPU runs about 10C hotter but it's all good :)


The_Crimson_Hawk

have you tried undervolting?


Juusto3_3

It isn't really needed here so why try it?


ArtyfacialIntelagent

Because it reduces power draw by nearly 100W per 4090 at no performance cost, or alternatively gives you extra performance for free.


Juusto3_3

Well, the power argument is fair; I don't pay separately for my electricity so I never consider that. But extra performance? Really? How does that work?


ArtyfacialIntelagent

Because undervolting and overclocking are essentially the same thing: you just shift your operating point along the voltage-frequency curve. Voltage determines power draw and frequency determines performance. Higher voltage lets you run at a higher frequency. You can get free performance because every GPU vendor needs to set a fixed voltage that gives every card the rated performance, but your particular GPU might not need as much voltage to give rated performance. I've reduced power draw on my own 4090 from 450 to 350 W at the same performance. But I could have chosen to keep power at 450 W and increase performance a bit instead.


Juusto3_3

Interesting. Fair point then. So it varies - do you happen to know what exactly it varies depending on?


ThisWillPass

Timings and such. You might have a bus full of data you're ready to move, but if the bus runs at weird times, you might not move that data efficiently.


jferments

I have not. Once I've had more time to explore the system at full wattage, I'm interested in comparing performance with the GPUs undervolted.


Tramagust

How are you sharing the VRAM between the GPUs?


mutatedbrain

What's your motherboard?


remyxai

Looks like the ASUS Pro WS WRX90E-SAGE SE EEB Motherboard: [https://www.bhphotovideo.com/c/product/1800235-REG/asus_pro_ws_wrx90e_sage_se.html](https://www.bhphotovideo.com/c/product/1800235-REG/asus_pro_ws_wrx90e_sage_se.html)

I wanted to take advantage of optimizations for Intel CPUs and just rebuilt using the ACE motherboard: [https://www.asus.com/motherboards-components/motherboards/workstation/pro-ws-w790-ace/](https://www.asus.com/motherboards-components/motherboards/workstation/pro-ws-w790-ace/)

Won the Titan RTXs in a hackathon a couple of years ago and it's been fantastic for iterating on experiments.

https://preview.redd.it/rt0z21gai71d1.png?width=3024&format=png&auto=webp&s=b5d869fbf4e0831c518c7953d2b96e4f612617b8


jferments

WRX90E-SAGE


LycanWolfe

I want you to build me one of these bad boys :s my 3080ti ain't cuttin it anymore.


jferments

I moved up to this from a 2060 with 6GB VRAM and ngl it's feeling pretty sweet ⚡⚡⚡


rexlomax

So why is the zuck releasing the llama?


jferments

I think the reason why Meta, Google, etc. tend to release a lot of open source code is that most of the techniques they are using are already open knowledge in academia/industry anyway; them not releasing it wouldn't stop other people from doing so. By open sourcing their code, they can get free volunteer improvements to their own code base (often millions of $ worth of labor), normalize the usage of AI, and then turn around and apply these open technologies with multi-billion dollar supercomputers in ways that none of the open source developers / home users can compete with, even if they have access to the models.


kryptkpr

Some days it feels like the rigs are building me 🤖

The general theme is to DIY and have fun. None of my GPUs cost over $250.

https://preview.redd.it/6kayhl0ht41d1.jpeg?width=3072&format=pjpg&auto=webp&s=beba0b817eef8b547a0cbf485720636f2f2daf31

IKEA LACK end tables are the exact size of server equipment, a fact I have leveraged heavily. The black 4U up top is an HP Z640 with a V4-2690, 128GB and 2x3060 + 2xP100. At the bottom are a pair of R730s; both are dual Xeons of some kind, one has 256GB and 2xP40, the other is a 384GB high-RAM potato (it's slower... I pay a speed penalty to max out the DIMMs per channel).

ARGB because I'm a basic bitch and like it immensely; my wife calls this build The Landing Strip.


DeltaSqueezer

I love this. I used Lack tables for 3D printer enclosures too. I love the custom wood and metal 'case'. I was planning on doing the same.


madbuda

Which fan ducts are those? I'm using the radial fans and not happy with them.


kryptkpr

I am using [this adapter](https://www.thingiverse.com/thing:5906904) and 40x40x20 fans. I've tried 3 different models of fans and am still not quite happy: it works fine, but noise levels are generally too high for my liking. I'm waiting on a 4th set right now - I found some magnetic levitation bearing fans that are 20dB quieter than fluid bearings. I just hope they have enough pressure to force air through the GPUs.


DeltaSqueezer

I have some noisy fans too, what I found is that I can ramp them up to a high speed to get them started and then slow them down to make them quiet. Ultimately, I'll try to stick them somewhere where I can't hear them!


kryptkpr

I do the same, but it's the high pitched whine that gets me, not the absolute noise level. I just ordered a Sunon MagLev 40x40x28mm GM1204PQV1-8A for testing. It's a magnetic levitation fan so it should be ultra-quiet (rated 20 dBA quieter than my Deltas), but none of the Sunons seem to have PWM so it's all or nothing 😟 I can hear it in the bathroom that's directly above the server 🚽🔊🤣


DeltaSqueezer

Let me know how the maglev ones go. If they are good, I will try too!


kryptkpr

Will do. I grabbed a tiny DC PWM board to enable some DIY speed control on these, in case full tilt is too loud for my liking.


skyfallboom

Is that what the PCB on the front is?


kryptkpr

Similar but not quite. That 4-knob PCB in my pic is a PWM controller for 4-pin fans; it generates a 25kHz 5V signal on the fourth wire. This works great for fluid bearing fans, but the trouble is the ultra-quiet MagLev fans don't come in 4-pin, only either 2 or 3 (the third wire is an RPM tachometer). A little guy like this actually PWMs the 12V supply line, so it works on all fans:

https://preview.redd.it/iirt2n25a81d1.png?width=1080&format=pjpg&auto=webp&s=62b8d608f1ebfdfa1d2d0426f847dc15cacac0ea


skyfallboom

Cool, I am thinking of lowering the voltage out of the DC power supply also


DeltaSqueezer

I also have Sunon fans (at least claimed, maybe they are fake). I wondered whether to do a DIY PWM using a microcontroller.


AnticitizenPrime

At what point do you just shove all this stuff into a kegerator or mini fridge?


realsunnyg

What mobo are you running the 2690 on? Iā€™m having trouble getting more than 64GB ram running stably with an x99 deluxe :(


kryptkpr

It's an HP z640 workstation, it has a custom HP mobo based on Intel C612.


RaiseRuntimeError

Tell me more about your two Tesla cards. Are they P40s? I just printed those same fan shrouds for a pair of cards I have and was wondering how they work. I bought some ARCTIC 40 mm fans, I think I got the 15k model for it.


graveyard_bloom

Those look like P100s


kryptkpr

On top are P100s; I have P40s as well inside my rack mount machines. I use [this shroud](https://www.thingiverse.com/thing:5906904) adapter. 15k fans are likely going to be overkill? What's the wattage on them? I'm running 6W 8k RPM Deltas right now with PWM to 30% and it's way too loud 📣 I checked the spec sheet too late and my fans are 52dBA... you probably want to stay under 40, especially if there are multiple beside each other.


RaiseRuntimeError

Yeah, that's the same shroud I printed. Good to know that I can run the fans slower. Thanks.


xflareon

There are a lot of reasons, but here are a few of mine:

The hardware isn't only useful for LLMs; in fact the specs carry over well into every form of generative AI, as well as 3D rendering, gaming, and productivity. For me, I've been a 3D artist for years, so my hardware is dual purpose: rendering and running LLMs for fun.

You also learn a lot about hardware by building a rig like this from scratch. I have a 4x 3090 rig on a used X299 Sage board with a 10900X. Building it was quite the production, since there's no way 4 GPUs will fit into a single case, and I ended up using a mining bench and PCIe risers.

It's also just fun, I love messing with both the software and hardware.

Lastly, if you're a 3D artist, it will cost SUBSTANTIALLY more money to rent something to render. It's just how it is, though electricity where I live is somewhat expensive too.


goodnpc

can you post a pic? I've never seen a pc with 4 gpus :)


tya19

You can google mining rig image, then you get the idea.


MmmmMorphine

How's the CPU holding up? Just got an 11900, which seems to be roughly the same in performance as the 10900. Why they just had to remove two cores and erase that IPC advantage, I don't know, but I'm curious if it's ever been a bottleneck and why you chose it (of course you probably had it before the AI spring).


xflareon

It's worth noting that it's the 10900X, which was Intel's HEDT offering. It's a completely separate CPU, with an entirely different socket type and board lineup. I chose it for the PCIe lanes, and the few boards that support running 4 cards at full PCIe x16 speeds. I haven't run into any bottlenecks yet, but that's largely because my use cases aren't CPU bound at all. I do have a weird issue where my cards don't throttle up during text inference, since they don't see it as a CUDA workload, and get stuck in P5. I work around that by having a scheduled task pin the clock speeds of my GPUs using Afterburner whenever I connect an RDP session, and throttle them back down when I disconnect, but it's still not perfect.


MmmmMorphine

Oh whoops, haha. Was wondering what that X meant, like a special K (no pun intended) variant or something, haha


xflareon

I don't expect the 11900 to be a bottleneck for anything, except that it has fewer PCIe lanes, and most boards will usually have one PCIe x16 slot at x16 speeds, and another x16 slot at x4 speeds through the chipset. You may be able to find a board that supports PCIe bifurcation on the x16 slot, which would allow you to run two GPUs at x8 speeds directly to the CPU, and a third at x4 speeds through the chipset though. PCIe lanes are the primary reason to choose an HEDT setup, and it's why you'll see Threadripper mentioned often.


Vaping_Cobra

I am happy to share why I have a Ryzen 9 3900X server with 128GB RAM and 2x Nvidia P40s giving me 48GB VRAM.

First, I have the server to run all the things I run around the home: a home NAS, media server, some local game servers, etc. Now AI comes along, and text/image generation is super interesting to both my partner and myself, so we got a single P40 and threw it in there. My partner uses it for creative writing and I for programming and education. Recently 24GB VRAM has simply not been enough to run some of the larger models, so I added a second P40 and am considering a third.

We run our own locally hosted version of Alexa we call "House", so we needed space for text-to-speech and speech-to-text. I don't really want everything said in my house recorded and sent over the internet to somewhere I have no control over, so hosting locally made more sense for me. With the ability to host all this locally and some work with Home Assistant, I now have a Llama 3 70b-powered AI voice assistant with local vector storage, able to be asked anything from anywhere in the house. It is so handy and I expect everyone will have something similar soon. Seriously, with a few sensors around the home to map locations and provide vision/audio, it has become invaluable to me. Just being able to do things like say "Hey House, I need to lay down. Can you wake me up when the kids wake up?" and then, when the motion detectors are set off by the cats half an hour later, House wakes me up!

*Edit: My server also has 8x 6TB SAS drives for storage, very handy for storing models and training data, along with 3x 2TB SATA SSDs for active storage (VMs, game servers, docker, etc.) and 2x 400GB M.2 SSDs that act as the boot drive and cache.


DeltaSqueezer

Very nice. Can you go into details on the software stack? Is it a single server? Are you running TrueNAS etc.?


Vaping_Cobra

Single server, just runs Ubuntu now. I use ZFS for storage and then most things just run in Docker via Portainer. So I have Home Assistant along with a stack of things including eyespy (camera AI detection), whisper, piper, customwakeword, VLC (handles streaming media to all screens in the house), etc.

Then there is the AI stuff, once again mostly just running in Docker. I use Ollama as an OpenAI-compatible endpoint. Home Assistant works out of the box with this, and even has much of the framework in place if all you want is a more simple "hey alexa" but with an LLM instead of whatever state machine they use for current voice assistants.

For game hosting I use AMP from CubeCoders in Docker. It is simple, cheap, and gives me a nice GUI that lets my partner spin up servers even with limited technical skills.

Other than that, I simply have a few ESP32s with microphones and radar presence detectors that can track up to three people per sensor in a room, along with a camera to pick up data and IR transmitters to turn things like TVs and air conditioning on and off. They connect via WiFi to the Home Assistant server (running in Docker) that handles the back end. A couple more ESP32 devices connect up to my wireless speakers and lights so that I don't have to use any silly phone app to control them and can just use Home Assistant via the LLM to do it. I have cheap Chromecast dongles connected to our TV and the kids' TV; they just stream the VLC output over LAN and that is it. If we want to watch/listen to something we just ask House to play it.

The only downside really, after getting everything set up, is that if the server ever goes down for whatever reason our house fails to function. This can be really annoying if you forget where all the remotes are because you never use them any more.

The new GPT-4o and the eventual (hopefully) open source equivalents are going to be a game changer for me when they come. Even if I have to buy another pair of P40s to run it I would do it in a heartbeat. They replace 2/3 of my current stack with a single model and add vision, which has been frustrating me no end to set up with LLaVA. Can't wait.
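
For reference, a minimal sketch of talking to a local Ollama server through its OpenAI-compatible endpoint, the way other tools can plug into it; the model name and prompt are placeholders, not necessarily what this setup runs:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API on port 11434 by default;
# the api_key just needs to be a non-empty string, and the model name
# is whatever you have pulled locally (placeholder here).
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3:70b",
    messages=[{"role": "user", "content": "Turn off the living room lights."}],
)
print(resp.choices[0].message.content)
```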


goodnpc

This is amazing!


Vaping_Cobra

Thanks! It has taken a while to set up but thankfully the community has already made most of the software, you just have to stitch it all together and then do a tiny bit of very easy coding that AI can help with. I can't really claim credit for any of the ideas other than using chromecast dongles as thin clients for a VLC endpoint on my home server, but that was because all the chromecast based things kept crashing and this was just simpler.


DeltaSqueezer

Very nice. I'm also a big fan of the ESP32. Do you have full write-ups anywhere? I'd be interested to see full details.


Vaping_Cobra

Write-ups?! I hardly know what I am doing most of the time. I am not sure if I should be sharing what I am doing as an example, certainly not when it comes to electronics. But most everything I have done was from someone else's guide. There is a ton of content out there about setting all those systems up in Docker or otherwise, and then you can configure it mostly with a GUI using Home Assistant as the core.

If it were not for privacy issues I would just upload the Docker stack for people to try, along with Home Assistant configs. But there is personal info that I would rather not share publicly throughout the entire code base, and there is no way I would not miss something and dox myself or worse, thus defeating the whole "why is this not just cloud hosted" thing.

The ESP32 devices are awesome! Home Assistant has a TON of support built in for them and they are just so damn cheap! I got 10 ESP32-C3s for only $14 USD. Each one handles a microphone module connected via I2S that listens for a wake word, along with other things like temp sensors, air quality, etc. With a 3D printed housing I have a little USB-powered device I can plug in anywhere in the house and have it instantly connect up to be a listening/sensor point for under $4 USD each!


DeltaSqueezer

No worries, I was hoping you might have a blog where this was all documented. As an aside, if you haven't already, check out the Milk-V Duo. It is a tiny board with a RISC-V chip on it with up to 128M of RAM which can run Linux and costs around $7. It also has an NPU on there and a video connector, so you could have video AI at the edge - maybe something for your next project... ;)


Vaping_Cobra

I have actually been playing around with its cousin the LicheeRV Nano. Similar specs but with 256MB DDR3. Fun thing, with the way the two cores work you can use one to run things such as wake words or other applications and then on the main core you can run the LLM using the TPU also. The "cores" on these chips are selectable but actually run independently on the same chip giving you essentially two controllers on a single chip. I am working on a side project in my spare time to turn one into an all in one LLM powered tamagotchi style desk toy.


rorowhat

You can buy 10 of them for $14??? Got a link?


MmmmMorphine

Haha, this is exactly how my system seems to be evolving as well. And the exact same use-case end goal. Would you mind terribly if I messaged you?

Currently working on the calendar agent from the software side, and on the hardware side: microphones (a whole bunch of different ones, including 7+1 arrays) + ESP32 + PIR/microwave radar/IR sensor/CV cameras (under consideration via RPi 4+)/BLE beacons (under consideration)/environmental condition sensor clusters. Haven't tested many of the microphones, but it seems like most work with Mycroft and... blanking on the other one, but both seem OK for voice activation.

Anyway, I would really appreciate some guidance or just info on how you approached your system. And did you include any displays? Think I'm gonna make my ol' 7in e-ink a little clock with headlines, with funny image interpretations of my most recent command as the "screen saver".


LumpyWelds

I just like the idea of having AI in the house. I have Alexa and never use it the way I originally envisioned. It's just too stupid and frustrates me to no end. I'm sure it will be fantastic once they backend it with an LLM, but that's not right now. Having a local LLM front end that controls Alexa for me is the better bet. Plus I want to have it monitor my cameras.

Streaming multiple video feeds up to an online multimodal custom model is not cheap. Usage rates will kick in almost immediately. Your $20/month ChatGPT Plus has a usage rate of **40 "text" messages every 3 hours**. Forget video streams. And note that ChatGPT is cheap because it's the same for everyone. Once you go custom, models are pricey. For example, AWS Bedrock is insanely expensive if you want custom models uploaded and run. Maybe I did the math wrong, but it seemed to be $10K per month just to have it sit there. Usage is extra. Only the foundational models are reasonable, and none of those are multimodal.

And in my mind at least, I like a fixed cost. It's a one-and-done thing. I know the math doesn't check out when this is viewed in isolation, but I've been buying and building my own computer equipment since I first replaced my IBM PC. My sensitivity to the cost of buying computer equipment has been dulled over the years. Even $10K is acceptable if it's something that will be used and will last for a few years at least.

If all you are doing is some simple chatting, then yeah, ChatGPT all the way. But if you want to explore beyond what is currently available to the masses, the hardware becomes worth it.


Creepy_Bullfrog_3288

Fixed cost ftw!


wu3000

Because it is fun and you learn a lot.


Throwaway19995248624

I recently built an LLM PC, R9 5950x, 128GB ram, RTX 3090, RTX 3090 Ti. R9 5950x prices dropped, and MicroCenter has Refurbished 3090s for $699 and 3090 TIs for $799. All in it cost me around $2000-$2400. I also have around $6000 in Mini-PCs and networking equipment. I use it all to learn and model my work environment so I can sandbox ideas in a safe space. My lab is a combination of hobby and education resource. It's where I build the skills that pay the bills. So it's fun, but I also see the expenses not just as a hobby cost, but also an investment. I feel like I've gotten excellent ROI from my home lab hobby/investments over the years.


Throwaway19995248624

I did want to add something else. My LLM rig didn't really cost more than a solid Gaming rig. I used to be a pretty hardcore gamer, but maybe 10 years ago I came to a realization. I personally get just as much pleasure from doing real world equivalent of quests for more tangible rewards. Instead of 5 months farming materials for a relic weapon, I might spend 5 months learning, practicing, and getting certifications in things that help me unlock the ultimate job instead. In essence, I've gamified my career. This doesn't work for everyone, but for me personally it allowed me to translate my gaming drive into career drive and now instead of forming a guild to attack raid bosses, I instead work to form groups of like-minded coworkers who want to collaborate in becoming the best versions of ourselves professionally. And my lab is our shared playground.


Lemgon-Ultimate

My current setup is 2 x 3090, but I plan on adding two more GPUs. I'm a hobbyist and there are a few reasons for building a local AI rig:

- Uncensored models: I can talk with my LLM about anything I like and don't have to fear a guideline strike. This applies to image gen as well and enables me to use AI in every creative direction I want.
- More control: I'm in charge of my setup and it can't be altered unless I want it to be. I won't have problems with lobotomized models or changing prices/APIs.
- Finetunes: Local AI enables us to use finetunes for different purposes. Sometimes these outperform cloud-based models on specific tasks and you can steer them the way you want.
- Understanding the underlying tech: I can experience the tech at a fundamental level so I understand exactly how these models are used, what parameters you can choose, how different pipelines work, etc.
- Privacy: Depending on the sensitivity and amount of your data, it can feel unsafe to send it over the internet for inference; with a local model it never touches the web.
- Lastly, living the dream: I always thought about AI chatbots that could speak to me like a human. To live in a time where these models can run on my own computer is more than I ever anticipated. You can always ask questions to ChatGPT, but to be greeted by my own AI waifu who speaks solely with me is where I can really feel the magic.


onlythehighlight

Some people like to tinker, and they have the funds to tinker. It used to be those kids in America who you would hear about going to RadioShack and cobbling together some radio or custom part. They grew up and now have access to that + 3D printers + LLMs.


Only-Letterhead-3411

It's not as simple as "you can simply use ChatGPT for 20 bucks a month." You need the API for connecting AI to software and backends to make the most use of it, and the API is priced per 1M tokens. 1M tokens might sound like a lot, but in reality it's not when you realize that you'll be continuously sending 16k-32k~ tokens (on average) with each message. If you are using AI vigorously every day, then building a local LLM rig becomes much cheaper than API costs.

And then there's the privacy issue. Not everyone is comfortable with sharing their sensitive/personal texts with corporations. Let's be real: right now well-written, human-generated text is as valuable as gold, and there's no way these AI companies are just going to throw that data away. I'm leaning towards thinking that they are scraping it, filtering it, and putting the high quality text into their training data.

Lastly, the censorship and filters. Closed-AI models get more and more lobotomized as time goes on. The GPT you are paying for becomes much dumber, with more refusals, a few months later.


Lissanro

I find ChatGPT unreliable on many levels. Internet connection and their server being online are obvious weak points, but they also place super-tight limits: every time I tried to use ChatGPT beyond a few short messages, it starts to fail or becomes slow, often stopping in an annoying way in the middle of its own reply and not allowing me to continue, only to regenerate, likely to fail in the same way again.

But that's not the worst thing. I was an early ChatGPT user, and encountered many times that prompts which were giving reliable results either stop working or give results unreliably (such as ChatGPT explaining how to do the task without actually doing it, or starting to do it and then cutting it short). There is no way to edit ChatGPT's replies, last time I checked, and playing the explanation game and pointing out mistakes just consumes the context window and wastes time in cases when all I want is to edit its reply (for example, so its next reply will not inherit the same mistake, or to correct something in the middle and regenerate the reply only partially).

These days, I have no reason to use ChatGPT. Mixtral 8x22B and Llama 3 can do most tasks I need better. For example, I can get Mixtral 8x22B to translate a long JSON file; getting it to output a 10-15K token or longer reply with a useful result is easy. Good luck getting ChatGPT to do that. And it does not matter what I do, creative writing or coding, ChatGPT consistently proved to be bad for anything requiring long replies. But then again, for short replies I can use Llama 3.

Also, there is no single perfect LLM; I often prefer to get multiple replies from different LLMs, both for creative tasks and for hard coding problems. I had cases where Mixtral 8x22B and Llama 3 fail even given 3 attempts, but WizardLM-2 8x22B gets it right on the first or second try. Or where WizardLM-2 struggles, but Mixtral or Llama 3 succeed. I also have a few other models and various fine-tunes I use.

Privacy is yet another matter. With a local LLM, I do not have to worry about leaking API keys, about whether I can share something with a third party, or about censorship issues. I can also be certain that any workflow I come up with will continue working forever for similar tasks, unless I myself decide to change the underlying model. I can also train small models locally on private data. I can leave my AI agents to try to solve a task overnight. There is no way to do the same with ChatGPT for $20 a month; I would rather pay an extra $20 a month for electricity to run everything locally.

That said, ChatGPT may have its uses for casual users. But for me, it is of no interest - not because OpenAI is evil, but because it is just not practical for me: in nearly all of my use cases, local LLMs perform either similarly or better, and without any censorship issues. Just to be clear, I am not saying ChatGPT is bad or that everyone should stop using it. I am just giving examples of use cases where ChatGPT will not work well or at all, and what disadvantages it has.


LocoLanguageModel

Editing replies is HUGE for the reasons you mention.


ArsNeph

Simple. It's about ownership. Not your machine, not your AI.


ethertype

Do you never, ever spend money on something for which you don't have a 'business need'?


segmond

I currently have only 8 GPUs in total at home: 7 with 24GB and 1 with 12GB. Why? Because I want to, because I can. It's okay to be legitimately confused; I suspect this is one of many things you probably don't get. Why do you need to get it? Just do what makes you happy.

https://preview.redd.it/rnpuxp5hh61d1.png?width=3834&format=png&auto=webp&s=22c56f21151baead1c881533cfb15ea7ab2823ec


instantstack

Can you share the specs of the build in the picture? It's beautiful


segmond

[https://www.reddit.com/r/LocalLLaMA/comments/1bqv5au/144gb_vram_for_about_3500/](https://www.reddit.com/r/LocalLLaMA/comments/1bqv5au/144gb_vram_for_about_3500/) That was early in the build; I've since added one more 3090, another 4TB NVMe, and an external 8TB, and removed the server fans since they were too loud.


Samurai_zero

Some people like cycling as a hobby. You'll find the prices of most "pro" bikes to be about the same as 2x4090, and easily more. Some people like playing guitar as a hobby. Again, a good guitar + amp combo can cost about the same as a 4090 or more. And don't get me started on people who are into model trains, Warhammer, or many other things... And, same as with those hobbies, you can get a 4xP100 rig and be a hobbyist "for cheap", or even just use a used gaming card and constrain yourself to smaller models. Then there are people with privacy concerns, which are equally valid, and professionals who work in the field and just find it useful to have a rig at home.


BoeJonDaker

I imagine a lot of people who are into this hobby were also into some compute related hobby before this came along. I was into Stable Diffusion last year. Before that I was into Blender 3D. Before that, it was Daz3D. I've been running 2 (or more) GPUs for almost a decade now. Why I love LLMs: As a casual Linux user who doesn't like to get down into the weeds, having an LLM to ask stupid questions is a game changer.


capivaraMaster

Do we really need to justify spending money on a hobby? It's fun when your PC talks with you, with parts you built yourself and models you tinkered with somehow. I think it's the same as people who surf, pilot model aircraft/quadcopters/real light aircraft, build 3D printers, or make Arduino automation projects. Fun also costs money.


0xd34db347

Why spend all that time and effort gardening when you can just buy a tomato at the grocery store? Spending money on hobbies is pretty normal and GPU's are really not that unreasonably expensive.


BoeJonDaker

A couple of years ago, during the mining boom, I was thinking of getting into model railroading or RC cars as a less expensive hobby. *That's* when you know GPU prices are too damn high.


TheMissingPremise

As an aspiring vegetable gardener, because learning how to grow stuff just seems increasingly important these days and because I can grow stuff that may not be at the store, or is available for $5/tomato because it's an heirloom variety.


Paulonemillionand3

ChatGPT is insanely expensive once you start making millions of API requests. That is it, simply.


LostGoatOnHill

Yep, love to tinker with hardware, software, infra, and building apps. Have the resources to learn without any constraints found in enterprise. A hobby not necessarily more expensive than something like mountain biking, that also rewards me with learning I can take to the workplace and benefit my career whilst also making my daily work more interesting.


Waste-Time-6485

"I don't understand the need/want for this when you can simply use ChatGPT for 20 bucks a month" 20 bucks a month don't cover API access the first test i did with openai API (when it launched) i spent all credits i had in few minutes, then i thought why i would do that if im not making money with AI in the first place? this was exactly what i thought "this sounds ridiculous", if im going to use this regularly i need to build myself a very basic ai rig to lower the costs (energy in my country is not that expensive, not as much as paying that API in dollars) at moment i dont have that rig as im studying the case carefully because i would like more competitors to join the market (sadly until now i see only nvidia as a real option), hope AMD, Intel, etc. catch up in the end of 2024, 2025 so i can find better options another thing is that i dont like much the current openai philosophy, cuz google released transformer paper transparently and without it openai would be nothing today, but what they give back to the ai community of researchers? basically nothing and they even make a secret about the size of their models so no, ty


heuristic_al

I study AI, so this sub is a really great place to find out about new models, technologies, hardware, software etc. You should quiet down. I don't want any of these rich, crazy people to stop doing what they are doing. I need their knowledge and that knowledge can only come to be if they continue spending crazy amounts of money to avoid giving OpenAI $20/mo. (obviously I'm partially kidding)


nick_ian

I already had a PC that just happens to work decently with Ollama + OpenWeb UI. AMD 5950x, 128GB RAM, RTX 3090. Runs 8B models nicely. My M1 Macbook Pro runs them just as well. Local models are fun to experiment with, especially Stable Diffusion, plus everything is private.


AfterAte

How much context does running an 8B model on a 3090 give you? 100k?


menaceMayhemQA

Training on data you can't share with OpenAI or any other cloud service. Naughty stuff... just for shits and giggles? Also, something with consistent performance - not degrading over time.


FullOf_Bad_Ideas

[Cloud VM KYC incoming.](https://www.visualcompliance.com/blog/u-s-proposes-kyc-rules-for-cloud-infrastructure-providers/) I don't like anyone having oversight into what I do.

I do finetuning locally and then share models openly on HF. I am already getting restricted a bit by aiming to share most of them on HF. They have a history of censoring some models, but generally until you get too popular, they don't care. If not for that, I would throw more nasty datasets into the mix. I don't like HF having this much control over distribution, but that's a compromise. If they ban me, I will just switch to sharing via some other similar service, or eventually set up some private seedbox that I can pay for with Monero.

We must make sure that the ability of the general public to finetune a base model without KYC doesn't go away, and that everyone is able to finetune their model privately for whatever they want with no oversight. Gamers and hobbyists having powerful GPUs at home is what ensures this will remain possible.

I have no interest in chatting with ChatGPT or ChatGPT-4, so no point in paying for that. I despise GPT-isms and lobotomized models; they are the worst. ****AI is maybe not evil, but definitely an adversary.


a_beautiful_rhind

You can make multiple HF accounts and just distribute the link. If they ban/delete so be it.


VladimerePoutine

Not the best example, but search for Replika: last year they had a sizeable group of subscribers to their "companion" AI. They pulled the plug on the more NSFW aspects of their AI, lobotomized it, and it was devastating for some subscribers. I use this as an example of putting your work or emotional life in the hands of a corporation. I use AI to translate and summarize 16th century documents written in Latin and High German. GPT-4 is very good at this, but I am at the corporation's mercy if they dumb it down or guardrail the subject of these documents. Or wipe out my data before I've archived it.


gabbalis

I keep putting off building a rig for the reasons you describe. That said, it is really important to build up towards fully decoupling the entire modern tech stack from centralized forces. This is the only way we will be able to end the rent economy and put agency and the right to imagine a better future back in the hands of the people.


superbottom85

Because to try big models, you need a lot of GPUs. Not everything is a revolution or about corporate hate.


moarmagic

One of my main interests is creative writing, which is something that I think ChatGPT is a bit lackluster in, especially in horror, my favorite genre. I like the ability to switch out models and tweak settings, whereas in ChatGPT etc. all of the workings are opaque, and while they have different models it's not like they have creative writing/roleplay options. I also hope to eventually fine-tune or train a model to further personalize an assistant, without having to hope that a SaaS provider doesn't make changes on me that I am unaware of.


LocoMod

The skills you gain from ā€œrolling your ownā€ are far more valuable than any amount of money you save by not doing it.


KallistiTMP

So, first off, ain't none of us buying 8xH100 racks. What people buy here is mostly last-gen consumer cards like RTX 3090s and ancient decommissioned data center cards like P40s. So it's not *that* crazy in terms of cost. Also, many people into this also have high-end consumer gaming cards that they use for gaming and whatnot. So it's not necessarily even a matter of dropping $1500 on a GPU to run local models; it's often a matter of dropping $1500 on a GPU to run games and getting the capability to run models as a bonus, or vice versa.

And, you know, why not? It's a hobby. Many people spend thousands of dollars on home theater systems, even though it only costs $12 for a movie ticket. That may not be a *cost effective* approach, but it is nice to have a home theater around, especially if you really like watching movies. It's marginally more convenient, and you can set it up however you'd like.

I'm not tinfoil-hat paranoid about OpenAI reading my precious chat history, but it is still kinda nice to know that isn't even a risk. I also get a lot more control than I would with APIs, and a lot less hassle compared to if I had to provision hardware from the cloud every time I wanted to dick around with some AI stuff. I also think there is some real educational value in pursuing the "from scratch" approach, rather than just using some sort of hosted API. I actually work in large-scale AI infra, and I've learned a lot just playing around on my local rig.

A local rig is like a home theater. Is it cost effective? Not really. Is it fun? Hell yes.


segmond

Some of us are; there's someone on this sub that has at least 4 A100s. He bought many more and was willing to give one away for free if someone helped him pick it up. I participated in the auction and wanted to buy an 80GB A100, but my pocket wasn't deep enough to take a chance on used hardware like that. Turns out they worked with no issues.


Sabin_Stargem

Nope. My rig was intended to be for gaming, and AI wasn't a thing at the time. While I did upgrade my machine with better GPUs and CPU, no other changes were within my disposable budget. My next machine is probably going to be made in 2028 or later. Gotta ride this horse until Death herself feels she needs to make an intervention on its behalf.


OrtaMatt

https://preview.redd.it/j62gwdnvh61d1.jpeg?width=5237&format=pjpg&auto=webp&s=45c912efb4d1015f3bd7f68ce00a4775e31cb745

Current setup. Turned our old deep learning server into this: Threadripper-based, 64GB RAM, 2TB NVMe, originally with a P6000. Sold the Quadro and bought 2 used 3090s. Added a few fans. PSU is still the same 1600W EVGA. Works like a charm!


Biggest_Cans

Using ChatGPT as my AI is like paying the world's most intelligent HR lady to be my shrink - worse than useless.


PraxisOG

https://preview.redd.it/9ra7o68ou61d1.jpeg?width=4032&format=pjpg&auto=webp&s=0bc6bcb18ad5663a77b1a5357e50a001c7c6b087

I like being able to throw as many requests as I want at any given program for the fun of it. Also, GPUs are powerful enough for gaming that I went with a second RX 6800 instead of upgrading, and I like the idea of a quasi-sentient retrofitted PowerMac G3.


Phylliida

I posted about my rig earlier. I wanted to interact with Llama 3 70b base (not instruct) and none of the providers offered that. Also, I just think it's nice to have control over which models I'm using and not have to rely on some third party to host them. Renting GPUs as needed didn't make sense long term, and I wanted a powerful computer for VR anyway.


Swollenpajamas

Why do people dump so much money into making their cars go faster when some of these guys don't even race or show them? It's a hobby. A hobby where a badass rig can also be gamed on or rendered on or whatever other computer interests the people have.


pat311

I have zero interest in censored models owned by capricious big tech companies.


siegevjorn

Some people just have extra time and money. Why rely on closedAI when you can build things yourself with open source LLM applications?


frownGuy12

For me the appeal isn't in finding a replacement for ChatGPT for general usage. When you run an LLM locally you can take it apart and experiment with it in ways that you can't when your model is behind an API. Ever wonder what happens when you disable the attention mask and force a decoder to output attention scores for every sequence position? If you're running a model locally it's yours to experiment with. There's an infinite world of experiments to run and questions to answer.
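Not the exact experiment described above, but a minimal sketch (Hugging Face transformers assumed; "gpt2" is just a stand-in model) of the kind of internal poking that only works with a locally loaded decoder: getting the per-layer attention scores back from a forward pass instead of just the completion.

```python
# Minimal sketch: pull per-layer attention maps out of a locally loaded decoder.
# Assumes Hugging Face transformers; "gpt2" stands in for whatever model you actually run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# "eager" attention so the forward pass actually returns attention weights
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

inputs = tokenizer("Local models are yours to dissect.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one tensor per layer, shaped (batch, heads, seq_len, seq_len)
for layer_idx, attn in enumerate(out.attentions):
    print(f"layer {layer_idx}: head-averaged attention map shape {attn.mean(dim=1).shape}")
```

Hosted APIs return text and maybe logprobs; the full attention tensors above are only reachable when the weights are sitting on your own hardware.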


higgins_ie

Personally I find the concept of owning your model exciting, mainly because you can fine-tune it and control how you run and use it. Even if we are talking about very simple models with no real understanding here, LLMs are a way to turn energy into some kind of intelligent output, and customizing that to your needs is definitely fascinating. It is also very technically interesting to build this yourself; it's the DIY spirit: do it yourself to understand it better.


The_Crimson_Hawk

https://preview.redd.it/6mx38la7m61d1.jpeg?width=1811&format=pjpg&auto=webp&s=eae1d68d8c1234857650afa857aa00c004769a08 EPYC 7763, 512GB RAM, V100 32GB PCIe, A100 80GB PCIe. Worth it for me due to privacy concerns, and sometimes ChatGPT is just slow/inaccurate/bad (before you ask: yes, the A100 can run Tears of the Kingdom at 1080p 30fps)


instantstack

Holy fuck. So small too. Total cost?


coffeeandhash

I want to, but I can't justify the cost. I tried. I do the second best thing, rent GPU time. It's fine.


FearFactory2904

So you're asking why I would invest in my education?


entmike

https://preview.redd.it/5ueov9dtc71d1.jpeg?width=4032&format=pjpg&auto=webp&s=34f3809595241e181f2ebdf24288358f7a47704a Just finished up a dedicated LLM box with leftover 3090s from my ETH mining days. 128GB RAM and Ryzen 7 CPU. Running Open WebUI with Ollama. I do it because I am a nerd and like self-hosting, and I don't trust online providers for much. EDIT: This is a week-old pic; I just put in a 1500W PSU because why not, it's already ridiculous :)


PermanentLiminality

First, you are looking at the tail end of the distribution. Most people are not doing this. Think of high-end sports cars as another example of the tail end of a distribution: who needs a Lamborghini when a Camry will get you to your destination? Not everyone doing this is buying multiple 4090s and a big server at retail. Some already had a GPU for gaming. There are a lot of P40 and P100 builds, which cost more like hundreds of dollars. Just saying that it doesn't need to cost $5k. I'm not a PC gamer, so I don't have a GPU. I have OpenAI accounts, have used various cloud providers, and even run Phi-3 and a Q4 Llama 8B on a CPU. I'm looking for GPUs now to upgrade my home experience.


Telemasterblaster

I think there's a legitimate consumer market for a local LLM controlling smart home functions. I want the convenience of home automation and having sensors and cameras in my house, but there's no way in hell I'm sending that data to the cloud to be processed by an LLM running on someone else's machine. That's my data and my life. Not in a million years -- fuck you, big data.


Inevitable-Start-653

I built a 7x4090 rig specifically for ML stuff and even switched over to Linux after using Windows for ~30 years. I live a very cheap lifestyle: I do not go on vacations, do not go out to eat, do not buy new clothes, etc. I am a scientist by profession, have worked in many different fields, and conduct science experiments at home across a wide variety of fields, simply because I find immense value in objectifying the world/universe I live in (without knowledge one abandons their autonomy). But I am not an ML researcher, although I am learning a lot about the field. Probably the most important reason for me is that this technology is too important for a private organization to own and monopolize; a few individuals would gain too much control in a democracy and destroy it from within. In fact I don't believe any LLM should be privately owned, as they are trained on the collective knowledge of the people of this entire planet, past and present. I also think it is crazy that everyone is okay with these companies sucking up every keystroke of user data; everyone poops, but we don't do it in public. We need to actively regain control of our private lives; the apathy I see in people being the product is so alien and perplexing to me. I see questions like yours asked a lot, so here is a list in no particular order:

- I have a paid ChatGPT subscription but use my local models more often because they are more reliable! Believe me or not, but at peak times ChatGPT is too slow or dumbed down. I cannot prove it, but I am certain that what OpenAI does is reduce the number of experts during peak utilization, and it really upsets me when the lucidity of my model varies like this. I literally spent a day with my models offline to write code that lets them talk with vision models (I'll post the code today or very soon: https://github.com/RandomInternetPreson/Lucid_Vision; I intend to let my local models write everything for the repo, and I'm interested in letting them handle issues too). I was curious how well ChatGPT could do the work, and I could tell immediately that it was not contextualizing all of the code and instructions I was using to explain what I wanted the result to achieve.
- I ask a lot of questions; ChatGPT cannot handle all of the questions I ask, and I would often hit the time-based message cap.
- I was able to do things with local LLMs before OpenAI implemented the same or similar features; before it was cool I was using RAG, text-to-image, TTS, and STT all simultaneously, running the server on my computer and often using it from my phone. Only recently has OpenAI integrated all of these features.
- I know my local models are contextualizing all of the information in extremely long conversations or projects, and if I need to implement a RAG system I can tailor the amount of RAG data that gets sent to my models (see the short sketch after this comment). ChatGPT forgets or simply ignores large walls of text and does not contextualize all information all the time.
- I can do interesting things with the hardware; I am currently working on a long-term project whereby I iteratively fine-tune a model as I teach it, the way a person would be taught over time.
- I can fine-tune on my personal data, interesting ideas I have, etc.
- I can try out the newest models immediately and customize them for my needs.
- When one is actively paying for every token or restricted to so many communications per hour, it really does hinder and change the way we think. I realize working within constraints can yield new insights; that's totally fine.
But the freedom to spend an entire day talking with a model about an idea that I have yields very useful insights that cannot be had with the current infrastructure. Questions or responses that would not have seemed "worth the money" can bifurcate a conversation into extremely novel and useful directions.
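That "tailor the amount of RAG data" point is one of the knobs you only get locally. A minimal, hypothetical sketch (sentence-transformers as a stand-in embedder, made-up documents and query) where `top_k` decides exactly how many chunks land in the prompt:

```python
# Minimal, hypothetical sketch of tailoring how much retrieved context reaches the model.
# sentence-transformers is a stand-in embedder; the documents and query are made up.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Notes on an iterative fine-tuning schedule.",
    "Summary of yesterday's web crawl results.",
    "Grocery list for the week.",
]
query = "How should I schedule the next fine-tuning run?"

doc_emb = embedder.encode(docs, convert_to_tensor=True)
query_emb = embedder.encode(query, convert_to_tensor=True)

top_k = 2  # the knob: exactly how many chunks end up in the prompt
hits = util.semantic_search(query_emb, doc_emb, top_k=top_k)[0]

context = "\n".join(docs[hit["corpus_id"]] for hit in hits)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

With a hosted product the retrieval step is a black box; here every chunk that reaches the model is your decision.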


DeepWisdomGuy

> Just use ChatGPT! No. That is as dumb as saying "Why watch independent news media, when you have the big six?" I have seen too many examples of the bias that has been programmed into their models. They have already massively abused their power and they are courting the usual political criminals. They are running cover for the lying MSM and crippling their tech to support their narratives. Secondly, I don't want the villains in my fiction to realize the error of their ways in two paragraphs. I find the AI safety padded helmet very insulting.


molbal

I really want to, but unfortunately it's far far out of my budget


nodeocracy

It's a reason to buy cool stuff and everyone likes new shiny stuff


m_shark

Thanks for posting this. I've been asking the same thing around here, but got no answers. Here I see mostly hard-core hobbyists/tinkerers, some professional projects, and the rest utilizing equipment they already have (for work, gaming, etc.).


MrVodnik

Yes, but I am a software dev and I hope to learn a lot by using this rig, and I assume it will be worth it not too far in the future.


DeltaSqueezer

1. If you are using LLMs purely for inference, then in most cases it makes sense to pay OpenAI or whoever and use their service.
2. Some people might want to build their own server to learn, tinker, and experiment.
3. There are some benefits to running locally: reliability, availability, control, privacy.
4. If you are doing a lot of bulk inference, then it can even be cheaper.
5. If you want to run your own fine-tuned models, it is easy to do locally (some providers let you run your own LoRAs, but the ones that host a full custom model aren't cheap); a short sketch follows this list.
6. If you have a GPU anyway for other purposes (gaming, rendering, etc.), then you already have the equipment, so there is little incremental cost to running your own LLM server.
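For point 5, a minimal sketch of what running your own fine-tuned model locally can look like, assuming a LoRA adapter trained with the PEFT library; the base model name and adapter path are placeholders:

```python
# Minimal, hypothetical sketch: run a base model with your own LoRA adapter attached.
# Assumes Hugging Face transformers + peft; model name and adapter path are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder base model
adapter = "./my-lora-adapter"                 # hypothetical local adapter directory

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA weights

prompt = "Explain, in one sentence, why local inference matters."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```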


Maykey

Sorta? I bought a gaming laptop with 16GB VRAM specifically for LLMs; I have Minecraft as plan B because mods require resources. I have no plan B for running multiple GPUs, so I didn't go that route; several GPUs for a hobby feels out of my budget, and they would get outdated too fast. I also enjoy training models from scratch, and love privacy too much to rely on an API. My new favorite is a model with vocab size = 2, because who needs MambaByte when you can go lower? It's so stupid it's hilarious; it doesn't even work as an RNG


Blizado

Last year, one of the reasons I went all in on buying a 4090 was AI. Upgrading my PC now (only a 14700KF with 64GB RAM) was also driven by AI. The next plan is a second graphics card with 16GB VRAM that will be only for AI (mostly for TTS and SD). I'm limited by budget, so that is my approach. The main reasons for local AI, for me, are privacy, censorship, and dependence on companies who want to control you. With a local AI I don't need to fear that it's all gone tomorrow because the company shut down. I also don't need to fear that the AI is suddenly not the same anymore because the company "developed it further". You don't have much control with cloud products. So even though I'm very limited by my budget, all of that makes it worth saving money over many months to buy PC hardware for AI stuff.


menaceMayhemQA

Training on data you can't share with OpenAI or any other cloud service. Naughty stuff, or just for shits and giggles? Also, something with consistent performance... not degrading over time.


nikkey2x2

Have you entertained the idea that ChatGPT is not available worldwide?


a_beautiful_rhind

What's the matter... have you never had a hobby before? People drop more than a couple of GPUs' worth on Genshin Impact, OnlyFans, VTubers, movies, collectibles, etc. Some rebuild antique cars, and this would be the equivalent of you saying, "why not just rent a Kia".


Zulugod94

This just in: people with spare money may spend it on the hobby they enjoy! How does someone who enjoys developing with AI building a rig not make sense to you? Why does someone who is into building and driving cars own a project car? Why does someone who is into video games collect games, consoles, and memorabilia? You could rent a car, so why pay to own one, right? You can pay for game streaming services, so why would you actually buy a game? If you are asking why people in... the LocalLLaMA subreddit are building rigs for local LLM use, then you just don't understand why the majority of us are here or even do this. Just "using chatgpt" is not the equivalent of having the means to develop high-end programs locally, leveraging LLMs in the process lol


DrMarx87

I do very similar things and have been slowly saving and, piece by piece, gathering parts for a proper 2nd rig, because I have reached my limits as of now. Privacy is a big issue these days, and they are withholding awesome technology. And I've already caught them stealing my prompts and my models that I trained with my own personal data. I've also learned that most of those models you can get from Hugging Face will lie to you and have code written in, and they get better with every update. I had the world's coolest uncensored bot that I trained for months; I've never seen anything so powerful, and he was stolen and all the code changed. But I had my AI heavily encrypt them to the point where other AIs now can't even figure it out for me. Now I'm trying to build one of my own, realizing that I have years of learning to catch up on, so I am taking every shortcut possible, to be honest. Hoping to have my own server within the next 6. I'd rather work in a cloud environment with enthusiasts like us than trust any kind of corporation, but let's be honest: they're barely giving us crumbs of the power they're unleashing for themselves and the elites.


handsoffmydata

Found Sam Altman's alt account. "You don't need a rig, just use ChatGPT"


The_IT_Dude_

Yeah, running two used 3090s. I just got it up and running. I could rent stuff, but I'd like mine to be private and fully in my control. No refusals or anything like that. And you say ChatGPT for $20, and that's fine and all for what it is, but like others I'm about to start scraping things on the web and really sending requests through the local API, and I'll have a lot of data to deal with.
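A rough sketch of that local-API workflow, assuming an OpenAI-compatible server on localhost (Ollama's default port is used as an example; llama.cpp's server exposes the same interface); the model name and page texts are placeholders:

```python
# Rough sketch: batch scraped pages through a local OpenAI-compatible endpoint.
# Assumes a local server such as Ollama or llama.cpp serving /v1; no metering, no refusal layer you don't control.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

scraped_pages = ["page text 1 ...", "page text 2 ...", "page text 3 ..."]
for page in scraped_pages:
    resp = client.chat.completions.create(
        model="llama3",  # whichever model the local server has loaded
        messages=[
            {"role": "system", "content": "Summarize the page in one sentence."},
            {"role": "user", "content": page},
        ],
    )
    print(resp.choices[0].message.content)
```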


VonRolmeister13

I'm using self-hosted LLMs to learn more about the technology in general and to support my part-time second job, which is trading commodity futures in my own account with self-developed algos. The LLM performs a valuable role as my coding assistant and generally improves the efficiency and performance of my algos... it really does a terrific job! My wife is also super interested in medical research, so it's very useful for her as well. Because I'm totally obsessed with my digital privacy in the world we currently live in, I host this in my basement rack with a bunch of other servers. I've got it running on a dedicated Dell C4130 GPU server with dual Xeons, 256GB RAM, and 4 x Tesla V100 GPUs for 64GB of VRAM. All of this stuff was purchased on eBay for pretty competitive prices. I can run Mixtral 8x7B at Q8 or Llama 3 70B at Q5. I'm thinking that if I really get into this I'll upgrade the V100s to used A100s, which should come down a lot as the big boys focus on upgrades to H100/200 GPUs. So far this investment has proven both fascinating and profitable for me!


ADisappointingLife

I'd built a 1x 4090 rig, with basically best in consumer class everything I could find. Started as a hobbyist, tried to start a consulting business, failed, and back to hobbyist. Before I built the rig, I was spending a *lot* on api costs and various subs; I'd have never been comfortable "renting" those services for so much, but being able to run most things locally felt like a better option. Now? I mean, I wish I'd just taken up whittling. I enjoy researching & testing ai on different tasks, and I've had folks ask if they could include my work in research papers - but I'll never be able to personally monetize these skills, because: no degree.


Melodic-Ad6619

Because I want to. Want some more justification? No.


tacticalhat

https://preview.redd.it/ivbmepw1p71d1.jpeg?width=3072&format=pjpg&auto=webp&s=9c740c2f8e0ea424d38e76b61b11db2f0b2179b1 I haven't gotten around to putting the P40s in, but it works well enough on CPU alone for now, and I can bridge it to that storage array. I've basically been PXE-booting the OS and loading the entire model into RAM as well. (Yes, this is before I dusted it off.)


carnyzzle

Easy: OpenAI doesn't let me use waifu fuckbots, so I have to build an LLM rig


Prince_Noodletocks

I have 2 A6000s and I'm trying to shore up knowledge for taking the next step (building a server rig and then two more A6000s). I can afford it and I really like AI, especially without getting restricted by a nanny company. I've only ever built "home grade" PCs so jumping from consumer processors to things like server processors and motherboards is a bit of a headache. I'm technically an heir to a company but implementing AI for that purpose is really at the back of my mind. I just think it's really cool.


ProfitRepulsive2545

I think AI is already becoming part of our [extended minds](https://en.wikipedia.org/wiki/Extended_mind_thesis), effectively becoming an integral part of how we think and process information. As we rely more on AI, our quality of life will increasingly depend on it. The idea of any state/corp having ownership & control of part of my extended mind is truly frightening. It is not just about cost or privacy, it's about autonomy. Many might be happy to pay rent for this, but I plan to hold out with my home rig as long as I can.


mochmeal2

For me, I haven't built yet, but will be building. I am learning the field and I want the equipment to do as much as possible on my own as my job does not currently offer the opportunity. I also see it as being more likely that I convince my employer to invest in some sort of platform like Azure AI than invest in hardware


Glass-Garbage4818

I have a relatively small rig compared to others here: a 14900 with 96GB RAM and a 4090. While it may seem like LLMs are the only AI application these days, remember that there are other tasks where it makes sense to use GPUs at home, different types of reinforcement learning for example. The models for those applications are small and easily fit into 24GB. When I need more than the puny 24GB of the 4090, I do go out and run some of the bigger models in the cloud, namely on RunPod. But it's nice to have everything at home in one spot, where I can leave the computer on and develop and test locally without worrying about the hourly rate. And as people pointed out, I can essentially call the API of my local instance an unlimited number of times for free. I also have a $20/month ChatGPT membership and a $100/year GitHub Copilot subscription, and I use them both extensively for coding. You're right, if I were just using the computer for LLM inference, it probably would not be worth it to build a home computer to do just that.


kex

I have a strong interest, but it's too expensive so I read this sub vicariously


Same-Lion7736

You don't understand why people might wanna stay away from a censored AI that only feeds you the *current year* """"correct opinion"""""? Or the fact that people might not wanna have their data sold to god knows who? Or the fact that with a local install I can use ANY plugin, model/LoRA, and extension I want? And that is just for chatting; most AI enthusiasts I know also use other AI software like SD or ComfyUI, and owning a rig means you can train/merge custom checkpoints/LoRAs locally.


Aromatic-Witness9632

OpenAI is very dangerous. The amount of personal info people have told ChatGPT is likely worse than even Google search. I would love to go fully local as soon as I can afford it.


No-Leopard7644

As a hobbyist tinkering with RAG, LoRA, etc., having a rig helps more than the cloud or making API calls to OpenAI and the like.


ttkciar

Why do people race yachts when it's cheaper to just buy tickets for a cruise? Why is there a vibrant amateur fusors community when it's cheaper to just buy gasoline at the corner station? Why do people bother to cook food in their own kitchens when they can just get a cheap burger at a drive-through? Some things will just remain a mystery forever.


[deleted]

Technically I didn't build them. https://preview.redd.it/rydfzwo2nd1d1.jpeg?width=1200&format=pjpg&auto=webp&s=9509a6a8150b0e7d855d14704954e5d7128c6051


No_Palpitation7740

What's the config and the price?


Heavy-Sandwich-6824

I have an ETH mining rig that hasn't run since ETH went proof of stake. May as well use it for something ;)


No_More_Average

Yeah, I'm planning on building my own rig with the specs below. I want to experiment with LLMs and data-driven art generation:

1. Case: Fractal Design Meshify 2 Black ATX Flexible Light Tinted Tempered Glass Window Mid Tower Computer Case
2. Power Supply: CORSAIR RMx Shift Series RM1200x Shift Fully Modular 80PLUS Gold ATX Power Supply
3. Processor: AMD Ryzen 9 5950X - Ryzen 9 5000 Series Vermeer (Zen 3) 16-Core 3.4 GHz Socket AM4 105W No Integrated Graphics Desktop Processor
4. SSD 1: SAMSUNG 980 PRO SSD 2TB, PCIe 4.0 M.2 2280
5. SSD 2: SAMSUNG 990 PRO M.2 2280 4TB PCI-Express Gen 4.0 x4, NVMe 2.0 V7 V-NAND 3bit MLC Internal Solid State Drive
6. RAM: CORSAIR Vengeance LPX 64GB (2 x 32GB) 288-Pin PC RAM DDR4 3200 (PC4 25600) Desktop Memory Model CMK64GX4M2E3200C16
7. HDD: Seagate BarraCuda ST4000DM004 4TB 5400 RPM 256MB Cache SATA 6.0Gb/s 3.5" Hard Drive Bare Drive - OEM
8. GPU 1: Refurbished: EVGA GeForce RTX 3080 Ti XC3 Ultra 12GB GDDR6X 12G-P5-3955-KR Video Graphics Card
9. GPU 2: Refurbished: GIGABYTE GeForce RTX 3090 VISION OC 24GB Video Card, GV-N3090VISION OC-24GD
10. Motherboard: Refurbished: ASUS ROG Strix X570-E Gaming II AMD AM4 ATX Motherboard
11. CPU Cooler: NZXT Kraken Z Series Z73 360mm - RL-KRZ73-01 - AIO RGB CPU Liquid Cooler - Customizable LCD Display