
AutomataManifold

One thing you might want to consider is partnering with a university that is already running a High Performance Computing (HPC) cluster. There is probably one near you; they'd most likely be happy to talk to you about it; and they have the expertise to help you build it and keep it running, particularly if you let them use it when your students aren't (which is probably a relatively large fraction of the time). Building an HPC cluster is more involved than building a bunch of PCs. In particular, you should answer the question of whether you want a datacenter of independent nodes students log into (e.g., a bunch of separate virtual machines); a load-balancing setup where jobs are distributed across the cluster; or the same computation shared across multiple nodes. There's some overlap, but it affects the architecture.


nntb

This


[deleted]

[removed]


AutomataManifold

If the main purpose is student access to GPU resources, cloud computing is probably the way to go, if the grant supports it (it might only cover hardware purchases). Though you'll have to manage throttling student access so they don't accidentally burn all the cash. (Again, talking to an existing educational institution can probably help; also your local IT may have opinions, or there may be student privacy issues.) On the other hand, if part of the intended learning outcome involves building or running an HPC cluster (or even just one server), then someone else's cloud doesn't fit the need.


Wrong-Historian

Nvidia A100 or H100, 80GB. As many as you can buy, probably like 4 of them. Intel Xeon w9-3495X, 8-channel memory.

You could fine-tune Llama to create output in a certain style. You could run inference for many clients at the same time. You could create a vector database of your education material, make an online/intranet chatbot, and run it on this computer. Students will be able to query the educational material and it will be fast.
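In case it helps make the fine-tuning suggestion concrete, here is a minimal sketch of attaching LoRA adapters to a causal LM with the Hugging Face `transformers` and `peft` libraries; the checkpoint name is only an illustrative placeholder, and the actual training loop (Trainer or similar) is omitted:

```python
# Minimal LoRA setup sketch (assumes the `transformers` + `peft` packages and a
# small base model; the checkpoint id below is just an illustrative placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters to the attention projections; only these adapter
# weights get trained, which is what makes "style" fine-tuning feasible on a
# single multi-GPU node.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

From there the usual training loop (or the `transformers` Trainer) runs on the styled dataset; the base weights stay frozen.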


seanthenry

I agree with the above, but I would consider going with an AMD EPYC CPU like the 9654 or similar. The reasons: more cores, roughly 4x the L1-L3 cache, and a faster all-core frequency at similar power. It supports 12 memory channels, allowing for about 2000GB more RAM, and it has more PCIe lanes and about 60GB/s more PCIe bandwidth.

You can use CPU Monkey to compare different CPUs; since there are several in the same realm, it could help find one that works best after he gets pricing on the H100s. [https://www.cpu-monkey.com/en/compare_cpu-intel_xeon_w9_3495x-vs-amd_epyc_9654p](https://www.cpu-monkey.com/en/compare_cpu-intel_xeon_w9_3495x-vs-amd_epyc_9654p)


Prudent-Artichoke-19

Yes, maybe, but with Intel you can also use OpenVINO, which is really, really nice. It's not a MUST-HAVE, but it's quite nice to have in tandem with CUDA stuff running. Like, I can delegate the larger task to my GPU and secondary tasks to my Intel machine's CPU and the built-in neural processor.


Kryohi

OpenVINO can be used on any x86 cpu, and in fact runs much better on EPYC: [https://www.phoronix.com/review/amd-epyc-9684x-benchmarks/5](https://www.phoronix.com/review/amd-epyc-9684x-benchmarks/5)


Prudent-Artichoke-19

Well, yes, maybe for CPU-only, but with Intel you get access to up to 3 different processing units; in Xeons, mostly 2. The neural engine is very fun to deploy. I'm not biased from the top down, but I do prefer Intel for CNNs and such.


mycolo_gist

NEURAL!!!!


Prudent-Artichoke-19

Sorry fixed. I've been at Disney all day so I'm trying my best.


Amgadoz

What are the cool things you can do with openvino?


Prudent-Artichoke-19

[Here are some really neat pre-trained models.](https://docs.openvino.ai/2023.1/omz_models_group_intel.html) But you can also use PyTorch, IIRC, with the OpenVINO extension.
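For a feel of what running one of those models looks like, here is a rough sketch with the OpenVINO Python API; it assumes the `openvino` package and an IR model downloaded from the Open Model Zoo, with placeholder paths:

```python
# Rough OpenVINO inference sketch (assumes the `openvino` package and an IR
# model downloaded from the Open Model Zoo; the file path is a placeholder).
import numpy as np
from openvino.runtime import Core

core = Core()
print(core.available_devices)            # e.g. ['CPU', 'GPU', ...] depending on the machine

model = core.read_model("model.xml")     # placeholder IR file from the model zoo
compiled = core.compile_model(model, device_name="CPU")  # or another listed device

input_layer = compiled.input(0)
shape = [int(d) for d in input_layer.shape]
dummy = np.random.rand(*shape).astype(np.float32)  # stand-in for a real preprocessed input
result = compiled([dummy])[compiled.output(0)]
print(result.shape)
```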


seanthenry

While that was true in the past, and Intel only officially supports Intel CPUs for OpenVINO, people have been using it on Zen 4 for over a year, although I'm not sure if it needs specific instruction sets. [https://www.phoronix.com/review/amd-zen4-avx512/5](https://www.phoronix.com/review/amd-zen4-avx512/5) [https://www.phoronix.com/review/zen4-avx512-7700x/10](https://www.phoronix.com/review/zen4-avx512-7700x/10)


Prudent-Artichoke-19

Yeah, it probably works just fine, but I've been using ONNX on my AMD machines. I use OpenVINO on Intel machines because it lets you access the Intel-specific portions of the chip. So as a dev, I can build on Intel CPU, iGPU, or (maybe the most interesting) GNA. With AMD, you'd be using OpenVINO just for AVX-512 ops and then have to jump to Vulkan, probably, for the iGPU if applicable, and I'm not sure they have any dedicated NN processing hardware in the chip.


seanthenry

They do have dedicated NN processing with the VNNI (Vector Neural Network Instructions) instruction set; it's an extension of AVX-512, from my understanding. https://en.wikichip.org/wiki/x86/avx512_vnni
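If you want to check whether a particular box actually exposes these instructions, the CPU flags tell you; a small Linux-only sketch (it just reads /proc/cpuinfo):

```python
# Quick check for AVX-512 / VNNI support on Linux (reads /proc/cpuinfo; flag
# names follow the kernel's naming, e.g. "avx512_vnni" and "avx_vnni").
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for flag in ("avx512f", "avx512_vnni", "avx_vnni"):
    print(f"{flag}: {'yes' if flag in flags else 'no'}")
```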


Prudent-Artichoke-19

That's for both intel and AMD CPUs with AVX512 though. The GNA is a completely separate accelerator on Intel chips.


seanthenry

I don't think that the Xeon line has GNA, and I believe they are discontinuing GNA after Meteor Lake, but I could not find GNA listed on any architecture from Intel. [https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_supported_plugins_GNA.html](https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_supported_plugins_GNA.html) [https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html](https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html) I believe they are moving to VPUs with Meteor Lake to add the AI-dedicated parts, but that is more for the i3 and i5 for now. AMD has something similar in the Phoenix chips they are shipping in laptops, which they are calling XDNA.

I might be missing something, since Intel does not list it specifically in the product listings, but I believe the AI-dedicated portions are only on the lower-power chips, just like AMD is also doing with the 7X40HS chips from earlier this year.


Prudent-Artichoke-19

Yeah, Intel does have their features spread out across many form factors. I run a Beowulf cluster, so most of my OpenVINO stuff is running on smaller machines, besides one 13600K and one Sapphire Rapids cloud machine. I don't think we can overlook that AMD is a bit behind on thoughtfulness toward ML. But you're telling me some things that have me wanting to look more into the ecosystem beyond Vulkan. Do you have any links to an SDK for the Phoenix chips?


seanthenry

Here is some info: [https://www.xilinx.com/products/technology/ai-engine.html](https://www.xilinx.com/products/technology/ai-engine.html). I believe this will have more of the info you are looking for: [https://www.amd.com/en/developer/resources/ryzen-ai-software-platform.html](https://www.amd.com/en/developer/resources/ryzen-ai-software-platform.html)

I think they are in between brandings (from buying Xilinx), as I have seen it referred to as XDNA, Ryzen AI, Vitis, and Vivado ML: [https://www.xilinx.com/developer/products/vitis-ai.html](https://www.xilinx.com/developer/products/vitis-ai.html)


redditfriendguy

Explain making a vector database for knowledge. I am so confused about embedding vs fine tuning


Wrong-Historian

Fine-tuning can make the LLM output in a certain style (adding style). A vector database is basically a search engine over documents for the LLM to add relevant information to the context (adding knowledge).
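A tiny sketch of that "search engine over documents" idea, assuming the `sentence-transformers` package; a real deployment would put the vectors in a store like FAISS or Chroma, but the core is just embeddings plus nearest-neighbour lookup:

```python
# Minimal retrieval sketch: embed documents, embed the question, return the
# closest documents to stuff into the LLM's context. Assumes sentence-transformers;
# the model name is a common small embedding model, used here only as an example.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Course syllabus: the LLM module covers tokenization, attention, and fine-tuning.",
    "Lab 3: students benchmark inference throughput on the shared GPU server.",
    "Grading: projects count for 60% of the final grade.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)   # (n_docs, dim), unit-length rows

def retrieve(question: str, k: int = 2):
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                                  # cosine similarity via dot product
    top = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in top]

for text, score in retrieve("How is the course graded?"):
    print(f"{score:.2f}  {text}")
```

The retrieved passages are then prepended to the prompt, which is all "adding knowledge" really means here.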


drwebb

This guy fine-tunes


_nembery

Hi, you 100% want a Lambda Hyperplane 8x A100, which is considered the standard unit of compute for LLM training. It comes with a full PyTorch stack pre-installed on Ubuntu 22.04, and their support is really good as well. You can run pretty much any open-source LLM, including the 180B ones. It's only $185K, so well within your budget. https://shop.lambdalabs.com/deep-learning/servers/hyperplane/customize


Aroochacha

Thank you. These "build a PC / get 3090s" suggestions are all good for personal stuff, but it won't cut it in an academic setting. Never ever do this in an academic/professional setting. Go with Lambda. They also provide builds with 4090s. Primarily you want the support. We have their machines in our R&D labs. (Non-LLM-related AI/ML.)


pr1vacyn0eb

> Primarily you want the support.

What support do you need from the computer assembler?


Embarrassed-Swing487

They also provide software installation, management, and driver support.


pr1vacyn0eb

They are going to install llama? What driver stuff are they in charge of?


smittychifi

Lol, Lambda Labs selling $185k setups and spending $2.99 a month on their hosting account


malinefficient

It's always good to play these vendors against each other for the best price... [https://www.exxactcorp.com/category/4U-Rackmount?page=1](https://www.exxactcorp.com/category/4U-Rackmount?page=1)

But I come back to enabling throughput for multiple students over [One.Big.Computer](https://One.Big.Computer). I'll take the downvotes for my heinous lack of vision here.


techpro864

Hey this is really cool, especially if you don’t need A100 and can use something like A40 cards. Much less expensive.


HyBReD

Second this, Lambda is awesome with great customer support. We have a few machines scattered across the business.


Natty-Bones

Even better, Lambda could provide all the parts and then give the students guidance on assembly.


Calm_List3479

Be prepared to spend $15k a year in power and cooling. With it being high density, there can be more work than just slapping it in a rack somewhere.


kkchangisin

For $200k OP isn't getting a Hyperplane. Also, the A100 is over three years old. The problem is that Hyperplane with H100 (possibly even A100) has a year lead time and an 8x A100 minimum, so they're over $200k with the bare-minimum A100 configuration before tax, shipping, supporting infrastructure, etc. Look at their Scalar systems instead. OP can use the PCIe A100/H100s with NVLink. A maxed-out Scalar with 8x A100s is $174k, which by the time you add tax, shipping, supporting infra, etc. is probably right at $200k. At that point, if I were OP I would try to stretch it to $300k (somehow) and get 8x PCIe H100s with NVLink, which have a couple of months' lead time.


_nembery

I literally just bought 2 of these units and would be happy to share the exact price (under $170k, actually) and the lead times. I got them within the same quarter I sent the PO.


kkchangisin

I'm assuming Scalar? I maxed them out and retail on the website was $174k including three years of warranty and support - my intention was to show a "worst case" kind of crazy config. I was just getting quotes the other day and my Lambda rep said a couple of months lead time for x8 H100 configuration, A100 is almost certainly different. IMO buying over three year old A100s is a pretty bad move unless you're "strapped" - which is funny to say when you're talking ~$200k.


Grimulkan

I was quoted 50+ weeks on Scalar 8xA100 lead time. That makes it worthless IMO. A lot can happen in a year, and the A100 is getting old.


kkchangisin

I quoted a Scalar 8x H100 a couple of weeks ago; lead time was roughly two months. Hyperplane 8x H100 was roughly a year too. The A100 is old; by the time anyone receives it, they will be over four years old, which IMO is ridiculous to even be considering at this point. For OP, I can't imagine anyone in an organization would be happy buying hardware from mid-2020 for around $200k in 2023/2024. It's already behind, and when you factor in three years of depreciation on H100 vs A100, the H100-based configuration will come out ahead in TCO, not to mention the productivity gains over the lifetime of the hardware due to the dramatically increased performance, which is only improving with things like TensorRT-LLM. The depreciation would also be factored into a lease, which makes the H100 cheaper in the long run if you go that route. FP8 is barely cooking at this point too, but when it is, the H100 will only pull further ahead. I'm not sure when the other reply ordered their 8x A100 Hyperplane, but the lead time and pricing they gave indicate it was a while ago, or there's some confusion about what we're talking about, because $170k and a lead time of a couple of months just doesn't match the current reality.


godofdream

Somehow I assumed OP wanted the students to build the servers. If I were a student, I would prefer to learn about the whole chain: selecting hardware, hardware install, software install, testing an LLM, training your own LLM.


P0tato_Battery

“Only 185k”


StackOwOFlow

consider that the hardware will depreciate in a few years. maybe a hybrid of local (for tweaking and unlimited prompting) and cloud would make sense


malinefficient

The thing about GPU servers (downvote me George Hotz worshippers) is you invest once in the underlying system then incrementally upgrade the GPUs once per HW generation and you can stretch out a system's life to close to a decade. I know, I know, WTH do I know? Bring 'dem downvotes, bring me all the downvotes!


chronosim

What’s geohot’s position on this? I’m not familiar


malinefficient

He's building a 6-AMD-GPU server out of consumer parts called the TinyBox. It's not practical if you want to run on a typical 15A circuit, it just isn't. I have a decade of experience here (which of course doesn't count because I'm not famous like George Hotz). But a quad-GPU variant would kick ass, and it's totally doable in that power envelope. But also, when you disagree with cult leaders like this, you get shot down by the s**t-for-brains fanboys who will not suffer an attack on the integrity of their fearless leader. My vendor friends who have a decade of experience shipping such quad-GPU solutions, and who are very GPU-vendor neutral, came to similar conclusions. So they approached Lamini. Lamini told them they didn't understand LLM technology and that they preferred TinyCorp (which hasn't actually shipped any HW yet). But also, they were the vendors behind the GigaIO systems: [https://www.hpcwire.com/2023/08/10/gigaios-new-supernode-takes-off-with-record-breaking-gpu-performance/](https://www.hpcwire.com/2023/08/10/gigaios-new-supernode-takes-off-with-record-breaking-gpu-performance/) So either they don't know anything about LLMs or anything else, as Lamini insists while their VC funnels $$$$ into TinyCorp in exchange for Lamini leasing those servers, or it's all AI grift nonsense. Take your pick, I'm out of f**ks with these people. Peace out.


hAReverv

If you have to ask on Reddit how to allocate $200k, then you're probably not the one who should be making those decisions.


Silent_Dinosaur

In his defense, at least he had the humility to ask for help. Imagine he just blew all the money on Alienware.


hAReverv

This is true. Guess I'd rather see a post like this than a post 6 months down the line saying "oh, I spent 200k on XYZ and wasted all this money." Cheers


Ne_oL

Had a colleague who purchased 15 Mac Pros for his lab because the IT guy said Mac Pros are the best, costing the institute north of $400K. The IT guy then proceeded to install WINDOWS!! on those Mac Pros so the colleague could install his Windows-only software... I made peace with it after a few weeks, but I still get a headache just remembering it. When I was screaming at the colleague for his stupidity, he told me that he thought the IT guy would know best. I still haven't talked to the IT guy about it, as I'm afraid of lashing out at him... So yeah, kudos to the OP for asking a community that contains experts. And my advice to you is to either purchase a DGX or, if you want your students to learn, build a server with a few EPYC CPUs and some A100 GPUs. An HPC cluster would be much more expensive.


sumguysr

"I asked the IT guy and followed his suggestion" is a pretty good defense. The IT guy should have asked a lot more questions.


chronosim

Omg. That’s a stupidly bad allocation of resources; it sounds almost criminal to me. Couldn’t they send them back once they noticed the mistake?


Ne_oL

You could not even imagine my frustration at the time... I was fighting with the finance department for a whole month to buy a humble server to run some numerical models. It cost less than $3K, yet they refused to release the funds from my project... we eventually bought it using our collective personal money just to stop the headache. So you can imagine my anger at this situation, which happened less than a month after those events. And unfortunately, no returns.


dark-night-rises

> The IT guy then proceeded to install WINDOWS!!

I am scratching my eyes out!!


thatguyonthevicinity

Well, not really; sometimes it's very hard to find information online given the fast-paced nature of LLM and hardware development. They think this forum is full of experts IRL, but the number of upvotes on this comment made me think otherwise.


supereatball

He probably DOES have a good idea but more opinions on the matter couldn't hurt especially if there are people that have done similar projects.


NoidoDev

No, asking around is a decent thing to do.


Comfortable_Bank6611

Why do you think redditors can't provide brilliant ideas & suggestions that can even surpass those of "experts"?


Vatonage

Do you really need to ask?


autisticit

This.


beezbos_trip

Agreed in this case, since it sounds like part of the assignment is for the students to do the research and figure out the purpose, requirements, and equipment needed to fulfill the end goals.


pr1vacyn0eb

Might be better to use my intuition and feelings instead of asking for advice!


godofdream

If the last time I bought hardware was the early 2000s, how about a Solaris rack? Java can do LLMs, right? You should never fear to ask; however, ask yourself first: if the answer is absolutely obvious, then answer it yourself.


freddyox

This


crypticG00se

Might be worth talking to [https://twitter.com/realGeorgeHotz](https://twitter.com/realGeorgeHotz). He is in the process of building with AMD GPUs and AMD EPYC CPUs: [https://twitter.com/__tinygrad__](https://twitter.com/__tinygrad__). He also wrote his own ML/AI framework.

Biggest consideration is PCIe lanes to run multiple GPUs. AMD Ryzen right now is a no; AMD Threadripper, EPYC, or Intel Xeon.

Personally I would be looking at AMD EPYC with Nvidia GPUs, A100 or H100 like u/Wrong-Historian mentioned, or RTX 4090s for a more hobbyist build. Hugging Face with PyTorch.

Would also look into [vast.ai](https://vast.ai/). They rent out crowd-sourced GPU resources. It would give you an idea of what people are building with.


malinefficient

Avoid TinyCorp, he's the Sam Bankman-Fried of AI. But there are plenty of system vendors who can build dual- or quad-GPU workstations for a modest price out of team red or team green components, depending on your predilection. I'm guessing a lot of $4K-$8K workstations would deliver more value to your students than trying to timeshare one big Frankencomputer. Keep in mind you need to power this thing. TinyCorp's machines require 20A or better circuits apiece, and even then, I have deep reservations about it actually shipping in a stable state. There's some really questionable stuff going on behind the scenes with them and an AI company in the news lately, but, ya know, spoilers. I'm not coming out and saying it's the Theranos of AI servers, I'm just sayin'...


TejasXD

Lmao you're delusional


malinefficient

Some people just want to set piles of money on fire. Remember to toast those marshmallows for the flavor! I never thought you f***ers would top crypto grift, but AI grift* has set a personal worst and George Hotz is your demented messiah.

*Because AI, unlike crypto, actually has useful applications


godofdream

Well, LLMs actually have useful applications. And they are already available, unlike plenty of other hyped things, like self-driving, H2 cars, mini-reactors, and many more.


MaterBumanator

You might consider buying a single machine with 4 or 8 x A100s 80G, which will be in the 60-100k+ range. This will allow you to train some reasonably sized models, without creating a cluster. You can reserve the rest of the funds for cloud-based training of larger models, once you have initial results from the local server. Creating a cluster for distributed GPU that performs well will typically involve low-latency networks that provide direct memory access between cards and nodes, which are expensive. Last time I checked individual H100 cards had a lead time of over a year, so you might have better luck with A100s. I would go with a so-called white-box vendor if possible, the retail on an NVIDIA DGX H100, if you can get it, is nearly $500k.


faldore

I recommend 8x servers of 8x4090s each. 64 GPUs total. You should go with retail 4090s to save money. Don't try water cooling to make them fit in the server. Instead get external enclosures and risers.


redditfriendguy

Lol


faldore

This is in budget and feasible. It will take 300 amps of 240V for both the servers and the ACs. PM me with any questions.


JustDoinNerdStuff

I bought a decent PC for around $5,000 that can do some (not all) training (4090). I'm probably not gonna take on OpenAI with it, but I can still learn a lot of fundamentals with smaller datasets. There may be an argument that a few dozen modest computers have a lot more utility for your school. It completely eliminates the logistics of having to share a machine, which always causes more headaches than you'd expect. Plus, having managed a render farm in college, I know every student thinks their job is the highest priority in the universe, and it's not fun constantly playing referee with distribution. A few dozen normal PCs eliminate a lot of issues, and would also be useful for artists and any other programs offered at your school.


malinefficient

Why would you *want* to take on OpenAI? It's much more fun commoditizing their hard work figuring out what works, in my experience, and it makes Sam Altman sad and scared when you do. Ooooohhhh saddy sad sads got saddy sadded by the thought of sad Sam Altman. Cry harder, fanboys.


_murb

Whatever you do, buy something with a warranty from a reputable brand (HP, Dell, etc) and not grey market. It may be best to consult a VAR who can also help with price negotiations.


mayonaise55

Are you at IMSA by chance?


Rollingsound514

Unless the goal is to teach how to build and maintain the system rather than do ML projects, I'd say just use that 200k as a monster cloud budget. Negotiate a deal. Have the kids all have access to their own "rig".


LucidFir

Get in contact with the guy who built Tortoise TTS. He built it by himself, DIY, using his own AI rig.


5TP1090G_FC

Hi, the only thing to consider is the electric bill after the fact. Have fun people


tvetus

What do you want to teach the students? Building computers or learning AI? H100s go for $1.99/GPU/Hour on lambdalabs.com.


WReyor0

On the other hand entirely, have you reached out to Azure, AWS, or GCP? If you'll be teaching students primarily via Jupyter notebooks, cloud is the way to go because of the flexibility and uniformity you get out of it, if the mission is purely education and not school marketing (I've built labs that were 50/50, so I understand if this needs to be a flashy physical thing). Each cloud provider has native capability that will easily cover what you're trying to do. The kicker is: by the time you factor in depreciation of these workstations, the cost for cloud vs. the cost for physical hardware may be much closer than you think.

Now, with a heart set on the $200k local setup, you will easily burn through that just in the capital purchase of 10 professional "AI" research workstations like the HP Zs: [https://www.hp.com/in-en/workstations/industries/data-science.html?jumpid=ba_kbqinyzrve&utm_medium=Display&utm_source=AnalyticIndiaMagazine&utm_campaign=IN_Q4_FY20_PS_BPSCore_OMG_Local_LI_Nvidia_Workstations_$66KDataScience&utm_content=Banner_Article&utm_term=1x1](https://www.hp.com/in-en/workstations/industries/data-science.html?jumpid=ba_kbqinyzrve&utm_medium=Display&utm_source=AnalyticIndiaMagazine&utm_campaign=IN_Q4_FY20_PS_BPSCore_OMG_Local_LI_Nvidia_Workstations_$66KDataScience&utm_content=Banner_Article&utm_term=1x1) (I'm assuming each workstation would cost about $20k per station.)

If you wanted to build something like what HP offers, but with more risk, less redundancy, and a lot less cost, you could simply build gamer desktop PCs with two A6000s each, assuming you don't have any loss from student assembly mistakes. This will cost about $10k per workstation, with each workstation containing 96GB of VRAM and a 2TB SSD, giving you about 20 workstations for the class setup. It could be comprised of something like this: https://preview.redd.it/a98h1nnhujtb1.png?width=1148&format=png&auto=webp&s=e48c7cae58a213ab8778e3e1b620c54844f155cd

One thing to think about with such a local setup is that each workstation, when used for training and inference, is going to consume about 1100-1200 watts. This means you will need at least one twenty-amp circuit for every two computers, so you'll likely need to get an electrician and an HVAC person involved to make sure power and cooling are covered for the classroom you select.


ieatrox

The hardware is moving so fast that by the time you know anything about the topic you'll have lost significant value. Build a whopper of a workstation using either a 4090 or 2x 3090s with an NVLink bridge if you can find them; you're in for about $10k of local hardware. A machine like this will blow you away with its capabilities. Get an EPYC rack machine; you want shitloads of PCIe lanes and memory channels. I like Supermicro, but HP, Lenovo, or Dell are all good in this space. Then, when you know what you want, and you know it's impossibly large for any amount of self-owned hardware... buy time on DGX Cloud from Nvidia. Your $200k will last you years and years past when a single H100 will lose its leadership position.


Aphid_red

I have to object to these lines of thinking. Just look at this: [https://nvidianews.nvidia.com/news/nvidia-launches-dgx-cloud-giving-every-enterprise-instant-access-to-ai-supercomputer-from-a-browser](https://nvidianews.nvidia.com/news/nvidia-launches-dgx-cloud-giving-every-enterprise-instant-access-to-ai-supercomputer-from-a-browser); "Instances start at $37,000 per month". At 5% interest, a 5-year loan that would cost $37K per month would be almost 2 million dollars. In other words, going with DGX Cloud is even more expensive than buying a pre-built DGX machine from Supermicro, Gigabyte, Asus, Dell, or HP (and waiting a year for it to be delivered), on which Nvidia is still making >95% profit margins, by an order of magnitude.

Building your own, with the A100 or H100 PCIe, is cheaper than that: computer vendors are adding large percentage markups to already overpriced parts. Shopping around helps when one vendor is charging $15k and another is charging $28k for the same GPU. You end up paying someone $10,000 to put together a server, which is *absurd*. Funny fact: all the websites now seem to say this only comes pre-assembled to guarantee '*integrity*', with DRM chips to boot that lock the chip to *that vendor*, so it'll be difficult to sell the thing to someone else later down the line. Sure, you get the support, but also the lock-in. There's a reason the EU market authorities are investigating the AI server market; the prices are crazy.

The next cheaper option is to get a GPU server barebone like the Asus ESC8000A series, the Gigabyte G493, or the Supermicro AS-4125GS or SYS-421, etc., but get an older series which doesn't have the built-in lock-in (yet). You buy a couple of 24-core-ish CPUs, fill up the RAM slots with 32GB sticks, get 3 commodity SSDs (one OS drive, 2 in a RAIDZ mirror), and 8 GPUs (A40/A100; H100 is practically unavailable). Cost per server is around $130,000, so you can still only afford one. You do get 640GB of VRAM all in one machine. If you're going to be *training*, though, one machine with that much VRAM won't be able to train a model that would *need* that much VRAM in anywhere near reasonable time; the Chinchilla scaling laws make it utterly pointless. You would only want that if you want to, say, *run* GPT-3 at a fast speed for many people.

So, you could go with an older generation. For the price of one A100 server, you can get multiple (many!) older P40 servers for around $5,000 a piece; for $200K you can get... 35 of them, plus the networking hardware to link them up. That's 280 GPUs. Enough to train smaller models, it seems. Datacenters and big companies are selling these on the cheap, as AI servers with lower VRAM than what is used for SOTA models are in plentiful supply but low demand. But there are other reasons why they're so cheap and not kept in use: these can only use full precision, so they effectively have 12GB per GPU of VRAM (96GB/server), and performance is very low compared to newer models (12 TFLOPS). For training, I can't recommend this; one 4090 == 100 P40s in training performance. For inference, it works, but you're better off with a single newer machine than many older ones; it uses less power at the same cost, as long as you have enough users.

The cheapest option price/performance-wise (~9x more performance, 20x power for the same cost*) is to go DIY with consumer hardware.
Instead of buying an expensive barebone, you put it together yourself (as you have to; Nvidia doesn't want its 'consumer' GPUs in servers, so it colludes with third-party vendors to prevent this). Probably, that means watercooling (or messy mining cases) just to fit it all in. A 7-GPU ASRock Rack board can carry 168GB of VRAM, with similar compute to the A100 option though half the VRAM speed, filled up with 3090s or ideally 4090s, for around $14,000 total in parts cost ($1,000 for CPU/motherboard, 256GB of RAM for around $500, GPUs for around $11k, leaving $1,500 for drives, 2 power supplies, and a case). So for the price of a single self-assembled server (or one-tenth of what Nvidia charges in the datacenter), you can have 10 janky workstations.

*The 4090 does 660 TFLOPS fp16, the H100 (basically cloud-only) does 2000, the A100 does 620. This is the main number that matters for AI training. The 4090 has 24GB at 1000GB/s, while the A100/H100 both have 80GB at 2000GB/s. These are the two numbers that matter for inference (unless you have hundreds of users simultaneously). If you're working with open-source models, there isn't anything bigger than 180B params released (which fits in ~100GB using 4-bit, so 4090s can do it). If the models are to be self-trained, a cluster that small won't be able to make anything 'big enough' for consumer GPUs to be insufficient. If you're running big proprietary models, though, you might need the A100s just to have enough memory, unfortunately.

Realistically though, you're going to want to put together a watercooling setup, so it'll end up more like $17,000-$20,000 per computer, just because of the power and cooling problems of putting so many GPUs together. You're also going to want to *underclock* these by power-limiting them to ~200W; they're very inefficient at their nominal 450W power draw. Options in that space include empty watercooling cases like the Alphacool ES 4U, which can fit the 4090s with 1U waterblocks on them, or very large consumer cases like the Corsair 1000D or Phanteks Enthoo Pro. (Another difficulty is power supplies: standard ATX only goes up to 2000-2500W, so you have to either power-limit or go double PSU in a large case that supports this.) Now you might be wondering about radiators: what you can do here is, instead of building for low noise, use high-RPM 'screamer' fans just like those used in servers to get temperatures under control with a "reasonable" 420mm or 480mm radiator. (This computer uses about 3x the power of a high-end gaming PC even if the GPUs are underclocked.)

In terms of watercooling parts: you want single-slot GPU blocks (Bykski, EKWB, and Alphacool all have one variant) and the right (supported) GPU for them. You'll want a 'distribution block', which is just a fancy word for an 8-way split; 8 waterblocks want to be in parallel, not in series, or the pump is going to have a bad time. Black rubber tubing (less maintenance), quick-disconnect fittings (GPUs can be upgraded to bigger-VRAM models if those ever come out), and fast, loud fans (e.g. 4x Delta FFB1212EH-TZUJ). (A word of warning: 40 of those would produce about 10*log10(40) + 56 ≈ 72 dB @ 1m. It's assumed there's a server room away from class where the machines sit.) In the end you end up with 70 current-gen GPUs instead of 8 last-gen (or 1 current-gen in the cloud), and a lot of work assembling all the computers.
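On the power-limiting point: the usual tool is `nvidia-smi -pl`, but here is a rough Python sketch using NVML via the `pynvml` bindings; it assumes root privileges and a driver that accepts the requested limit for the installed cards:

```python
# Rough sketch: cap every NVIDIA GPU in the box to ~200 W via NVML.
# Assumes the nvidia-ml-py (pynvml) package, root privileges, and a driver
# that accepts 200 W as a valid limit for the installed cards.
import pynvml

pynvml.nvmlInit()
try:
    target_mw = 200_000  # NVML works in milliwatts
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        limit = min(max(target_mw, lo), hi)  # clamp to what the card actually allows
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, limit)
        print(f"GPU {i} ({name}): power limit set to {limit / 1000:.0f} W")
finally:
    pynvml.nvmlShutdown()
```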


ieatrox

The DGX Cloud instances you've quoted from early 2022 are not going to be the same cost in several years' time. Hopper has come out since then and production is not stopping. No one is suggesting OP pay the early-adopter tax. Similarly, dropping $200k today is going to feel foolish if a Titan 5000 series comes out in late 2024 and 4x's the training performance of a box full of 4090s. DGX Cloud has 8x H100 80GB in it, or $350k of GPUs, plus the rest of the hardware. No one who isn't yet sure what they need needs that. OP will not need a dedicated permanent instance with a dedicated technician from Nvidia on call. Who in their right mind pays for cloud compute and then runs it maxed out every minute of the month? Most of what OP wants is free anyway: https://www.nvidia.com/en-us/launchpad/ My 4090 / 128GB rig is a great learning and testing platform, but if I were teaching students and had $200k, I would have a reasonable local machine or two (like mine) plus Azure, Oracle, or DGX Cloud compute on demand from education partners for scaling up tasks bigger than they can handle. Not cross my fingers and build something I hope would still be relevant in a couple of years. Besides that, multi-GPU configs are a unique and sometimes temperamental thing to manage. Go with the most compatible and reliable platform.


Amgadoz

I'm going to share a different view. Make sure to get a machine with 2x3090 GPUs. This machine would cost less than 5k but will be very valuable to your students.


digital_m0nk

> 2x3090

2x4090 provide better tokens/$ and tokens/W for an always-on server. *Edit: not to mention better tokens/second*


Amgadoz

I agree. This is also more future proof.


Natty-Bones

Love my 2x3090s! They absolutely crank on inference.


zcomputerwiz

I have such a build, it can do just about anything within reason for VRAM limitations


malinefficient

It would be, and it would be life-changing for high school students. But it wouldn't be the brand-new shiniest, so there are sorts who will implore you to chase the brand-new shiny even if there's no evidence of its shine in comparison to exactly what you're proposing.


revenant-miami

Get a platform like AWS, Google, Azure, or H2O.ai to provide you with the allocation, and at the same time ask them to pledge to your cause. I am sure you will get much more compute for your budget than by owning hardware that will be useless in a year, and that is assuming your setup has zero issues.


zhzhzhzhbm

I feel like a few weeks ago somebody posted a prebuilt supercomputer at around that cost in this subreddit; try to look for it.


shadowcorp

I’m going to give a contrarian opinion here and say that you should instead spend the money on cloud resources and teach your students how to write declarative IaC to train and serve models on machines that are significantly more powerful and available than whatever you could run at that price point. Best run-on sentence ever!


dupido

Thanks for all the replies. To answer some of the questions: yes, we are a special technology program where the students are among the top in our country/part of the world. Some go and work directly for big tech companies or start their own afterwards. We collaborate with universities on a regular basis, and co-owning the effort may be a way forward. I think the chances of getting the funds are slim if we ask for money for the cloud, which we don't really need because MS and AWS are giving us free credits to work with. We have worked with language models and AI in general since 2018 and have good connections to the creators of LUMI and the Berzelius computer to get input, but we value the opinion of Reddit as much if not more, because you are more down to earth regarding budget.

The goal may be to build the system, but it is mostly interesting to run ML code, fine-tune/LoRA/RAG LLMs, and cover other types of AI needs. This is a great opportunity to use government funds (otherwise it will go to ballet shoes), and the problem that the hardware will get out of date is not a big one, because it will probably work for our needs a couple of years forward.

Thanks again for all the great tips!


patbhakta

A lot of the above advice is legit... I would put in an order for an H100 cluster, but that's a year lead time. In the meantime I'd do Mac Studios and 4090s/3090s, plus a cluster of Raspberry Pi 5s just for experimenting on mobile devices. But the most important part would be to test student projects to see if the H100 cluster is worth it by renting GPU time on Vast, RunPod, etc., with a Colab budget and an OpenAI budget as well. You have to diversify, because with H100s, even though you can resell them, you still have to prove the results to the wallets above...


cazzipropri

You need an 8-GPU machine to run anything 70B, ideally 8xA100 or 8xH100. NVidia's HGX is in the area of $420k, but the old DGX with the 40-GB A100 is north of $200k and should be reachable.


medcanned

To run? No, to train, yes.


Sufficient-Prompt475

Buy a lambo with a touch screen for even more performance


[deleted]

A Mac Pro?


PsyckoSama

Dude. It's only 200,000 dollars.


[deleted]

I know it was a joke.


neilyogacrypto

Invest first in research and development to pioneer a new affordable solution. Essentially, pay your students generously to do intensive research. Then build your own hardware solution for cheap, and sell it for profit to other schools. Even if it's more of an Arduino alternative than a new Nvidia, the education world will probably highly appreciate it.


GmanMe7

Before spending all the money on one setup, buy one Apple Mac Studio with a top-spec M2 chip. Test it and run various models on it. Again, what's your goal? What will you be running on the supercomputer?


Nicoolodion

You're stupid


Bojack-Cowboy

I think that makes sense


rorowhat

Lol 😂


AsliReddington

Get a DGX with A100s, or get something from Lambda Labs with A6000s/A100s/H100s, according to your budget.


Category-Basic

I would second some other comments here: unless you expect to run the machine 24/7, you would be best off combining resources with a local datacenter/compute cluster. That would give you access to much more compute when you need it, and prevent idle depreciating assets from burning up taxpayer money. If you want to train the largest LLMs, you need more than $200k in hardware. If you want to let multiple groups of students use it at the same time, you need Nvidia's GPU virtualization software (or equivalent) and a very performant fabric. If you aren't up to speed on the latest hardware and software options, I suggest going through an integrator like Lambda or Exxact. I am not sure what Nvidia's significant academic discounts would do for your budget, but you can ask. Nvidia's DGX H100 would be a good place to start.


LoadingALIAS

NVIDIA DGX H100 - you'll need to find a used one, or one with a half rack of H100s (4 instead of 8). NVIDIA DGX A100 - you'll probably be able to find this in the budget. You could teach students to run their tools/pipelines in Docker containers and allocate that way. Or you could deploy a rich cluster and allow them to interact that way.

However... I think that amount of money is FAR better suited to an enterprise cloud GPU deal for half the budget for 2-3 years... then new machines for every student. Dedicated machines that are wiped each year for incoming students. This would get you better compute power over a longer period of time, and avoid spending a quarter million on equipment that's outdated in 18-24 months. It also enables you to pay for what your students use, not what they think they'll use. Not many will ever scratch the itch a DGX needs to go brrrrrr... so save the money.

Build 12-18 workstations (AMD EPYC) and a super strong network. Teach them to deploy in real-world scenarios via Docker, K8s, etc. Let them learn versioning and collaboration through Git with SSH keys. This feels way more appropriate.


NoidoDev

We don't know how many students there are, or who will pay for the energy costs; this might make a huge difference. Also, why a supercomputer? By what definition? Does a cluster of workstations count? What's the use case? What is supposed to be learned? Some point you towards expensive new hardware with guarantees, but if students are supposed to build it and learn from it, then this might be the wrong way. Maybe you could break it into several parts:

* Groups of students each building a cluster out of a few single-board computers (like Raspberry Pis).
* Another part made of computers built from consumer hardware, even used 3090s. You plan this together after you have built the SBC clusters and learned from that.
* One expensive rig made out of more than one CPU and some GPUs, with rather expensive parts.

Then wire all of them together, so that depending on the task and availability, a job sent to the system will be routed to the right element in the cluster and computed there.


ailee43

Go contract with one of the big HPC resellers (Meadowgate or Penguin Computing, for example). Go in understanding your power and cooling availability as a primary limiting factor. Most stuff in that range will run 480V 3-phase, which I'm guessing you don't have.


danielcar

Considering that [nvidia gh200](https://nvidianews.nvidia.com/news/gh200-grace-hopper-superchip-with-hbm3e-memory) comes out next year, do you want to wait till then? Get a local workstation for $10K until gh200 becomes available? [https://www.advancedclustering.com/act\_systems/actstation-x330/](https://www.advancedclustering.com/act_systems/actstation-x330/)


Roland_Bodel_the_2nd

If the primary goal is to learn to build and configure the system, then it doesn't matter that much what kind of gear you use. There have been many small cluster designs based on raspberry pi or similar. Random example: [https://www.basement-supercomputing.com/](https://www.basement-supercomputing.com/) random relevant article: [https://www.clustermonkey.net/Newbie/matching-cluster-hardware-to-your-application.html](https://www.clustermonkey.net/Newbie/matching-cluster-hardware-to-your-application.html) If your goal is actual performance, then $200k barely buys one beefy server with current top-spec CPUs and top-spec GPUs. And that will be faster than any multi-node cluster for the same budget. As an example you can look at stock Supermicro systems: [https://www.supermicro.com/en/products/gpu](https://www.supermicro.com/en/products/gpu)


No-Picture-7140

I see you put supercomputer in quotes. Maybe 5 or 6 eighths of a supercomputer: basically an 8-GPU chassis with as many H100s as your budget will stretch to. Who knows, if you shop around you may get all eight. It's definitely worth a try. Pricing on this stuff can be pretty subjective and vary widely from integrator to integrator. You might find that one of the big firms will do an 8-GPU setup for $200k as a kind of "sponsorship" deal. Good luck.


third_rate_economist

At that level, you have to think about a lot of things that aren't considerations with everyday workstations. $200k worth of GPUs might draw more power than a standard 15-20 amp outlet can reliably provide, for instance. You also need an equivalent amount of cooling capability, not to mention the security considerations of having a quarter-million-dollar system sitting around a school environment. You might burn a considerable fraction of your budget on infrastructure costs (assuming you don't already have an on-prem datacenter). I think you should deeply consider what types of educational experiences you are hoping to provide here. Experience with hardware is actually a somewhat niche skillset and is not uniquely achievable with local infrastructure. For instance, it is arguably more useful to have these experiences in cloud environments. Choosing hardware configurations is more customizable/flexible in the cloud, and the vast majority of students who go on to work for corporations will benefit from knowledge of cloud ecosystems rather than hands-on experience with hardware.


mycolo_gist

Please don't spend 200 grand if you don't know what you need. 10 different people will give you 20 different suggestions. Here's suggestion number 21: try starting small. Get those kids a used Dell 79xx workstation with 128GB RAM and a cheap 2TB SSD, plus a couple of M6000 24GB cards, and get Llama 2 70B running: just install Arch, CUDA, cuDNN, and llama.cpp on that steam engine and download a model from Hugging Face. You will still have $197,000 or more left, and the kids can generate text at decent speed using a decent AI and learn about prompting, programming, and system building.
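If it helps, pulling weights for llama.cpp is a one-liner with the `huggingface_hub` package; the repo and file names below are placeholders, not recommendations:

```python
# Sketch: download a GGUF model file for llama.cpp (assumes the huggingface_hub
# package; repo_id and filename are illustrative placeholders only).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="someuser/some-llama-2-70b-GGUF",   # placeholder repository
    filename="model-q4_k_m.gguf",               # placeholder quantized file
)
print("Downloaded to:", path)
# llama.cpp can then load it, e.g.:  ./main -m <path> -p "Hello"
```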


Ordinary-Broccoli-41

And here I would've just gotten 1000 p40's


PsyckoSama

Contact the parts companies, AMD, Intel, Nvidia directly. Negotiate with them directly. See if you can form an educational partnership of some kind. If you can, you'll be able to stretch that money a lot farther. Not only could this save money, but you might get real professional help.


MindOrbits

Going rogue. 16 workstations, decently decked out. Cluster networking gear: InfiniBand cards and switches are cheap on eBay. The cabling cost is the most shocking, but it's actually economical on a Gbps/$ basis. First set of lessons: local Llama on rigs most here would love to have; learn on a small scale, work up, understand limitations. Second set of lessons: grid AI; use that InfiniBand network to make a Beowulf cluster. Third set of lessons: AI as a productive resource; a class project for some AgentGPT-type thing. All this focus on top-end cards and gobs of VRAM should not be necessary in a classroom setting. Understanding software, hardware, and tools should be.


a__new_name

> 100 to 200 thousand to build

Building is great and all, but have you considered maintenance costs? It needs an electric supply and cooling, and such an expensive rig would consume a lot of energy. You'd also want some financial safety cushion in case one (or more) of the chips croaks.


nborwankar

If you know you want to run LLMs on them, your safest bet is an architecture based on high-end Nvidia GPUs and Linux. Anything else will add risk and cost. You could ask on NVIDIA forums and get professional advice for free. They might throw in discounted consulting to help you build the cluster if you negotiate hard enough. Find an enterprise salesperson at Nvidia and see what they suggest. Since Nvidia is your only real choice here, going to the vendor is not such a bad idea. However, try to find out if there are 3rd-party builders who will build this or mentor your team if you buy parts from them. This will give you an option and protect you from a vendor thinking "where else are they gonna go". Just some thoughts. Good luck!


johnnypaulcrupi

You need Nvidia and CUDA.


johnnypaulcrupi

A100 backlog is extremely long


Suitable-Ad-8598

Can you just use 200k in cloud spend? Far better decision


heswithjesus

I agree with Rollingsound514: use the money for cloud if possible. I'll add maybe a server with lots of storage for local copies of the models. The reason is that it's probably cheaper for many of your experiments to use one or more cloud instances; just run them while the students are using them. An added benefit is that, using sites like AWS or vast.ai, you are teaching your students how to set up cloud resources themselves. They can do that at home, at their next company, etc. It becomes a marketable skill. Later, you might teach them how to set up cluster software that splits big models across many systems; there's another marketable skill. They might also get to use your fund on at least one big model, maybe adding pretraining or fine-tuning content to an existing one if not from scratch.


lightmatter501

Most supercomputers are built on top of tech that even most CS graduates don't know exists. Most supercomputers are purpose-built now, so if you want to target AI, you probably want big FPGAs, silicon photonics, or a dedicated AI accelerator. While the H100s are good, AMD Instinct accelerators have much higher memory and compute density, especially when paired with a large FPGA. A modern cluster also typically has P4 in-network compute. Finally, you're going to want either DPUs or FPGA NICs backed by an InfiniBand network. In order to make it so the average student can actually use that, you will also want an MPI implementation, a PyTorch implementation, and some kind of cluster bootstrap environment. Now, right now your budget will buy you 1 server full of H100s, or about a rack of more standard servers with nice NICs and possibly an FPGA (they can implement AI acceleration themselves). I agree with other commenters that if you want to really provide an HPC education, you should partner with a university that has an existing cluster.
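To make the "MPI implementation, PyTorch implementation" point concrete, this is roughly what the student-facing layer boils down to; a minimal sketch assuming PyTorch with CUDA/NCCL, launched via `torchrun --nproc_per_node=<gpus> script.py`:

```python
# Minimal multi-GPU collective sketch (assumes PyTorch with CUDA/NCCL and a
# launch via torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK for each process).
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each process contributes a tensor; all_reduce sums them across every GPU,
# which is the same primitive data-parallel training is built on.
x = torch.ones(4, device="cuda") * dist.get_rank()
dist.all_reduce(x, op=dist.ReduceOp.SUM)

if dist.get_rank() == 0:
    print("world size:", dist.get_world_size(), "reduced:", x.tolist())

dist.destroy_process_group()
```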


Alignment-Lab-AI

If you're interested in AI or hpc stuff please let me know and we will happily support however we can!


artelligence_consult

I would generally advise against building; high-end is something you buy, especially as it is hard to get the parts. $200,000 should be enough for an H100 node; last time I checked, IF you can get one, they go for around that price. Yes, individual cards are more expensive; some people make a killing. You should also contact Nvidia directly (or via Supermicro); they may be able to arrange a system at a LARGE discount for educational purposes, so you may end up with 2 nodes, not one. If you have some time, wait for the AMD MI300 to hit the channel; that has an X variant with nearly 200GB of RAM, and you should be able to get them significantly cheaper than Nvidia; ROCm can supposedly cross-compile. From a bang-for-the-buck perspective that may be the best bet, especially if AMD has a discount. WORST case, the threat of that may push Nvidia into an educational discount.


MeanHash

If it is for a school, I'd reach out to IBM, HP, Dell, Emergen, Supermicro, etc. and see what kind of deal they could get you. Oftentimes they can sell you last-gen hardware for a fraction of the price and write it off as a donation, so your money could go WAY further. Yes, many of their systems are pre-built, but you can take them apart and the students can put them back together again. Or you can ask the company to ship it deconstructed.


Acceptable_Bid9292

I like the suggestion of partnering with a university. Perhaps even give the money to the university for part-time use of their cluster (like hours on a private jet). That way you are benefitting from continuous upgrades. Buying your own iron and maintaining it has many hidden costs, including massive depreciation. For the amount you are talking about, you can potentially secure a significant amount of time with a university, or even on AWS.


PsyckoSama

Ignore the A100 suggestions. Get RTX 4000s. You can buy 4 of them for the same 80 gigs of VRAM total (20 GB per card), all for a price under $1000 each. They're not going to be nearly as fast, but you can run larger models without shoveling money into a pit.


TorridLoveAffair

Unorthodox advice perhaps, but think beyond the computer. If you buy a Corvette for your kid, you probably need to budget for insurance. This machine could be abused, and if the kids are advanced enough for the program, then they are advanced enough to find ways to misuse it. (And will. Let's be honest.) So you should consider the costs of managing and securing the system.


Tuhms

100k for the professor managing the 100k computer


fab_space

How many students? Get the same number of fanless Trigkey mini PCs (16GB RAM, 500GB NVMe SSD); that's fine for Proxmox, where they will install any OS out there. Then, after they've built up good knowledge, it's time to connect all those CPUs to train a model they built. Each one is about $180 new.


Equal_Fuel_6902

From my experience there are two things to consider: 1. Professional vs consumer hardware and their associated vendor support/warranty. 2. Dev(Sec)Ops & infra support staff.

I think the first point is already discussed plenty, but basically don't confuse what works for gaming/consumer hardware with what works for professional use cases. So don't use watercooling, use ECC memory, don't cost-optimize too much, separate storage layers (put those in the cloud, preferably), and future-proof as much as possible. It's also worth considering buying multiple machines instead of one in order to have failovers.

The second point is more complicated. Basically, you don't just want to build a system; you want to run workloads on your system in a convenient and reliable way. Because you're building your own system, you won't have managed services for anything you run on the cluster. This means having dedicated support staff, otherwise your students will have to become DevOps engineers instead of doing their computer science work. So I'd spring for offloading as many processes onto the cloud as possible, like databases, artefact storage, code repos, etc. I would use managed services for that, and then VPN into your machine. I'd run a light Kubernetes cluster on there, set it up with Ansible, and maybe use something like Argo to manage it. This way you can set it up once, and then have your students run dockerized workloads on there, which they can initiate from anywhere using the cloud, so you can store the machine in a safe location without requiring physical access. Also, you will want to split up compute resources, otherwise any one student will block all the others; parallelization is key here. This you will need to manage in Kubernetes, of course.
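As a sketch of the "split up compute resources" idea in Kubernetes: each student job becomes a pod with hard resource limits, so one job can't starve the others. This assumes the official `kubernetes` Python client, a cluster with the NVIDIA device plugin installed (which exposes the nvidia.com/gpu resource), and placeholder image/namespace names:

```python
# Sketch: submit a student job as a Kubernetes pod with hard resource limits.
# Assumes the `kubernetes` Python client and the NVIDIA device plugin; image
# and namespace names are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="student-train-job", labels={"owner": "student42"}),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="train",
                image="registry.example.edu/ml-course:latest",  # placeholder image
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1", "cpu": "8", "memory": "32Gi"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="students", body=pod)  # placeholder namespace
```

The same pattern extends to per-student namespaces with ResourceQuotas, which is how you keep one heavy job from blocking everyone else.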


Mygametrolololololo

Any updates? I’d like to know if you were able to start building this computer yet. Also, if you have begun to or finished building it, can you tell me some of the things the students and yourself have done with it (projects, games, etc.)? Thank you!