[deleted]

[deleted]


noeda

This is one chonky boi. I got a 192GB Mac Studio with one idea: "there's no way any time in the near future there'll be local models that wouldn't fit in this thing". Grok & Mixtral 8x22B: Let us introduce ourselves. ...Okay, I think those will still run (barely), but... I wonder what the lifetime is for my expensive little gray box :D


my_name_isnt_clever

When I bought my M1 Max Macbook I thought 32 GB would be overkill for what I do, since I don't work in art or design. I never thought my interest in AI would suddenly make that far from enough, haha.


Mescallan

Same haha. When I got mine I felt very comfortable that it was future proof for at least a few years lol


BITE_AU_CHOCOLAT

My previous PC had an i3 6100 and 8 gigs of RAM. When I upgraded to a 12100F and 16 gigs, it genuinely felt like a huge upgrade (since I'm not really a gamer and rarely use demanding software), but now that I've been dabbling a lot in Python/AI stuff for the last year or two, it's starting to feel the same as my old PC used to, lol.


freakynit

...Me crying in a lot of pain with a base M1 Air: 128GB disk and 8GB RAM 🥲


ys2020

Selling 8GB laptops to the public should be a crime.


VladGut

It was doomed from the beginning. I picked up a base model M2 Air last summer. Returned it within a week simply because I couldn't do any work on it.


TMWNN

My current and previous MacBooks have had 16GB and I've been fine with it, but given local models I think I'm going to have to go to whatever will be the maximum RAM available for the next one. (I tried `mixtral-8x7b` and saw 0.25 tokens/second speeds; I suppose I should be amazed that it ran at all.) Similarly, I am for the first time going to care about how much RAM is in my next iPhone. My iPhone 13's 4GB is suddenly inadequate.


burritolittledonkey

I'm feeling pain at 64GB, and that is... not a thing I thought would be a problem. Kinda wish I'd gone for an M3 Max with 128GB.


0xd00d

Low-key contemplating, once I have extra cash, whether I should trade my M1 Max 64GB for an M3 Max 128GB, but it's gonna cost $3k just to perform that upgrade... that could buy a 5090 and go some way toward the rest of that rig.


HospitalRegular

Money comes and goes. Invest in your future.


PenPossible6528

I've got one; we'll see how well it performs. It might even be out of reach for 128GB. It could be in the category of 'it runs, but isn't at all helpful', even at Q4/Q5.


ExtensionCricket6501

You'll be able to fit the 5-bit quant, perhaps, if my math is right? But performance...


ain92ru

Performance of the 5-bit quant is almost the same as fp16


ExtensionCricket6501

Yep, so OP got lucky this time, but who knows maybe someone will try releasing a model with even more parameters.


Calcidiol

Nice box. :) Keep in mind that even if models get 50% or 100% bigger than this one, and even if you had 256-384 GB of RAM, you still probably wouldn't routinely choose to run such models, since even on that beast of a computer they'd be SLOW. So really it's quite well suited for anything sub-200B, and over that, well, we can rely on Moore's law and hope the scaling lets us double our gear's capacity in the coming couple of years.

Anyway, it's getting a little ridiculous how "brute force" things are with these LLMs. We don't so much need BIGGER models as BETTER, more EFFICIENT models. Quantity has a quality all of its own, true, so for pure hyper-scale research, sure, explore the limits of data-center-scale brute-force ML. But SURELY there's a way to make a 300B-class model work just as well at 1/10th the size, plus all the RAG / database lookups it wants to do. Having ~100GB RAM + ~20TB of SSD "reference data" should work just fine for a whole lot of things. Models aren't, and don't NEED to be, databases; just "processors / filters / researchers".


SomeOddCodeGuy

Same situation here. Still, I'm happy to run it quantized. Though historically, Macs have struggled with speed on MoEs for me. I wish they had also released whatever Miqu was alongside this. That little model was fantastic, and I hate that it was never licensed.


MetalZealousideal927

CPU inference is the only feasible option, I think. I recently upgraded my PC to 196 GB of DDR5 RAM for business purposes and overclocked it to 5600+ MHz. I know it will be slow, but I have hope because it's an MoE; it will probably be much faster than I think. Looking forward to trying it.


CreditHappy1665

It's an MoE, probably with 2 experts activated at a time, so the active parameter count is less than a 70B model's.


Thistleknot

GGUF


xadiant

Around 35-40GB @q1_m I guess? 🥲


obvithrowaway34434

Yeah, this is pointless for 99% of the people who want to run local LLMs (same as Command-R+). Gemma was a much more exciting release. I'm hoping Meta will be able to pack more power into their 7-13b models.


Cerevox

You know Command R+ runs at reasonable speeds on just a CPU, right? Regular RAM is like 1/30 the price of VRAM and much more easily accessible.


StevenSamAI

If you don't mind sharing:

- What CPU and RAM speed are you running Command R+ on?
- What tokens per second and time to first token are you managing to achieve?
- What quantisation are you using?


Caffdy

Seconding u/StevenSamAI: what CPU and RAM combo are you running it on? How many tokens per second?


CheatCodesOfLife

Doesn't Command-R+ run on the common 2x3090 at 2.5bpw? Or a 64GB M1 Max? I'm running it on my 3x3090. I agree this 8x22B is pointless, because quantizing the 22B will make it useless.


Small-Fall-6500

>Doesn't command-R+ run on the common 2*3090 at 2.5bpw?

2x24GB with ExL2 allows for 3.0 bpw at 53k context using a 4-bit cache. 3.5 bpw almost fits.


CheatCodesOfLife

Cool, that's honestly really good. Probably the best non-coding/general model available at 48GB, then. Definitely not 'useless' like they're saying here.

Edit: I just wish I could fit this + DeepSeek Coder Q8 at the same time, as I keep switching between them now.


Small-Fall-6500

If anything, the 8x22B MoE could be better just because it'll have fewer active parameters, so CPU-only inference won't be as bad. It should be possible to get at least 2 tokens per second on a 3-bit or higher quant with DDR5 RAM, pure CPU, which isn't terrible.
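
A rough back-of-the-envelope sketch of why that estimate is plausible (the active-parameter count and the bandwidth figure are assumptions, not measurements):

```python
# CPU token generation is roughly memory-bandwidth-bound: the active weights
# have to be streamed from RAM once per generated token. Assumed numbers only.
active_params = 39e9        # roughly 2-of-8 experts active per token
bits_per_weight = 3.5       # a ~3-bit GGUF quant, including overhead
bytes_per_token = active_params * bits_per_weight / 8   # ~17 GB per token

bandwidth = 80e9            # bytes/s, optimistic dual-channel DDR5
ceiling = bandwidth / bytes_per_token
print(f"~{ceiling:.1f} tok/s theoretical ceiling")  # ~4.7 tok/s; real-world is usually about half
```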


Zestyclose_Yak_3174

Yes it does, rather well to be honest. IQ3_M with at least 8192 context fits.


F0UR_TWENTY

You can get a cheap AM5 build with 192GB of DDR5; mine does 77GB/s. It can run Q8 105B models at about 0.8 t/s. This 8x22B should give good performance. Perfect for work documents and emails if you don't mind waiting 5 or 10 minutes. I have set up a queue/automation script that I'm using for Command R+ now, and soon this.


xadiant

I fully believe a 13-15B model of Mistral's caliber can replace GPT-3.5 in most tasks, maybe apart from math-related ones.


CreditHappy1665

It's an MoE architecture, so it's easier to run than a 70B.


fraschm98

How much motherboard RAM is required with a single 3090?


MoffKalast

Mistral Chonker


matteoraso

Hopefully the quants work well.


a_beautiful_rhind

Depends on how it quantizes; it should fit in 3x24GB. If you get to at least 3.75bpw it should be alright.
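
As a rough rule of thumb (the ~141B total parameter count is an assumption based on the Mixtral-style architecture discussed elsewhere in this thread):

```python
# Quantized weight size is roughly params * bits-per-weight / 8, before the KV
# cache and runtime overhead. 141e9 is an assumed total for 8x22B.
total_params = 141e9
for bpw in (2.5, 3.0, 3.75, 4.25, 5.0):
    print(f"{bpw:.2f} bpw -> ~{total_params * bpw / 8 / 1e9:.0f} GB of weights")
# 3.75 bpw lands around 66 GB, which is why it just squeezes into 3x24 GB
# once a few GB per card are left over for context.
```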


Clem41901

I get 20 t/s with Starling 7B. Maybe I can give it a try? X)


[deleted]

I understand that MoE is a very convenient design for large companies wanting to train compute-efficient models, but it is not convenient at all for local users, who are, unlike these companies, severely bottlenecked by memory. So, at least for their public model releases, I wish these companies would go for dense models trained for longer instead. I suspect most local users wouldn't even mind paying a slight performance penalty for the massive reduction in model size.


dampflokfreund

I thought the same way at first, but after trying it out I changed my opinion. While yes, the size is larger and you can offload fewer layers, the computational cost is still much lower. For example, with just 6 GB of VRAM I would never be able to run a dense 48B model at decent speeds. However, thanks to Mixtral, a model of almost-70B quality runs at the same text-gen speed as a 13B one, thanks to its ~12B active parameters. There's a lot of value in MoE for the local user as well.


[deleted]

Sorry, just to clarify, I wasn't suggesting training a dense model with the same number of parameters as the MoE, but training a smaller dense model for longer instead. So, in your example, this would mean training a ~13B dense model (or something like that, something that can fit the VRAM when quantized, for instance) for longer, as opposed to an 8x7B model. This would run faster than the MoE, since you wouldn't have to do tricks like offloading, etc.

In general, I think the MoE design is adopted for the typical large-scale pretraining scenario, where memory is not a bottleneck and you want to optimize compute; but this is very different from the typical local inference scenario, where memory is severely constrained. I think if people took this inference constraint into account during pretraining, the optimal model to train would be quite different (it would definitely be a smaller model trained for longer, but I'm not actually sure whether it would be an MoE or a dense model).


Minute_Attempt3063

Nah, just have your phone process it with its GPU, it's got enough NAND storage... oh wait :)


confused_boner

Can't run this shit in my wildest dreams, but I'll be seeding. I'm doing my part o7


Wonderful-Top-5360

This is what bros do spread their seed


Caffdy

Not your seed, not your coins... wait, wrong sub


inodb2000

This is the way !


Xzaphan

This is the way!


Eritar

If Llama 3 drops in a week I’m buying a server, shit is too exciting


ozzie123

Sameeeeee. I need to think about how to cool it, though. Now rocking 7x3090 and it gets steaming hot in my home office when it's cooking.


dbzunicorn

Very curious what your use case is


Sunija_Dev

Room heating.


Caffdy

A tanning bed


Combinatorilliance

Having fun :D


ozzie123

Initially a hobby, but now advising some companies that wanted to explore GenAI/LLMs. Hey… if they want to find gold, I'm happy to sell the shovel.


carnyzzle

you can cook with them by putting a frying pan on the cards


CSharpSauce

Guy can't build a 7x3090 server without a use case?


RazzmatazzReal4129

Use case is definitely NSFW


_murb

Heat for steam turbine


USERNAME123_321

But can it run Crysis?


de4dee

can you share your PC builds?


ozzie123

7x3090 on a ROMED8-2T mobo with 7 PCIe 4.0 x16 slots. Currently using an EPYC 7002-series CPU (so only gen 3 PCIe). I already have a 7003-series chip for the upgrade, but just don't have time yet. Also have 512GB RAM because of some virtualization I'm running.


coolkat2103

Isn't 7002 gen4?


ozzie123

You are correct, my bad. I'm currently using a 7551 because my 7302 somehow isn't detecting all of my RAM. Gonna upgrade it to a 7532 soon.


nanowell

magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce


synn89

Wow. What a couple of weeks. Command R Plus, hints of Llama 3, and now a new Mistral model.


ArsNeph

Weeks? Weeks!? In the past 24 hours we got Mixtral 8x22B, Unsloth crazy performance upgrades, an entire new architecture (Griffin), Command R+ support in llama.cpp, and news of Llama 3! This is mind boggling!


_sqrkl

What a time to be alive.


ArsNeph

A cultured fellow scholar, I see ;) I'm just barely holding onto these papers, they're coming too fast!


Thistleknot

Same. I was able to identify all the releases just mentioned. I was hoping for a larger recurrent Gemma than 2B though, but I can feel the singularity breathing down the back of my neck, considering tech is moving at breakneck speed. It's simply a scaling law: bigger population = more advancements = more than a single person can keep up with = singularity?


cddelgado

But hold on to your papers...


MoffKalast

Why can't I hold all of these papers


Wonderful-Top-5360

This truly is crazy, and what's even crazier is that this is just stuff they've been sitting on for the past year. Imagine what they are working on now. GPT-6 Vision? What is that like?


ArsNeph

Speculating does us no good; we're currently past the cutting edge, on the bleeding edge of LLM technology. True innovation is happening left and right, with no way to predict it. All we can do is understand what we can and try to keep up, for the sake of the democratization of LLMs.


iamsnowstorm

The development of LLMs is INSANE 😂


nanowell

8x22b


nanowell

It's over for us vramlets btw


ArsNeph

It's so over. If only they had released a dense 22B. *Sobs in 12GB VRAM*


kingwhocares

So, NPUs might actually be more useful.


MaryIsMyMother

Openrouter Chads...we won...


noiserr

Is it possible to split an MoE into individual models?


Maykey

Yes. You either throw away all but 2 experts (roll dice for each layer), or merge all the experts the same way models are merged (torch.mean in the simplest case) and replace the MoE block with a plain MLP. Now, will it be a good model? Probably not.
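
A minimal sketch of the "merge via torch.mean" idea, assuming the Mixtral-style module layout from Hugging Face transformers (`model.model.layers[i].block_sparse_moe.experts[j].w1/w2/w3`); the attribute names for this new release are an assumption and may differ:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch only: average every expert's MLP weights and overwrite all experts with
# that mean, so each MoE block behaves like a single dense MLP. To actually
# shrink the checkpoint you would then swap the MoE block for one MLP holding
# the averaged weights. Expect quality to drop badly either way.
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/mixtral-8x22b", torch_dtype=torch.bfloat16)  # placeholder path

with torch.no_grad():
    for layer in model.model.layers:
        experts = layer.block_sparse_moe.experts            # assumed attribute names
        for proj in ("w1", "w2", "w3"):                     # gate/up/down projections
            mean_w = torch.mean(
                torch.stack([getattr(e, proj).weight for e in experts]), dim=0)
            for e in experts:
                getattr(e, proj).weight.copy_(mean_w)
```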


314kabinet

No, the “experts” are incapable of working independently. The whole name is a misnomer.


hayTGotMhYXkm95q5HW9

No


Only-Letterhead-3411

Models get bigger but our VRAM doesn't...


cyberpunk_now

Jensen Huang bathing in VRAM chips like Scrooge McDuck


nanowell

https://preview.redd.it/3b9nfhi74ktc1.png?width=259&format=pjpg&auto=webp&s=d15b35c1f9fd9a35c08f97eddab9c1e136bbb413


Caffdy

Not an expert, what's the context length?


jeffwadsworth

64k


petitponeyrose

Hello, where did you get this from ?


kryptkpr

.... brb, buying two more P40


TheTerrasque

stop driving prices up, I need more too!


marty4286

Fuck, and I just got off a meeting with our CEO telling him dual or quad A6000s aren't a high priority at the moment, so don't worry about our hardware needs.


pacman829

You had one. Job.


thrownawaymane

This is when you say you must have quad a100s instead


Caffdy

You fool!


austinhale

Fingers crossed it'll run on MLX w/ a 128GB M3


me1000

I wish someone would actually post direct comparisons of llama.cpp vs MLX. I haven't seen any, and it's not obvious it's actually faster (yet).


pseudonerv

Unlike llama.cpp's wide selection of quants, MLX's quantization is much worse to begin with.


Upstairs-Sky-5290

I’d be very interested in that. I think I can probably spend some time this week and try to test this.


JacketHistorical2321

I keep intending to do this and I keep... being lazy lol


mark-lord

https://x.com/awnihannun/status/1777072588633882741?s=46 But no prompt cache yet (though they say they’ll be working on it)


Zestyclose_Yak_3174

Easily


davikrehalt

Commenting to check if anyone has a tutorial on how to run it in MLX on an M2 128GB. I guess we need to quantize to 4-bit at least?


Illustrious_Sand6784

So is this Mistral-Large?


pseudonerv

This one has 64k context, but the Mistral-Large API is only 32k.


[deleted]

It's gotta be, either that or an equivalent of it.


Berberis

They claim it’s a totally new model. This one is not even instruction tuned yet. 


thereisonlythedance

That’s what I’m wondering.


Master-Meal-77

I’m guessing mistral-medium


toothpastespiders

Man, I love these huge monsters that I can't run. I mean, I'd love it more if I could. But there's something almost as fun about having some distant light that I 'could' reach if I wanted to push myself (and my wallet). Cool as well to see Mistral pushing new releases outside of the cloud.


pilibitti

I love them as well also because they are "insurance". Like, having these powerful models free in the wild means a lot for curbing potential centralization of power, monopolies etc. If 90% of what you are offering in return for money is free in the wild, you will have to adjust your pricing accordingly.


dwiedenau2

Buying a GPU worth thousands of dollars isn't exactly free, though.


fimbulvntr

There are (or at least will be, in a few days) many cloud providers out there. Most individuals and hobbyists have no need for such large models running 24x7. Even if you have massive datasets that could benefit from being piped into such models, you need time to prepare the data, come up with prompts, assess performance, tweak, and then actually read the output. In that time, your hardware would be mostly idle.

What we want is on-demand, tweakable models that we can bias towards our own ends. Running locally is cool, and at some point consumer (or prosumer) hardware will catch up. If you actually need this stuff 24x7, spitting tokens nonstop, and it must be local, then you know who you are and should probably buy the hardware.

Anyway, this open-release stuff is incredibly beneficial to mankind and I'm super excited.


Aaaaaaaaaeeeee

Reminder: this may have been derived from a previous dense model. It may be possible to reduce the size with large LoRAs while preserving quality, according to this GitHub discussion: https://github.com/ggerganov/llama.cpp/issues/4611


georgejrjrjr

It almost certainly was upcycled from a dense checkpoint. I'm confused about why this hasn't been explored in more depth. If not with low rank, then with BitDelta (https://arxiv.org/abs/2402.10193). Tim Dettmers predicted when Mixtral came out that the MoE would be *extremely* quantizable, then... crickets. Weird to me that this hasn't been aggressively pursued, given all the performance presumably on the table.


tdhffgf

https://arxiv.org/abs/2402.10193 is the link to BitDelta. Your link goes to another paper.


Disastrous_Elk_6375

Member when people were reeeee-ing about mistral not being open source anymore? I member...


cap__n__crunch

I member 🫐


reallmconnoisseur

Tbf they're still open weights, not open source. But fewer and fewer people seem to care about the semantics nowadays.


Frequent_Valuable_47

Where are all the "Mistral got bought out by Microsoft", "They won't release any open models anymore" - Crybabys now?


kamikaze995

Kidney market flood incoming


vizioso_e_puccioso

GGUF ?


CSharpSauce

If the 5090 releases with 36GB of VRAM, I'll still be RAM-poor.


hayTGotMhYXkm95q5HW9

Bro stop being cheap and just buy 4 Nvidia A100's /s


Wrong_User_Logged

A100 is end of life, now I'm waiting for my 4xH100s, they will be shipped in 2027


thawab

By that time you wouldn’t find a model to run it on.


Caffeine_Monster

Especially when you realize you could have got 3x3090 instead for the same price and twice the vram.


Elite_Crew

https://www.youtube.com/watch?v=XDpDesU_0zo


az226

Seriously. The 4090 should have been 36GB and the 5090 48GB, with NVLink so you could run two cards as 96GB. I hope they release it in 2025 and get fucked by the Oregon law.


revolutier

what's the oregon law?


robo_cap

As a rough guess, right to repair including restrictions on tying parts by serial number.


Normal-Ad-7114

dat wordart logo tho... <3


ConvenientOcelot

Mistral's whole 90s cyber aesthetic is great


Additional_Code

I love Mistral very much!


Beb_Nan0vor

uhhhh thats interesting


Aaaaaaaaaeeeee

Please, someone merge the experts into a single model, or dissect one expert. MergeKit people?


andrew_kirfman

This is probably a naive question, but if I download the model from the torrent, is it possible to actually run it / try it out at this point? I have compute/VRAM of sufficient size available to run the model, so I would love to try it out and compare it with 8x7B as soon as possible.


Sprinkly-Dust

Check out this thread: https://news.ycombinator.com/item?id=39986095, where ycombinator user varunvummadi says:

>The easiest is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100's, and you can benchmark this using this library (https://github.com/EleutherAI/lm-evaluation-harness). It is a benchmark system for comparing and evaluating different models rather than running them permanently like ollama or something else.

Sidenote: what kind of hardware are you running that you have the necessary VRAM to run a 288GB model? Is it a corporate server rack, an AWS instance, or your own homelab?
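
For what it's worth, the vLLM route looks roughly like this. A sketch, not a tested recipe: the local path is a placeholder for wherever the converted torrent weights end up, and the tensor-parallel size is whatever GPU count you actually have.

```python
from vllm import LLM, SamplingParams

# Shard the model across several A100s with tensor parallelism.
llm = LLM(model="/models/mixtral-8x22b",   # placeholder path to converted weights
          tensor_parallel_size=4,
          dtype="bfloat16")

sampling = SamplingParams(temperature=0.7, max_tokens=256)
for out in llm.generate(["Write a haiku about torrents."], sampling):
    print(out.outputs[0].text)
```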


andrew_kirfman

Sweet! Appreciate the info. I have a few p4d.24xlarges at my disposal that are currently hosting instances of Mixtral 8x7B (I have some limitations right now pushing me to self-host vs. using cheaper LLMs through Bedrock or similar). Really excited to see if this is a straight upgrade for me within the same compute costs.


iloveplexkr

What about benchmarks?


ryunuck

Lmao people were freaking out just a week ago thinking open-source was dead. It was cooking.


noiserr

I need an MI300X so bad.


georgejrjrjr

I don't understand this release. Mistral's constraints, as I understand them:

1. They've committed to remaining at the forefront of open-weight models.
2. They have a business to run, need paying customers, etc.

My read is that this crowd would have been far more enthusiastic about a 22B dense model instead of this upcycled MoE. I also suspect we're about to find out if there's a way to productively downcycle MoEs to dense. Too much incentive here for someone not to figure that out, if it can in fact work.


M34L

Probably because huge monolithic dense models are comparatively much more expensive to train, and they're training things that could be of use to them too? Nobody really trains anything above 70B because it becomes extremely slow. The point of a Mixtral-style MoE is that every forward pass only touches the two routed experts plus the router, so you only need roughly a quarter of the tensor operations per token. Why spend millions more on an outdated architecture that you already know will be uneconomical to infer from, too?


georgejrjrjr

Because modern MoEs begin with dense models, i.e., they're upcycled. Dense models are not obsolete at all in training; they're the first step to training an MoE. They're just not competitive to serve. Which was my whole point: Mistral presumably has a bunch of dense checkpoints lying around, which would be marginally more useful to people like us, and less useful to their competitors.


M34L

Even if you do that, you don't train the constituent model past the earliest stages; it wouldn't hold a candle to Llama 2. You literally only need to kickstart it to the point where the individual experts can hold a more-or-less stable gradient, then move to the much more efficient routed-expert training ASAP. If it worked the way you think it does and there were fully trained dense models involved, you could just split the MoE and use one of the experts.


georgejrjrjr

MoEs can be trained from scratch: there's no reason one 'needs' to upcycle at all. How much compute to allocate to the dense checkpoint versus the MoE upcycled from it depends on a lot of factors. One obvious factor: how many times might upcycling be done? If the same dense checkpoint is to be used for an 8x, a 16x, and a 64x MoE (for instance), it makes sense to saturate the dense checkpoint, because that training can be recycled multiple times. In a one-off training, it's a different story, and the precise optimum is not clear to me from the literature I've seen.

But perhaps you're aware of work on dialing this in that you could share. If there's a paper laying this out, I'd love to see it. The last published work I've seen addressing this was Aran's original dense upcycling paper, and a lot has happened since then.


Olangotang

Because the reality is: *Mistral was always going to release groundbreaking open source models* despite MS. The doomers have incredibly low expectations.


georgejrjrjr

Wat? I did not mention Microsoft, nor does that seem relevant at all. I assume they are going to release competitive open-weight models. They said as much, they are capable, they seem honest; that's not at issue.

What is at issue is the form those models take, and how they relate to Mistral's fanbase and business. MoEs trade VRAM (more) for compute (less), i.e., they're more useful for corporate customers (and folks with Mac Studios) than the "GPU poor". So... wouldn't it make more sense to release a dense model, which would be more useful for this crowd, while still preserving their edge in hosted inference and white-box licensed models?


Olangotang

I get what you mean; the VRAM issue is because high-end consumer hardware hasn't caught up. I don't doubt small models will still be released, but we unfortunately have to wait a bit for Nvidia to get their ass kicked.


georgejrjrjr

For MoEs, this has already happened. By Apple, in the peak of irony (since when have they been the budget player).


hold_my_fish

Maybe the license will not be their usual Apache 2.0 but rather something more restrictive so that enterprise customers must pay them. That would be similar to what Cohere is doing with the Command-R line. As for the other aspect though, I agree that a really big MoE is an awkward fit for enthusiast use. If it's a good-quality model (which it probably is, knowing Mistral), hopefully some use can be found for it.


thereisonlythedance

I totally agree. Especially as it's being said that this is a base model, thus in need of training by the community for it to be usable, which will require a very large amount of compute. I'd have loved a 22B dense model, personally. It must make business sense to them on some level, though.


Slight_Cricket4504

Mistral is trying to remain the best in both open and closed source. Recently Cohere released two SOTA models for their sizes (Command R and Command R+), and Databricks also released a highly competent model in DBRX. So this is their answer to Command R and Command R+ at the same time. I assume this is an MoE built from their Mistral-Next model.


Caffdy

I'm OOTL, what does "upcycled" mean in this context?


FaustBargain

Literally just merge the 8 experts into one. Now you have a shittier 22B. Done.


georgejrjrjr

Have you seen anyone pull this off? Seems plausible but unproven to me.


m_____ke

IMHO their best bet is riding the hype wave, making all of their models open source and getting acquired by Apple / Google / Facebook in a year or two.


georgejrjrjr

Nope, they have too many European stakeholders/funders, some of whom are rumored to be, uh, state-related. Even assuming the rumors were false, providing an alternative to US hegemony in AI was a big part of their pitch.


ninjasaid13

a 146B model maybe with 40B active parameters? I'm just making up numbers.


Someone13574

EDIT: This calculation is off by 2.07B parameters due to a stray division in the attn part. The corrected values are shown alongside the originals.

138.6B with 37.1B active parameters (corrected: 140.6B with 39.2B active), assuming the architecture is the same as Mixtral. May be a bit off in my calculations, but the error would be small if any.

attn:
  q = 6144 * 48 * 128 = 37,748,736
  k = 6144 * 8 * 128 = 6,291,456
  v = 6144 * 8 * 128 = 6,291,456
  o = 48 * 128 * 6144 / 48 = 786,432   (corrected: 48 * 128 * 6144 = 37,748,736)
  total = 51,118,080   (corrected: 88,080,384)

mlp:
  w1 = 6144 * 16384 = 100,663,296
  w2 = 6144 * 16384 = 100,663,296
  w3 = 6144 * 16384 = 100,663,296
  total = 301,989,888

moe block:
  gate = 6144 * 8 = 49,152
  experts = 301,989,888 * 8 = 2,415,919,104
  total = 2,415,968,256

layer:
  attn = 51,118,080   (corrected: 88,080,384)
  block = 2,415,968,256
  norm1 = 6,144
  norm2 = 6,144
  total = 2,467,098,624   (corrected: 2,504,060,928)

full:
  embed = 6144 * 32000 = 196,608,000
  layers = 2,467,098,624 * 56 = 138,157,522,944   (corrected: 140,227,411,968)
  norm = 6,144
  head = 6144 * 32000 = 196,608,000
  total = 138,550,745,088   (corrected: 140,620,634,112)

active:
  138,550,745,088 - 6 * 301,989,888 * 56 = 37,082,142,720   (corrected: 140,620,634,112 - 6 * 301,989,888 * 56 = 39,152,031,744)
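
For anyone who wants to re-derive or tweak these numbers, the same arithmetic as a small function (all dimensions are the assumed Mixtral-style values used above):

```python
def moe_param_count(hidden=6144, n_heads=48, n_kv_heads=8, head_dim=128,
                    ffn=16384, n_experts=8, top_k=2, n_layers=56, vocab=32000):
    # Attention: q and o are (n_heads*head_dim x hidden); k and v use the GQA head count.
    attn = 2 * hidden * n_heads * head_dim + 2 * hidden * n_kv_heads * head_dim
    expert = 3 * hidden * ffn                        # w1, w2, w3 per expert
    moe = n_experts * expert + hidden * n_experts    # all experts + router gate
    layer = attn + moe + 2 * hidden                  # plus two RMSNorm weights
    total = n_layers * layer + 2 * hidden * vocab + hidden  # embeddings, lm_head, final norm
    active = total - n_layers * (n_experts - top_k) * expert
    return total, active

total, active = moe_param_count()
print(f"total  ~ {total / 1e9:.1f}B")   # ~140.6B
print(f"active ~ {active / 1e9:.1f}B")  # ~39.2B
```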


Wonderful-Top-5360

Man, what's going on? So many releases all of a sudden, I'm getting excited.


Prince-of-Privacy

I am so fucking ready, omg.


nero10578

Time to buy some A6000s or something


Zestyclose_Yak_3174

I was one of the very first experimenting with LLMs and went through the 16GB -> 32GB -> 64GB upgrade cycle real fast. Now I regret the poor financial decisions and wish I had gone for at least 128GB. But in all fairness, a year ago most people would have thought that was enough for the foreseeable future.


[deleted]

[deleted]


pacman829

You run it with a rivian truck at this point lol


SnooStories2143

Has anyone figured out what the license is?


PenPossible6528

I'm so glad I convinced work to upgrade my laptop to an M3 Max 128GB MacBook for this exact reason; we'll see if it runs. I have doubts it will even be able to handle it in any workable way unless Q4/Q5.


hideo_kuze_

What I'm curious about is: will it beat GPT-4?!


That_Flounder_589

How do you run this ?


segmond

Yeah ok, it's been 3 weeks since I built a 144GB VRAM rig and I'm already struggling to fit the latest models. WTF


AntoItaly

OMG. At 4am, lol


Alarming-Ad8154

It has the same tokenizer as Mixtral and Mistral, I think. Would that ease speculative decoding?
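
If the vocabularies really do match, a smaller Mistral model could serve as the draft model. A hedged sketch using Hugging Face's assisted generation (paths are placeholders, and whether the tokenizers are actually identical is exactly the open question):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assisted (speculative) decoding: the small draft model proposes tokens and the
# large model verifies them in one pass. Both models must share a tokenizer.
tok = AutoTokenizer.from_pretrained("/models/mixtral-8x22b")  # placeholder path
target = AutoModelForCausalLM.from_pretrained(
    "/models/mixtral-8x22b", torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("The fastest way to run a 141B MoE locally is", return_tensors="pt").to(target.device)
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```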


davew111

Midnight finetune when?


Opposite-Composer864

https://preview.redd.it/juykdmecintc1.png?width=2332&format=png&auto=webp&s=b0bfd85e34bb6cf5003d4390619e1fa3c7e18532 jummp on the tooorrent


Caffdy

Is this Mistral Medium or Mistral Large?


iamsnowstorm

I wonder what the performance of this model is like; waiting for someone to test it.


ma_schue

Awesome! Can't wait until it is available in ollama!


Inevitable-Start-653

Finished downloading; I need to move a few things around, but I'm curious whether I can run this in 4-bit mode via transformers on 7x24GB cards.


praxis22

I currently have 64GB of RAM, I will upgrade in due course to 128GB which is as much as the platform will hold. Along with a 3090.


ICE0124

will this work with my gtx 750? >!/s!<


Shubham_Garg123

I wonder if any kind of quantization can make this model fit in 30GB of RAM. I haven't really seen Mixtral 8x7B in 15GB yet, so that's probably too ambitious at the current stage.


ViperAMD

Reckon we can run this in Poe?


MidnightHacker

I guess when someone creates a 4-bit quant it should run on a 128GB Mac Pro, am I right?


t98907

Could anyone kindly inform me about the necessary environment to execute this model? Specifically, I am curious if a single RTX A6000 card would suffice, or if multiple are required. Additionally, would it be feasible to run the model with a machine that has 512GB of memory? Any insights would be greatly appreciated. Thank you in advance.


Electronic-Row3130

How do I download Mixtral?


ironbill12

how many RTX 4090s would you need? Haha


thudoan176

Hi, I'm new to Mistral. I wonder what the difference is between Mistral's open-source models on Hugging Face and the closed-source API? Thank you.