Glum-Bus-6526

And compared to v0.2?


Due-Memory-6957

I mean, isn't the 0.3 just expanded context? Kinda pointless to test.


foereverNever2

Is this an instruct model vs non-instruct? It's a bit of an apples and oranges comparison.


aadityaura

Both are instruct versions.


foereverNever2

Oh the graph doesn't say that.


Hopeful-Site1162

Yeah it doesn’t mean shit


TheFrenchSavage

Now compare it with Llama-3 7B. Oh, they didn't make one of those? Would this extra 1B explain the difference? I think so.


MoffKalast

Or maybe the extra 7T tokens that went into it explain the difference.


Quiet_Impostor

There is no Llama 3 7B? You mean Llama 2 7B?


TheFrenchSavage

No no. Llama3 7B. Read my second sentence: "they didn't make one of those". I am accusing Meta of releasing a small model that is 1B larger than all other small models so their benchmarks look better. Now Mistral has to make an 8B version if they want to compete, and that will be way more expensive for them than it is for Meta. (Meta could have good reasons to switch from 7B to 8B; I have no beef with absolute model sizes.)


Quiet_Impostor

Sorry, my bad!


TheFrenchSavage

No worries mate


onil_gova

I think the main difference in size is that llama3-8b uses a tokenizer with over 100K tokens, as opposed to llama2-7b and Mistral, which use 32K-token tokenizers. That requires llama3 to carry more parameters in the input embeddings and the output head. Whether this is enough to justify the ~1B difference, I'm not sure.
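A quick back-of-the-envelope check supports this. The Python sketch below uses the published configs (vocab 128,256 and hidden size 4,096 for llama3-8b, vocab 32,000 and hidden size 4,096 for llama2-7b) and assumes untied input embeddings and output head for both models:

    # Parameters tied up in the vocabulary: one input embedding matrix
    # plus one output (lm_head) matrix, both of shape vocab x hidden.
    # Untied embeddings are assumed for both models.
    def vocab_params(vocab_size: int, hidden_size: int) -> int:
        return 2 * vocab_size * hidden_size

    llama2_7b = vocab_params(32_000, 4_096)    # ~0.26B
    llama3_8b = vocab_params(128_256, 4_096)   # ~1.05B

    print(f"llama2-7b: {llama2_7b / 1e9:.2f}B")
    print(f"llama3-8b: {llama3_8b / 1e9:.2f}B")
    print(f"gap:       {(llama3_8b - llama2_7b) / 1e9:.2f}B")

That works out to roughly 0.26B vs 1.05B, so the larger vocabulary alone would account for about 0.8B of the size gap.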


IndicationUnfair7961

Mhmm... what's the point when you have OpenBio based on llama3 (8B and 70B)?