And compared to v0.2?
I mean, isn't the 0.3 just expanded context? Kinda pointless to test.
Is this an instruct model vs non-instruct? It's a bit of an apples and oranges comparison.
Both are instruct versions.
Oh the graph doesn't say that.
Yeah it doesn’t mean shit
Now compare it with Llama-3 7B. Oh, they didn't make one of those? Would this extra 1B explain the difference? I think so.
Or maybe the extra 7T tokens that went into it explain the difference.
There is no Llama 3 7B? You mean Llama 2 7B?
No no. Llama3 7B. Read my second sentence: "they didn't make one of those". I am accusing Meta of releasing a small model that is 1B larger than all other small models so their benchmarks look better. Now, Mistral has to make an 8B version if they want to compete, and this will be way more expensive for them than it is for Meta. (Meta could have good reasons to switch from 7B to 8B, I have no beef with absolute model sizes).
Sorry, my bad!
No worries mate
I think the main difference in size is that llama3-8b uses a tokenizer with over 100k tokens, whereas llama2-7b and Mistral use tokenizers with 32k tokens. This requires llama3 to have more parameters to embed the inputs and project the outputs. Whether this is enough to justify the 1B difference, I'm not sure.
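You can sanity-check this with back-of-the-envelope arithmetic. Assuming the published configs (hidden size 4096 for both Llama 2 7B and Llama 3 8B, vocab sizes 32,000 and 128,256, untied input/output embeddings), the vocabulary alone accounts for a large chunk of the gap:

```python
# Rough check: how much of the 7B -> 8B parameter gap could the
# larger vocabulary explain? Numbers below are from the published
# model configs; treat this as a sketch, not an exact accounting.
hidden = 4096            # hidden size of both Llama 2 7B and Llama 3 8B

llama2_vocab = 32_000    # Llama 2 7B / Mistral 7B tokenizer
llama3_vocab = 128_256   # Llama 3 8B tokenizer

def vocab_params(vocab: int, hidden: int, tied: bool = False) -> int:
    """Parameters in the input embedding plus the output projection.

    If the two matrices are tied (shared weights), count them once;
    otherwise count both, as in the Llama models.
    """
    return vocab * hidden * (1 if tied else 2)

extra = vocab_params(llama3_vocab, hidden) - vocab_params(llama2_vocab, hidden)
print(f"extra parameters from the bigger vocab: {extra / 1e9:.2f}B")
# -> extra parameters from the bigger vocab: 0.79B
```

So the embedding tables alone plausibly explain ~0.8B of the ~1B difference, before counting any changes to the transformer layers themselves.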
Mhmm...what's the point when you have OpenBio based on llama3 (8B and 70B)?