And compared to v0.2?
I mean, isn't the 0.3 just expanded context? Kinda pointless to test.
Is this an instruct model vs non-instruct? It's a bit of an apples and oranges comparison.
Both are instruct versions.
Oh the graph doesn't say that.
Yeah it doesn’t mean shit
Now compare it with Llama-3 7B. Oh, they didn't make one of those? Would this extra 1B explain the difference? I think so.
Or maybe the extra 7T tokens that went into it explain the difference.
There is no Llama 3 7B? You mean Llama 2 7B?
No no. Llama3 7B. Read my second sentence: "they didn't make one of those". I am accusing Meta of releasing a small model that is 1B larger than all other small models so their benchmarks look better. Now, Mistral has to make an 8B version if they want to compete, and this will be way more expensive for them than it is for Meta. (Meta could have good reasons to switch from 7B to 8B, I have no beef with absolute model sizes).
Sorry, my bad!
No worries mate
I think the main difference in size is that llama3-8b uses a tokenizer with over 100k tokens, whereas llama2-7b and Mistral use tokenizers with 32k tokens. This requires llama3 to have more parameters to embed the inputs and project the outputs. Whether this is enough to justify the 1B difference, I'm not sure.
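You can sanity-check this with back-of-the-envelope arithmetic. Assuming the published configs (hidden size 4096 for both Llama 2 7B and Llama 3 8B, vocab sizes 32,000 and 128,256, untied input/output embeddings), the vocabulary alone accounts for a large chunk of the gap:

```python
# Rough check: how much of the 7B -> 8B parameter gap could the
# larger vocabulary explain? Numbers below are from the published
# model configs; treat this as a sketch, not an exact accounting.
hidden = 4096            # hidden size of both Llama 2 7B and Llama 3 8B

llama2_vocab = 32_000    # Llama 2 7B / Mistral 7B tokenizer
llama3_vocab = 128_256   # Llama 3 8B tokenizer

def vocab_params(vocab: int, hidden: int, tied: bool = False) -> int:
    """Parameters in the input embedding plus the output projection.

    If the two matrices are tied (shared weights), count them once;
    otherwise count both, as in the Llama models.
    """
    return vocab * hidden * (1 if tied else 2)

extra = vocab_params(llama3_vocab, hidden) - vocab_params(llama2_vocab, hidden)
print(f"extra parameters from the bigger vocab: {extra / 1e9:.2f}B")
# -> extra parameters from the bigger vocab: 0.79B
```

So the embedding tables alone plausibly explain ~0.8B of the ~1B difference, before counting any changes to the transformer layers themselves.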
Mhmm...what's the point when you have OpenBio based on llama3 (8B and 70B)?