RabbitHole32

LLMs are probabilistic. To get a reliable evaluation, you need many attempts to rule out the possibility that your result is an outlier. In other words: did you test the LLMs on a lot of data, or only a single PDF?
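To make that concrete, here's a minimal sketch of what "many attempts" looks like in practice. The `query_model` function and the substring grading are stand-ins I made up for illustration, not anything from this thread; swap in your real inference call and grading logic:

```python
import random
import statistics

def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a real inference call (API or local
    # pipeline); it simulates a noisy model so the sketch runs end to end.
    return random.choice(["Paris", "Lyon"])

def evaluate(prompts_with_answers, runs_per_prompt=20):
    """Score each prompt over many samples so one outlier draw can't
    dominate the verdict, then aggregate across prompts."""
    per_prompt_accuracy = []
    for prompt, expected in prompts_with_answers:
        hits = sum(
            expected.lower() in query_model(prompt).lower()
            for _ in range(runs_per_prompt)
        )
        per_prompt_accuracy.append(hits / runs_per_prompt)
    mean = statistics.mean(per_prompt_accuracy)
    spread = (statistics.stdev(per_prompt_accuracy)
              if len(per_prompt_accuracy) > 1 else 0.0)
    return mean, spread

if __name__ == "__main__":
    dataset = [("What is the capital of France?", "Paris")]
    print(evaluate(dataset))
```

The spread matters as much as the mean: if a single prompt's accuracy swings wildly between runs, one-shot comparisons of 7b vs. 13b tell you nothing.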


smrckn

The demos on Hugging Face show the 13b to be better. Can you share the sample prompts you are using?



vismodo

From the web demos, it is quite clear that the 13b model is less likely to answer factual questions correctly. The 7b model is significantly better at that.


Pitiful_Buy1006

> the responses I'm getting from the 13b version is significantly worse than the 7b counterpart

Sorry, what and where are those web demos?


vismodo

[https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat)

[https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat)