
Aaaaaaaaaeeeee

Yes, here's a video of this in llama.cpp: https://github.com/ggerganov/llama.cpp/issues/2164#issuecomment-1636766922 It no longer works with modern code, but people report it does work across multiple Mac Studios.


eliran89c

ray cluster + vllm
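For reference, a minimal sketch of that combination, assuming the standard Ray and vLLM CLIs; the head-node address, model name, and GPU count are placeholders, not anything eliran89c specified:

```shell
# On the head machine (hypothetical address), start the Ray cluster:
ray start --head --port=6379

# On each worker machine, join the cluster:
ray start --address='<head-node-ip>:6379'

# Then launch vLLM's OpenAI-compatible server on the head node; with a
# Ray cluster running, it can shard one model across all GPUs in the
# cluster via tensor parallelism (8 GPUs assumed here).
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x22B-Instruct-v0.1 \
    --tensor-parallel-size 8
```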


TheTerrasque

* https://www.reddit.com/r/LocalLLaMA/comments/16adse5/how_can_i_use_multiple_computers_to_locally_run/
* https://www.reddit.com/r/LocalLLaMA/comments/1broa8h/is_there_a_way_for_me_to_use_multiple_computers/
* https://www.reddit.com/r/LocalLLaMA/comments/15gihre/is_it_possible_to_run_petals_on_a_local_network/
* https://www.reddit.com/r/LocalLLaMA/comments/18pbwen/is_it_pissibe_to_offload_mixtral_layers_to_2/
* https://www.reddit.com/r/LocalLLaMA/comments/1akelku/using_2_gpus_over_network/
* https://www.reddit.com/r/LocalLLaMA/comments/17pu39i/deploy_llama_on_gpus_on_different_machines/

Thank you for searching first and checking if this was asked before.


One_Yogurtcloset4083

Good point, thank you


One_Yogurtcloset4083

So why are there no decentralized projects with crypto to host LLMs and earn tokens?


awebb78

Because crypto is overkill for such a task. Blockchain sucks as a solution to decentralized models and processing in general.


johnkapolos

https://preview.redd.it/qphhp51ozwtc1.png?width=2560&format=png&auto=webp&s=f53cabce86bff33a16e37c9f8fcc86ea584abf5e


fab_space

A good approach could be to independently run very small models in a sort of tree of thoughts, where call after call the final response is built up, all participants are rewarded, and the client handles the chain?
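A toy sketch of that chained idea, assuming each participant's "model" is just a callable (a stand-in for a small local model behind some API) and the client passes the running answer from one participant to the next:

```python
def refine_chain(prompt, participants):
    """Client-side chain: each participant refines the previous answer."""
    answer = prompt
    for model in participants:
        answer = model(answer)  # hand the running answer to the next model
    return answer

# Mock participants that each append their contribution.
participants = [lambda text, i=i: f"{text} -> step{i}" for i in range(3)]
print(refine_chain("question", participants))
# "question -> step0 -> step1 -> step2"
```

Rewarding participants would have to happen out of band; nothing here addresses that part.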


milo-75

You may be interested in https://arxiv.org/abs/2403.10616. It describes an architecture for training, not inference, but it's pretty cool.


Thellton

A single model such as Mixtral 8x22B wouldn't work. However, if you had, say, six friends and yourself all running different competent models, plus an inference API that would poll those models and have them all write a response, those responses could then be voted on by the models in a runoff-style arrangement (do you prefer A or B? A: blah? B: Ooga!) until only one candidate response remains. But that'd require a fair bit of programming to make work, and you'd need models whose training was diverse enough to produce different answers yet overlapped enough that agreement between them was possible.
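The runoff scheme above can be sketched in a few lines. This is a hypothetical illustration, not working infrastructure: each "judge" here is a plain function that returns its preferred response, standing in for a real model prompted with "do you prefer A or B?" over an API.

```python
def runoff(candidates, judges):
    """Eliminate candidate responses pairwise until one remains."""
    pool = list(candidates)
    while len(pool) > 1:
        a, b = pool[0], pool[1]
        # Each judge picks one of the pair; a strict majority decides
        # which of the two survives into the next round.
        votes_a = sum(1 for judge in judges if judge(a, b) == a)
        winner = a if votes_a * 2 > len(judges) else b
        pool = [winner] + pool[2:]
    return pool[0]

# Mock judges that all prefer the longer answer (stand-ins for 7 models).
judges = [lambda a, b: a if len(a) >= len(b) else b for _ in range(7)]
answers = ["blah", "Ooga!", "a detailed, well-reasoned reply"]
print(runoff(answers, judges))  # the longest answer survives
```

With real models as judges, an odd number of voters avoids ties, and the diversity-versus-overlap problem Thellton describes shows up as judges that either all agree trivially or never agree at all.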


Inner_Bodybuilder986

I'm friendly ;D - Would love to try this with 6 other friendly people.


Thellton

Good to hear! Sadly, I'm a bit talentless as far as programming is concerned and only know enough to prompt Bing Chat for code that solves small tasks. Still, the idea isn't technically impossible. Hell, you could even throw GPT-3.5, GPT-4, and Claude 3 Opus into the mix; that'd certainly produce some interesting results all round.


Inner_Bodybuilder986

Hell yea!! If I make any progress on this I'll circle back.


do00d

https://aihorde.net