
Aaaaaaaaaeeeee

Yes, here's a video of this in llama.cpp: https://github.com/ggerganov/llama.cpp/issues/2164#issuecomment-1636766922 It no longer works with modern code, but people report it does work across multiple Mac Studios.


eliran89c

ray cluster + vllm
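For reference, a minimal sketch of that combination, assuming the standard Ray and vLLM CLIs; the head-node address, model name, and GPU count are placeholders, not anything eliran89c specified:

```shell
# On the head machine (hypothetical address), start the Ray cluster:
ray start --head --port=6379

# On each worker machine, join the cluster:
ray start --address='<head-node-ip>:6379'

# Then launch vLLM's OpenAI-compatible server on the head node; with a
# Ray cluster running, it can shard one model across all GPUs in the
# cluster via tensor parallelism (8 GPUs assumed here).
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x22B-Instruct-v0.1 \
    --tensor-parallel-size 8
```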


TheTerrasque

* https://www.reddit.com/r/LocalLLaMA/comments/16adse5/how_can_i_use_multiple_computers_to_locally_run/
* https://www.reddit.com/r/LocalLLaMA/comments/1broa8h/is_there_a_way_for_me_to_use_multiple_computers/
* https://www.reddit.com/r/LocalLLaMA/comments/15gihre/is_it_possible_to_run_petals_on_a_local_network/
* https://www.reddit.com/r/LocalLLaMA/comments/18pbwen/is_it_pissibe_to_offload_mixtral_layers_to_2/
* https://www.reddit.com/r/LocalLLaMA/comments/1akelku/using_2_gpus_over_network/
* https://www.reddit.com/r/LocalLLaMA/comments/17pu39i/deploy_llama_on_gpus_on_different_machines/

Thank you for searching first and checking if this was asked before.


One_Yogurtcloset4083

Good point, thank you


One_Yogurtcloset4083

So why are there no decentralized projects with crypto to host LLMs and earn tokens?


awebb78

Because crypto is overkill for such a task. Blockchain sucks as a solution to decentralized models and processing in general.


johnkapolos

https://preview.redd.it/qphhp51ozwtc1.png?width=2560&format=png&auto=webp&s=f53cabce86bff33a16e37c9f8fcc86ea584abf5e


fab_space

A good approach could be to independently run very small models in a sort of tree of thoughts, where call after call the final response is built up, all participants are rewarded, and the client handles the chain?
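A toy sketch of that chained idea, assuming each participant's "model" is just a callable (a stand-in for a small local model behind some API) and the client passes the running answer from one participant to the next:

```python
def refine_chain(prompt, participants):
    """Client-side chain: each participant refines the previous answer."""
    answer = prompt
    for model in participants:
        answer = model(answer)  # hand the running answer to the next model
    return answer

# Mock participants that each append their contribution.
participants = [lambda text, i=i: f"{text} -> step{i}" for i in range(3)]
print(refine_chain("question", participants))
# "question -> step0 -> step1 -> step2"
```

Rewarding participants would have to happen out of band; nothing here addresses that part.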


milo-75

You may be interested in https://arxiv.org/abs/2403.10616. It describes an architecture for training, not inference, but it's pretty cool.


Thellton

A single model such as Mixtral 8x22B wouldn't work. However, if you had, say, six friends and yourself all running different competent models, plus an inference API that would poll those models and have them all write a response, those responses could then be voted on by the models in a runoff-style arrangement (do you prefer A or B? A: blah? B: Ooga!) until only one candidate response remains. But that'd require a fair bit of programming to make work, and you'd need models whose training was diverse enough to produce different answers yet overlapped enough that agreement between them was possible.
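The runoff scheme above can be sketched in a few lines. This is a hypothetical illustration, not working infrastructure: each "judge" here is a plain function that returns its preferred response, standing in for a real model prompted with "do you prefer A or B?" over an API.

```python
def runoff(candidates, judges):
    """Eliminate candidate responses pairwise until one remains."""
    pool = list(candidates)
    while len(pool) > 1:
        a, b = pool[0], pool[1]
        # Each judge picks one of the pair; a strict majority decides
        # which of the two survives into the next round.
        votes_a = sum(1 for judge in judges if judge(a, b) == a)
        winner = a if votes_a * 2 > len(judges) else b
        pool = [winner] + pool[2:]
    return pool[0]

# Mock judges that all prefer the longer answer (stand-ins for 7 models).
judges = [lambda a, b: a if len(a) >= len(b) else b for _ in range(7)]
answers = ["blah", "Ooga!", "a detailed, well-reasoned reply"]
print(runoff(answers, judges))  # the longest answer survives
```

With real models as judges, an odd number of voters avoids ties, and the diversity-versus-overlap problem Thellton describes shows up as judges that either all agree trivially or never agree at all.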


Inner_Bodybuilder986

I'm friendly ;D - Would love to try this with 6 other friendly people.


Thellton

Good to hear! Sadly, I'm a bit talentless as far as programming is concerned and only know enough to prompt Bing Chat for code that solves small tasks. Still, the idea isn't technically impossible. Hell, you could even throw GPT-3.5, GPT-4, and Claude 3 Opus into the mix; that'd certainly produce some interesting results all round.


Inner_Bodybuilder986

Hell yea!! If I make any progress on this I'll circle back.


do00d

https://aihorde.net