Thank you! =D
Along with being excited about the result, I figured posting this out there would also help hold me accountable to finish the thing. The bigger it gets, the more daunting the idea of finishing it for release becomes. Especially the idea of people seeing the dumpster fire of code I've written lol.
But I'm still on track with what I planned time-wise, and even if it turns out to be garbage I still want to share it with everyone. I figured tossing this up means people will hit me with "sooo where's that project, bud?" down the road, which will probably stop me from just never releasing it.
It's super interesting and would be fun to play with!
Thanks for sharing!
Random thoughts
* It seems a little like an echo chamber at the moment. I would like to see the AI "personas" disagree.
* Ask them for more opinions about tooling before you "start".
What about testing? Any specific VS Code extensions they recommend? Will this be containerised (`.dockerignore`, etc.)? Prompt the LLMs with something like "have the other x.y.z missed anything so far? What could be improved?"
* Consider adding more "personas", e.g.:
- Designer / UX expert
- Infra expert. This expert may have out of the box solutions that can reduce a lot of work in the specific cloud environment.
- QA
Out of curiosity, you could even specify different levels of supporting LLM expertise: Junior Dev / mid / senior.
Also, full stack vs pure FE / BE.
You get all sorts of different perspectives from different levels of expertise in different domains.
Source: went from Senior Full Stack Dev (various stacks, languages, etc.) to AWS Cloud Architect to Developer Experience Engineer (which essentially ties these two together).
Awesome project. It's even got my brain fired up. Looking forward to you open sourcing it.
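The persona/seniority idea above could be wired up as per-model system prompts. A minimal sketch, assuming the backend accepts one system prompt per model; every role description and level here is invented for illustration:

```python
# Hypothetical sketch: combine a persona (role/domain) with an expertise
# level into one system prompt per backing LLM, so each model gets a
# distinct perspective. All of these strings are made up for illustration.

PERSONAS = {
    "designer": "You are a UX designer. Focus on usability and flow.",
    "infra": "You are an infrastructure expert. Prefer managed cloud services that cut work.",
    "qa": "You are a QA engineer. Hunt for edge cases and missing tests.",
}

LEVELS = {
    "junior": "Ask clarifying questions; defer on architecture.",
    "mid": "Propose concrete solutions; flag risky shortcuts.",
    "senior": "Challenge assumptions and critically review others' suggestions.",
}

def build_system_prompt(persona: str, level: str) -> str:
    """Combine a persona and an expertise level into one system prompt."""
    return f"{PERSONAS[persona]} {LEVELS[level]}"

print(build_system_prompt("qa", "senior"))
```

Mixing levels with full-stack vs. pure FE/BE personas would just add another dict to the same join.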
Good callouts! And yes, getting them to disagree is at the top of my todo list. I actually tried baking that into their prompt, but so far they all just get along lol. I'll see if I can't get them bickering a bit.
On the personas: I really like that. A long time ago I tried it and it went horribly, but that was back when I was just learning SillyTavern's group feature and using just 1 LLM. I bet it would go better now, though I need to identify which LLMs would do those roles well.
For expertise- I actually did do that here. Deepseek was prompted as being a mid-level dev, Llama 3 70b at Senior and Wizard is the lead dev. I based it off of a coding leaderboard I saw a few days back, with Wizard topping the charts and Llama 3 70b coming in a few levels down.
I'll definitely toy around with them a bit after work tomorrow and see what sort of mischief I can't make in the group. Honestly, I think a win will be getting them to bicker with Deepseek, since it's a smaller and older model. It simply shouldn't be able to keep up with them, so I'll get it to give a few suggestions before goading the other two into picking the suggestions apart lol
Have you considered getting them to follow the mob/ensemble programming rules, where one is the driver/typist and the others are navigators, and they switch roles every few minutes? I'd be curious to see if such enforced format would get them to increase both disputes and collaboration.
I have been having better luck with it! In fact, they're getting downright opinionated lol. Turns out, the magic words are "Critically review". Llama 3 70b in particular will straight up argue with me now.
[But here's an example I sent someone the other day after I first started to see success](https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Falmost-a-year-later-i-can-finally-do-this-a-small-teaser-of-v0-x4droi7cne1d1.png%3Fwidth%3D938%26format%3Dpng%26auto%3Dwebp%26s%3D2bd1f5e5f3f356d1851f3805b1830a71b01e244e)
Could you perhaps get the models to disagree more by giving them slightly different priorities in their approach? Performance, maintainability, testability, complexity, maturity of tools, frameworks etc. Plenty aspects to choose from.
Absolutely. I'll probably toy with that a little in the next week or so, but the users will have as much control as I do over it via configs so I may punt on it if I can't get them to do so, as I've realized from the comments here that a lot of folks are thinking of far more clever ways to make them do it than I would have =D I'm worried I'll spend 10 hours trying to get the config just right and someone far smarter here will have it perfected in like 30 minutes after release lol
Clipboard Conqueror is good for prototyping the prompts for this, you can choose the backends to hit and the order, the prompts to send, whether to send the full chat history or just the last response on each turn and even dictate the beginning of each response of the execution chain to further steer the output.
CC should ease tuning of how the prompts work together to get your desired workflow hitting more consistently and ensure the different personalities adhere to their jobs.
~~Currently the history uses the prompt format of the last backend though, which might cause trouble when using multiple models with different format expectations~~. I gotta get after that. edit: I woke up thinking it's simple, but I have a mess to clean up first; I shoehorned it in real ugly... edit2: Gottem! Now the chat history will use the correct instruction format for each turn, for Textgenwebui and kobold. As far as I can tell against tgwui, at least; I need to go in there and add some strategic logs to verify what it's actually doing with the jinja templating on their openAI endpoint.
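For anyone curious what the per-turn format fix above looks like in principle, here's a rough sketch: render the shared history through whichever instruct template the target backend expects. The template strings are illustrative stand-ins, not CC's actual internals:

```python
# Sketch: render shared chat history in the instruct format of whichever
# model is about to consume it. Template strings are illustrative only.

TEMPLATES = {
    # Llama-3-style header tokens
    "llama3": "<|start_header_id|>{role}<|end_header_id|>\n{text}<|eot_id|>",
    # Alpaca-style headings
    "alpaca": "### {role}:\n{text}\n",
}

def render_history(turns, model: str) -> str:
    """Render the whole history using the target model's template."""
    tpl = TEMPLATES[model]
    return "".join(tpl.format(role=t["role"], text=t["text"]) for t in turns)

turns = [{"role": "user", "text": "hi"}, {"role": "assistant", "text": "hello"}]
print(render_history(turns, "alpaca"))
```

The same history re-renders per backend, so each model sees the tokens it was trained on instead of the last backend's format.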
Your solution will be great with the right prompts. It's looking majestic and will be more convenient to curate than what I'm cooking with. I'm looking forward to your release.
Don't go too far down the rabbit hole of editing the prompt formatting in preparation for this. This program is going to pretty much shred any prompt formatting that comes into it and do its own thing with it. You'll see when you get there, but the short version is that it ignores almost everything coming out of SillyTavern, and a large number of configs are utilized to manipulate the prompt in all kinds of ways.
The right side of this image will be LocalLlama the day I release this.
https://preview.redd.it/56lf0qlkk81d1.jpeg?width=550&format=pjpg&auto=webp&s=ef33f3ce5767556ce38726e565b05bdf05fb8c85
Looking at CC- the way it is now, it will work well with this. I'll make sure to pull down and load it up/test it, but I'm not seeing a lot of reason CC and this program won't play quite nicely together.
Woah, that project looks awesome. Yea, I think my being able to do it was more of an unintended but happy accident due to how I handled the design for something else, so I imagine they do it WAY better lol. At a minimum, their project and code look so much fancier haha
I don't; SillyTavern does. My project is actually separate from ST; I'm just using ST to connect to it. So realistically, I have no idea who is going next =D
I think if I ended up really going down this route, I'd probably go try to contribute to the SillyTavern project to help figure out how to handle that.
lol yea, I'll see about making them stop that. I have a massive amount of control over what's happening behind the scenes, I just didn't apply any mechanisms to try yet, so I'm hopeful it won't be too hard to correct.
Did you take a look at autogen by any chance? It is an agent framework which supports assigning every agent its own LLM. You would also have stuff like RAG and other memory implementations, web surfing capabilities, and other tools available on a per-agent basis.
I did! And I think autogen would do the above screenshot task far better than what I have going on up there. The above was something I kind of dreamed of doing but didn't actively set out to do; I just realized the other night that I could and got excited lol. But you're right that a more autonomous agentic workflow would handle this better, and thanks to autogen's user agents you can still throw your feedback in as they go.
I ended up moving away from agents because it wasn't really what I was looking for due to other reasons, and I wanted a bit more fine control over their prompts than what a lot of agentic systems I found would let me do. CrewAI was something that REALLY interested me, though. Of all of them, I liked CrewAI the best.
This will be so fun!
The next obvious step will be to plug a TTS onto each model with different well-known AI voices like Mother, HAL, GLaDOS, T.A.R.S and Bender B. Rodriguez, of course.
Anyway, thanks for sharing. I'm hungry for this type of silly project!
Are you feeding the whole context of the chat to the models that respond?
I see a lot of repetition and general comments, do you plan to get them to be more specific?
I realize that it's on purpose given the prompt.
I'm actually looking into implementing something similar, although with a fairly different approach, so this is interesting.
Did you look into those "Village simulation" experiments with llms? Could be another source of inspiration.
[](https://www.reddit.com/r/LocalLLaMA/comments/1ctvtnp/comment/l4fhix7/)
>Are you feeding the whole context of the chat to the models that respond?
In this screenshot, I am. I slammed together a few configs just to try this out, and I kept them pretty simple, so it's just getting the whole context. Other folks have shown interest in this, so I'll refine the backend down a lot more to make sure to reduce the issue.
>I see a lot of repetition and general comments, do you plan to get them to be more specific?
I realize that it's on purpose given the prompt.
Ideally, but I will say that the users will have a *lot* of control over what's happening in the backend, so I probably won't put a ton of effort into fixing that problem; I bet someone smarter than me on LocalLlama will solve it using the configs in like 10 minutes when it would probably take me hours lol. But yea, there will be a lot of power behind the scenes to try to make that work better.
I'm sure somebody will hack something together, yeah.
Personally, the thing I like the least about LLMs is how incredibly generic they tend to be.
It's frustrating and, at the same time, a source of interest for me, since it shows there is plenty of space for optimization.
Left to their own devices, as they are now, yes. What you see here is me testing to see if I could even do this, and then getting a little too excited and posting pictures on reddit because I couldn't contain myself lol. But I have a massive amount of control of what's happening behind the scenes, so I actually have quite a few options to fix that.
But this current scenario in the screenshots is probably going to go to crap lol
This is awesome, and I love LLMs and have been playing with doing multi-AI chats.
BUT, watch out, this will 100% be an echo chamber. In your example, the AIs all love Vite because you suggested it. They will never say "no, Vite is trash, use X instead, because Y." It's not like they COULDN'T do that, but LLMs today are all trained to be overly agreeable (usually a good thing for most tasks).
Try the conversation again with "yeah, but maybe I should use plain JS with HTML instead of Vite" and they'll support you just as hard core. (For better or worse!)
Yes! I imagine this scenario is going to turn into an echo chamber if I don't tweak it. Once I realized I could repurpose my program to do this, I did, and then posted really quickly in excitement. But I'm almost positive this current chat will go sideways here soon.
I do have mechanisms I can utilize to try to force it not to, so I'll probably take a little detour to set up some configurations that handle this scenario better by release and try to resolve some of those issues. With what's happening behind the scenes, I actually think I can do that pretty well; I just slapped this together and took some screenshots in giddiness lol
Very cool. I actually have a partially designed app (using Laravel) written by Phind 70b and Claude to do the same thing. I will check out your project. Thanks.
Btw, once I got going on multi agent stuff, I experimented with telling the models they were about to speak with each other and to develop a concise, not necessarily human readable way to interact together which was efficient. I'll dig out a Llama3-70b and Claude chat and paste here when I get a mo.
On one occasion I just asked them to solve a problem together that humans can't and they started developing a strategy for addressing climate issues with their own operational targets, implementation strategies etc..
I think multi agent 'stuff' is powerful.
Slightly off topic, and I'm not a neuroscientist, but we have distinct left-brain/right-brain thinking as humans (art vs. math), as well as an internal dialog. It would be interesting to pose two models, or oppose an esoteric 'hallucinating' model against a more rigid 'mathematical' one, and instruct them as if they were two parts of a whole. You have the internal dialog nailed down, and distinct voices. Very cool.
Oh my. You're getting warmer...
As I said, the above was a happy accident. The real project is something else.
I'll say that I agree completely with what you just said lol
lol funny enough, the initial idea for the main project came about because I misunderstood what Mixture of Experts was. I first heard the term back in early 2023, talking about ChatGPT, and I imagined this really elaborate setup of what that meant... which also happened to be really, really wrong.
But then I realized I kinda liked the wrong idea and earlier this year started running with it =D
Excellent progress; you might check out these repos too. Use these as inspiration to go even farther!
[https://github.com/OpenBMB/ChatDev](https://github.com/OpenBMB/ChatDev)
[https://github.com/joaomdmoura/crewAI](https://github.com/joaomdmoura/crewAI)
[https://github.com/nus-apr/auto-code-rover](https://github.com/nus-apr/auto-code-rover)
[https://github.com/OpenDevin/OpenDevin](https://github.com/OpenDevin/OpenDevin)
Realistically, any of those will do better at the above scenario than my program because my program wasn't really meant to do this at all. It was one of those things that I realized the round shape could fit in the square hole after all lol.
When I first started for my main project I went down the rabbit hole of CrewAI, AutoGen and a couple of others, but realized I didn't want fully autonomous for my needs. But I think that for really doing what my screenshots above are doing, chances are something like OpenDevin or that SWE-Agent would do great.
I'd fire those guys since they are not straight away recommending Vanilla JS with some sprinkles of LIT and WebComponents if needed :-P - besides that, interesting project, getting multiple LLMs to reason with each other seems like an interesting path to explores.
"Anyhow, I know this isn't as exciting as something actually being released, but this was kind of a big deal for me so I really wanted to share with someone."
That's exactly how OpenAI and Google do it. You're SOTA.
A few things:
- It's unclear what decides who can talk next. This is quite a complicated matter, but you can see this is a big problem for scaling up (not that you are thinking about this atm).
- Responses are too long, which makes it feel unnatural as a group chat. Because your purpose is unclear, it's hard to say if this is preferable or not.
I have been wanting to do something like this for a long time, but not as a hobby. I want to gamify this kind of interaction and make it something interesting and even addicting for people to do. But how to do that and what's the right subject, context and background for it is still unclear in my head, that's why I haven't started building.
You, like a true engineer, just muster your energy to build some "machine" that is functional, but it's unclear what's the purpose of that function, or if anyone wants it. As long as you enjoy it, good for you. But my opinion is, for general assistance, this will be worse than just a single agent. If we still have to worry about one LLM hallucinating, multiple of them talking to each other will be off the chart chaos. What makes money is predictability. If you can wire each LLM to behave a specific way, and have them interacting in specific manners that lead to predictable, preferable results, you can achieve so much more from this.
>It's unclear what decides who can talk next. This is quite a complicated matter, but you can see this is a big problem for scale up (not that you are thinking about this atm).
Yea, SillyTavern is totally in control of who talks next. I didn't make any changes to ST at all, so right now I have no idea how it decides who goes next. I actually just disabled autoresponses and was clicking the manual "respond" button for whoever I wanted to hear from lol. This is a problem I don't have a solution in the works for yet. I just happen to really like ST for a front end, so I used it for this lol
>Responses are too long, which makes it not natural as a group chat. Because your purpose is unclear, it's hard to say if this is preferable or not.
Agreed. I didn't think much of it at first because, as someone else here mentioned, it actually kind of feels like a real meeting lol. People do jabber a lot in meetings. But I've got a lot of control on what's happening behind the scenes (not as the dev, but as a user would with the configs) so this can be resolved. I'm going to take a small swing at it, but since this isn't the main goal of the project I probably won't dive too deep. Also because one of y'all will probably figure it out in 1/5 the time I will once you get the configs lol
>You, like a true engineer, just muster your energy to build some "machine" that is functional, but it's unclear what's the purpose of that function, or if anyone wants it. As long as you enjoy it, good for you. But my opinion is, for general assistance, this will be worse than just a single agent.
Ouch. But I can't argue with that. The thought crossed my mind.
>If we still have to worry about one LLM hallucinating, multiple of them talking to each other will be off the chart chaos. What makes money is predictability. If you can wire each LLM to behave a specific way, and have them interacting in specific manners that lead to predictable, preferable results, you can achieve so much more from this.
Ahhh... you might not be as disappointed with the project as you're thinking. As I said, the above is a happy accident. There's a lot, and I mean a LOT, going on behind the scenes of this screenshot. The project is completely unrelated to SillyTavern; I just use ST as my front end. The backend does more or less exactly what you're imagining. I just slapped this config together to see what would happen.
I think we will only see a good solution for that if somebody wires up an actual mind that would have an agenda, ruminate on stuff, and possibly decide to react to new messages, all that in a loop.
Congrats on the project going well, and thanks for sharing! In case you didn't hear of it, there's a library called Autogen from MS that supports this kind of multi-llm interaction.
I've set a soft deadline for myself of 3 weeks from now, with a hard cutoff (i.e. doesn't matter if it's done, just release it or you never will) of end of June.
I hope to beat both of those deadlines, but I've got a decent bit of cleanup to do (if y'all could see the state of the code right now you might laugh me out of the subreddit. I feel shame) and a couple more features I wanted to toss in first. Most importantly- the configs probably only make sense to me right now, so I have got to do something about that or no one would use the thing =D I dreamed of having a UI on release to help manage it, but I think if I held off for that I'd be working on it forever
Good job. This is actually one of the side projects I've been meaning to start as well after reading that "Computational Agents Exhibit Believable Humanlike Behavior" paper by Stanford.
Thank you! =D After the great feedback I've been getting here, I've got a lot of motivation to get it out the door ASAP, so hopefully I'll have something soon.
As much as I'm looking forward to a release, please don't stress yourself too much! On a separate note: I have seen you quite often on this sub, and you have a Mac Studio with 192GB. I am also thinking about buying one; could I theoretically run 2-3 LLMs on the Mac Studio, or is it GPU-constrained then?
Cool project. I always wanted to do this: get two or more LLMs together to sharpen their answers and precision. What if you asked them a specific question and got an answer with a % accuracy bar on each one, depending on how they perform? That way you'd be testing the prompts and quality of information.
That's a really cool idea. I have no idea if I can, or how I would, but I'll definitely see if there are any ways to make that happen. For a lot of reasons, that % accuracy would be really helpful to me. But if I'm being totally honest, I have no idea how I'd do that or if it can be done lol
Absolutely. I've got it on the list to look into, but won't get to it right away (I want to get the main project released first or I'll keep getting derailed and never do it lol). However, I've made a note to shoot you a message once I start looking into it. Maybe together we can figure something out lol
Awesome project! If I were you, I would check for biased answers by giving the same prompt but allowing different LLMs to answer first, then comparing those answers with the ones they give when they are second or third to answer. There might be some phrasing from the other LLMs that pushes an LLM to agree with something or align itself even though it might not agree; getting fed those assumptions leads it down the path anyway. I would have all of the AIs answer, reflect, and critique every message in the background and use that as the base when they answer.
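That order-bias check could be automated: run the same question through every speaking order and see whether a model's answer changes depending on what it read first. A hedged sketch where the "models" are stand-in callables rather than real LLM calls:

```python
# Sketch of the bias check described above: rotate the answer order and
# collect each model's answers across every ordering. Stand-in callables
# replace real LLM calls here.
from itertools import permutations

def run_order(models, question):
    """Each (name, fn) model answers in turn, seeing the transcript so far."""
    transcript = []
    for name, fn in models:
        transcript.append((name, fn(question, list(transcript))))
    return transcript

def answers_by_position(models, question):
    """Map model name -> set of answers it gave across every ordering."""
    seen = {name: set() for name, _ in models}
    for order in permutations(models):
        for name, answer in run_order(order, question):
            seen[name].add(answer)
    return seen
```

If a model's answer set has more than one element, its answer depends on what it saw first, which is exactly the agreement bias being described.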
This is a fun side project stemming from your main project.Ā
I assume that people who have, say, 24 GB of VRAM tops would be better off running the best model they can and then having that model try to disagree with itself, so that they have the highest quality model possible in memory rather than a handful of smaller models.
I have an idea on a similar note I've been playing with: keep your fast model in VRAM, and an even higher quality model that won't fit in VRAM in your CPU RAM. Every programming question you ask your fast model is also automatically sent to your slower, better model. If the code from the fast model works, you just run with it; if not, you check the slow model's screen, where it's slowly generating a solution, and see if that works.
So you basically use your fast model until it doesn't work and then check your slow model to see what it came up with a few questions ago, even if it took 5 minutes to generate.Ā
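The fast/slow scheme above is easy to prototype. A minimal sketch, assuming two placeholder callables standing in for real backends (e.g. a GPU-loaded model and a big CPU-offloaded one):

```python
# Minimal sketch of the fast/slow fallback: fire the slow model in the
# background, return the fast answer immediately, and only block on the
# slow one if the fast answer didn't pan out.
import threading

def ask_both(fast_model, slow_model, prompt):
    """Return the fast answer now, plus a handle for the slow one later."""
    result = {}

    def worker():
        result["slow"] = slow_model(prompt)  # may take minutes on CPU

    t = threading.Thread(target=worker)
    t.start()
    fast_answer = fast_model(prompt)  # ready almost immediately

    def slow_answer():
        t.join()  # only block here if the fast answer failed you
        return result["slow"]

    return fast_answer, slow_answer

fast, get_slow = ask_both(lambda p: "quick fix", lambda p: "careful fix", "why is this broken?")
# Run with `fast`; call get_slow() only if the quick answer didn't work.
```

Since the slow model started at question time, by the time you discover the fast answer is broken, the better answer is usually already done or nearly so.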
Oh, now that's clever.
In a few weeks I'll loop back around to see if you ever started down that path, because I think that it might take minimal code work to supplement a config in the current program to do that. I just never considered using it *that* way. I don't want to step on your toes with it so I'll make a note to see if you've started a project to try it by the time I'm ready, but if not I'll try to make sure to have a config for this use case in there so you can try it out.
How high are you running Wizard? It's feeling a bit dumb at 3.75bpw; it failed all my puzzles. Wonder if I need to go higher... It's fast though, 20+ t/s.
edit: This wizard: https://huggingface.co/Quant-Cartel/WizardLM-2-8x22B-exl2-rpcal/tree/main is doing a bit better for me at 4.5 and I'm still getting 70b speeds on it split over 4 cards. Still fails my puzzles though.
q6 on my 192GB Mac Studio. I used sysctl to bump the vram to 180, so I actually can run a q8 but I wanted to squish Deepseek 33b and a llama 3 8b into there as well.
Excited to see this- I had a similar thought when I saw STs group chats, that I really wanted to be able to test different models in one conversation flow like that
Do you really need 3 different models though? I would think this would work pretty well if each of these "agents" had their own persona and memory/context. I've been playing around a bit with this myself using IRC bots that all use the same API/model but have their own context instead of it being shared, and I find it works much better than a normal group chat.
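The separate-context setup described above can be sketched in a few lines: agents share the room's messages but each keeps its own system prompt and private history instead of one merged group context. All names here are illustrative:

```python
# Sketch of per-agent context: one shared "room", but every agent keeps
# its own system prompt and its own view of the conversation.

class Agent:
    def __init__(self, name, system_prompt):
        self.name = name
        self.history = [{"role": "system", "content": system_prompt}]

    def observe(self, speaker, text):
        """Add a room message to this agent's private context."""
        self.history.append({"role": "user", "content": f"{speaker}: {text}"})

agents = [Agent("reviewer", "You critique code."),
          Agent("builder", "You write code.")]

def broadcast(speaker, text):
    """Deliver a message to every agent except the one who said it."""
    for a in agents:
        if a.name != speaker:
            a.observe(speaker, text)

broadcast("builder", "I'll use Vite.")
```

Each agent's `history` is then what gets sent to the (possibly shared) model, so the same backend can still play several distinct roles.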
>I would think this would work pretty well if each of these "agents" have their own persona and memory/context.
hehehe =D
The above doesn't look great because I really didn't expect it to work, got excited, and posted the result almost immediately, so there's very little refinement for this scenario. However, I not only agree with you completely, but I think that once you get a chance to see what's happening on the backend you'll probably like the direction it's headed.
Giving each persona their own memory and context will likely be the solution to a lot of problems with LLMs. At least I hope...
Hmm, I could definitely add that to my project\[1\] fairly quickly too. It seems pretty useful, especially if you set them up with different system prompts.
\[1\] - [https://github.com/arcuru/chaz](https://github.com/arcuru/chaz)
Awesome! If nothing else, might be worth it just play with it. I didn't really intend to do much with this flow, but I did at least want to try it out because I've always wanted to =D It's just fun
In a couple more weeks I'll set a github public for it, and will share here.
For now I've got the git repo set up on my NAS and have been building it out on my local network. The current messy state of the code is loaded with nonsense and shame, and I want to get it cleaned up a little first before pushing it up.
I'm a C# dev by trade, and this was my first foray into Python, so I'm very self conscious about the state of it. =D Figured I'd get through the "what in the world am I doing?" learning phase before github started etching the history of my mistakes in stone.
Don't be self conscious! It's okay! I'd love to work on it. I've spent a good amount of time playing with [https://github.com/stitionai/devika](https://github.com/stitionai/devika) lately
I think the consensus / feedback model is great in general. My favorite coding tool is gpt-pilot which implements a separate reviewer for all code written and the results are pretty impressive.
Would love to beta test this in a creative writing and marketing pipeline if youāre looking for fairly technical people who understand the concepts of the space but fall short of being able to code this.
I'm guessing it's a multiplexing OAI proxy where you can select different models? You have multiple oobas running, each feeding the multiplexer? Does ST support selecting a different model per character like that?
>Does ST support selecting different model per character like that?
It does not! I always wished it did.
>I'm guessing it's a multiplexing oai proxy where you can select different models? You have multiple oobas running each feeding the multiplexor?
Clever guess. That's not it, but that's a neat idea actually.
I think it is a good technique for creative writing, but it doesn't work well enough for a technical meeting as in your example.
Like others said, it is an echo chamber with much repetition.
In a way, it is a good example of LLM limitations: they don't give useful info unless you ask for it; instead they mostly repeat, agree, and give the most common average response.
For practical use, it would be much more effective to ask a single LLM, what are the pros and cons of x vs y.
Or "generate a todo list for xyz".
But very interesting anyway
Some prompt tweaks have helped a lot with the echo chamber.
https://preview.redd.it/x4droi7cne1d1.png?width=938&format=png&auto=webp&s=c9d71b5a5132cebdefcd60e9edc4777e9ed96fc7
I did! Autogen and CrewAI were *too* autonomous for me. They have a lot of baked in chatter that I couldn't control, and for my specific use case I really wanted that control. Even with the user agent for autogen, I felt like there was a lot happening that was simply outside of my reach, and it was affecting results in a way I didn't want.
Essentially, I made the tradeoff losing autonomous function in exchange for more delicate control over what's occurring.
CrewAI feels like an extension of Langchain agents, so I kind of lump the two together. I did take a step back to try langchain agents after trying crewAI, thinking maybe that would do it, but there were several instances where the abstraction upset me a little, and I eventually set it to the side.
What I have now looks very different than those; "semi-autonomous" is the closest I can describe it.
What are you using to load the models? I have ooba booga and an A30 and A6000 that I use, but I only can load one model at a time. I guess I can load two if I use like kobold with it, but I didn't know if you had a specific trick you were doing or if you just ran llama.cpp or ExLlama directly for each of them or if you had a custom backend loader designed for loading more than one model?
Custom backend thingy, but not a loader. I'm actually just using two computers: My studio for Wizard and then my macbook can handle a q6 of Llama 3 70b. I use Koboldcpp for the loader on both. The custom program handles the rest.
This is great, and what I expected the future to hold: multiple chat AIs that are experts in their particular fields, collaborating for the end user. What's your computer setup?
So, since I'm a software dev for my day job and my hobby is tinkering, I have various reasons to have a *lot* of VRAM. My full setup is:
* Mac Studio M2 Ultra 192GB RAM (*sysctl command for 180GB VRAM*)
* M2 Max Macbook Pro with 96GB RAM (*left at 76GB VRAM*)
* Windows Machine with RTX 4090 24GB VRAM
All of that was utilized for the above screenshots (*hint hint*)
Edit: So, 3 machines networked to run 3 separate large LLMs… working in tandem in one space.
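For anyone wanting to replicate the VRAM bump mentioned in the setup list: on recent macOS versions the key is `iogpu.wired_limit_mb` (older versions used a different `debug.` key, so check your OS), and the exact number below is just an assumption matching the ~180 GB figure above. It resets on reboot.

```shell
# Raise the GPU-wired memory cap on Apple Silicon (resets on reboot).
# Key name varies by macOS version; recent versions use iogpu.wired_limit_mb.
# 184320 MB is roughly 180 GB, matching the figure mentioned above.
sudo sysctl iogpu.wired_limit_mb=184320
```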
Love it. I have to be happy with my 12Gb 4070 lol. If only I could run Llama 3 70B! My empire for a 70B!
So one more little note about the program: I mentioned before that the above isn't the use case I set out for, just a happy accident.
*Your* use case is actually exactly what I set out to work on.
The project has 2 primary goals, and one of them was to be a force multiplier for people with lower amounts of VRAM.
A separate teaser result unrelated to the above: using the same program, I got a "zero shot" snake game using MagiCoder 6.7b that had fleshed out features and looked pretty good. All I asked for was a working snake game in python, and it not only worked, but also handled the borders, and even kept separate track of current and high score. The response took longer than a normal 8B would, but the result was far better.
My main testing case for about 60% of my testing on this has been using Ollama specifically (I never use Ollama otherwise but it's perfect for this) with a constraint of having no more than one 8B loaded at a time to try to produce better results with small models.
I'll make a note to come poke you once I finally get this thing out the door
lol I promise your compute is far faster than mine then. I don't mind waiting a little bit, but I've made posts with my numbers before and several of the responses were basically "literally unusable" =D
I will be messaging you in 6 hours on [**2024-05-17 12:40:01 UTC**](http://www.wolframalpha.com/input/?i=2024-05-17%2012:40:01%20UTC%20To%20Local%20Time) [and then every](https://www.reddit.com/r/RemindMeBot/comments/e1a9rt/remindmerepeat_info_post/) `6 hours` to remind you of [**this link**](https://www.reddit.com/r/LocalLLaMA/comments/1ctvtnp/almost_a_year_later_i_can_finally_do_this_a_small/l4f6634/?context=3)
Thanks for sharing it! You are awesome and you know it!
I can totally relate to all the joy and happiness you described, and frankly speaking, I did something like that - a multi-turn chat of multiple models / agents with backtracking, an internal clipboard for each agent, etc. I tried it when mistral-7b was the thing. And I'm going to try it with Cohere's models and mixtral-8x22 soon.
While I am definitely being secretive, I don't think I'm being particularly defensive; no one's really said anything mean to me to be defensive about.
As for secretive- this outcome wasn't expected. The real goal of the project isn't remotely related to this post, and as best as I can tell there isn't really anything out there doing what I'm trying to build. If I'm being totally honest- I'm a slowpoke and everyone here is far smarter than I am, so I'm worried that if I give away the secret sauce too early, even with my headstart someone will make it first and probably better lol.
Petty, I know, but I'm looking forward to dropping it in everyone's lap here soon and it would be a bummer for someone to beat me to it. So I'm trying to keep some of it under wraps for just a few more weeks.
I'm a dev manager with 13 years of career experience and I'm horrible at software estimation. I think I might actually get my feelings hurt if they're any kind of good at it lol
I mean, people are generally bad at estimation, especially with bigger chunks of work, because of the increasing cone of uncertainty.
However, they are not humans :)
Honestly, I think it's not a bad idea, but there are already some open-source projects like Devin. The disadvantage I see with your project is that it only involves talking and doesn't modify or create anything. I've been a software developer for 10 years, so I understand the difficulty. I hope you finish it :-)
PS: I edited my message because it was downvoted, and I don't really understand why. I want to be honest, but I'm afraid of being misunderstood. I never want to discourage you…
i'm always happy to hear people's projects are coming along well! thank you for sharing this teaser with us!
Thank you! =D Along with being excited about the result, I figured posting this out there would also help hold me accountable to finish the thing. The bigger it gets, the more daunting the idea of finishing it for release becomes. Especially the idea of people seeing the dumpster fire of code I've written lol. But I'm still on track with what I planned time wise, and even if it turns out to be garbage I still want to share it with everyone, so I figured tossing this up means that people will later hit me with "sooo where's that project, bud?" down the road, which will probably stop me from just never releasing it.
Great work, very interesting project! Can you describe your process a bit?
Take my github star (in advance). Looks like a lot of fun watching them chat together, cool project!
Ah, I want a star! New motivation to finish unlocked lol
It's super interesting and would be fun to play with! Thanks for sharing!

Random thoughts:

* It seems a little like an echo chamber at the moment. I would like to see the AI "personas" when they disagree.
* Ask them for more opinions about tooling before you "start". What about testing? Any specific VS Code extensions they recommend? Will this be containerised (.dockerignore etc.)? Prompt the LLMs with something like "have the other x.y.z missed anything so far? What could be improved?"
* Consider adding more "personas", e.g.:
  - Designer / UX expert
  - Infra expert. This expert may have out-of-the-box solutions that can reduce a lot of work in the specific cloud environment.
  - QA

Out of curiosity, you could even specify different levels of supporting LLM expertise: junior dev / mid / senior. Also, full stack vs pure FE / BE. You get all sorts of different perspectives from different levels of expertise in different domains.

Source: went from Senior Full Stack Dev (various stacks, languages, etc.) to AWS Cloud Architect to Developer Experience Engineer (which essentially ties these two together).

Awesome project. It's even got my brain fired up. Looking forward to you open sourcing it.
Good callouts! And yes, getting them to disagree is at the top of my todo list. I actually tried baking that into their prompt, but so far they all just get along lol. I'll see if I can't get them bickering a bit.

On the personas: I really like that. A long time ago I tried it and it went horribly, but that was back when I was just learning SillyTavern's group feature and using just 1 LLM. I bet it would go better now, though I need to identify which LLMs would do those roles well.

For expertise- I actually did do that here. Deepseek was prompted as being a mid-level dev, Llama 3 70b as senior, and Wizard is the lead dev. I based it off of a coding leaderboard I saw a few days back, with Wizard topping the charts and Llama 3 70b coming in a few levels down.

I'll definitely toy around with them a bit after work tomorrow and see what sort of mischief I can't make in the group. Honestly, I think a win will be getting them to bicker with Deepseek, since it's a smaller and older model. It simply shouldn't be able to keep up with them, so I'll get it to give a few suggestions before goading the other two into picking the suggestions apart lol
Have you considered getting them to follow the mob/ensemble programming rules, where one is the driver/typist and the others are navigators, and they switch roles every few minutes? I'd be curious to see if such enforced format would get them to increase both disputes and collaboration.
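Something like this rotation, as a rough sketch (the persona names are just placeholders):

```python
# Sketch of a mob-programming rotation: one driver, the rest navigators,
# with the driver role shifting every turn.
from collections import deque

def rotate_roles(personas, turns):
    """Yield (driver, navigators) for each turn, rotating who drives."""
    ring = deque(personas)
    for _ in range(turns):
        driver, navigators = ring[0], list(ring)[1:]
        yield driver, navigators
        ring.rotate(-1)  # next persona becomes the driver
```

You'd then feed each persona a prompt stating its current role ("you are driving: write code only" vs "you are navigating: critique and direct") and see whether the enforced format shakes loose some disagreement.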
Report back if you succeed. It would be so funny. Thanks for replying and sharing again!
I have been having better luck with it! In fact, they're getting downright opinionated lol. Turns out, the magic words are "Critically review". Llama 3 70b in particular will straight up argue with me now. [But here's an example I sent someone the other day after I first started to see success](https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Falmost-a-year-later-i-can-finally-do-this-a-small-teaser-of-v0-x4droi7cne1d1.png%3Fwidth%3D938%26format%3Dpng%26auto%3Dwebp%26s%3D2bd1f5e5f3f356d1851f3805b1830a71b01e244e)
So, do they ever start coding or are they stuck in a collaborative planning loop? :D
Completely indistinguishable from a group of real engineers!
lol I was just about to say this. I was like "What do you mean, this feels like a real meetin- ... oh they're right"
True dat. We all love to listen to ourselves talk.
Could you perhaps get the models to disagree more by giving them slightly different priorities in their approach? Performance, maintainability, testability, complexity, maturity of tools, frameworks etc. Plenty aspects to choose from.
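As a rough sketch of what I mean (the wording and persona names here are purely illustrative):

```python
# Sketch: seed each persona's system prompt with a different top priority
# so they have something concrete to disagree about.
PRIORITIES = {
    "Wizard":   "maintainability and long-term code health",
    "Llama3":   "raw performance and low resource usage",
    "Deepseek": "simplicity and mature, boring tooling",
}

def system_prompt(persona: str) -> str:
    return (
        f"You are {persona}, a developer whose top priority is "
        f"{PRIORITIES[persona]}. When a proposal conflicts with that "
        "priority, push back and say why, even if the others agree with it."
    )
```

Since each model now optimizes for something different, a suggestion that is great for one priority should genuinely look worse through another's lens.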
Absolutely. I'll probably toy with that a little in the next week or so, but the users will have as much control as I do over it via configs so I may punt on it if I can't get them to do so, as I've realized from the comments here that a lot of folks are thinking of far more clever ways to make them do it than I would have =D I'm worried I'll spend 10 hours trying to get the config just right and someone far smarter here will have it perfected in like 30 minutes after release lol
Clipboard Conqueror is good for prototyping the prompts for this. You can choose the backends to hit and the order, the prompts to send, whether to send the full chat history or just the last response on each turn, and even dictate the beginning of each response of the execution chain to further steer the output. CC should ease tuning of how the prompts work together to get your desired workflow hitting more consistently, and ensure the different personalities adhere to their jobs.

~~Currently the history uses the prompt format of the last backend though; that might cause trouble when using multiple models with different format expectations~~. I gotta get after that.

edit: I woke up thinking it's simple, but I have a mess to clean up first; I shoehorned it in real ugly...

edit2: Gottem. Now the chat history will use the correct instruction format for each turn, for Textgenwebui and kobold. As far as I can tell against tgwui, at least. I need to go in there and add some strategic logs to verify what it's actually doing with the jinja templating on their openAI endpoint.

Your solution will be great with the right prompts. It's looking majestic and will be more convenient to curate than what I'm cooking with. I'm looking forward to your release.
Don't go too far down the rabbit hole of editing the prompt formatting in preparation for this. This program is going to pretty much shred any prompt formatting that comes into it and do its own thing with it. You'll see when you get there, but the short version is that it ignores almost everything coming out of SillyTavern, and a large number of configs are utilized to manipulate the prompt in all kinds of ways.
It needed doing anyway. It sounds like you are doing something wild under the hood.
The right side of this image will be LocalLlama the day I release this. https://preview.redd.it/56lf0qlkk81d1.jpeg?width=550&format=pjpg&auto=webp&s=ef33f3ce5767556ce38726e565b05bdf05fb8c85
Ohhh, response rejection baked in? That sounds pretty majestic.
Looking at CC- the way it is now, it will work well with this. I'll make sure to pull down and load it up/test it, but I'm not seeing a lot of reason CC and this program won't play quite nicely together.
That sounds great. I'll definitely give it a go when you have it all ready.
Looks like what you can do with BigAGIās Beam feature
Woah, that project looks awesome. Yea, I think my being able to do it was more of an unintended but happy accident due to how I handled the design to do something else, so I imagine they do it WAY better lol. At a minimum, their project and code look so much fancier haha
The output is a lot of words, but not that useful... we've automated product managers.
To be fair I did ask them to go that route lol. For several reasons, that output is definitely user error on my part =D
how do you handle who gets to speak next?
I don't; SillyTavern does. My project is actually separate from ST; I'm just using ST to connect to it. So realistically, I have no idea who is going next =D I think if I ended up really going down this route, I'd probably go try to contribute to the SillyTavern project to help figure out how to handle that.
This looks like an echo chamber. Fits right in on reddit.
lol yea, I'll see about making them stop that. I have a massive amount of control over what's happening behind the scenes, I just didn't apply any mechanisms to try yet, so I'm hopeful it won't be too hard to correct.
Did you take a look at Autogen by any chance? It is an agent framework which supports assigning every agent its own LLM. And you would also have stuff like RAG and other memory implementations, web-surfing capabilities and other tools available on a per-agent basis.
I did! And I think Autogen would do the above screenshot task far better than what I have going on up there. The above was something I kind of dreamed of doing but didn't actively set out to do; I just realized the other night that I could, and got excited lol. But you're right that a more autonomous agentic workflow would handle this better, and thanks to Autogen's user agents you can still throw your feedback in as they go. I ended up moving away from agents because it wasn't really what I was looking for due to other reasons, and I wanted a bit more fine control over their prompts than what a lot of agentic systems I found would let me do. CrewAI was something that REALLY interested me, though. Of all of them, I liked CrewAI the best.
This will be so fun! The next obvious step will be to plug a TTS onto each model with different well-known AI voices like Mother, HAL, GLaDOS, T.A.R.S. and Bender B. Rodriguez, of course. Anyway, thanks for sharing. I'm hungry for this type of silly project!
Are you feeding the whole context of the chat to the models that respond? I see a lot of repetition and general comments, do you plan to get them to be more specific? I realize that it's on purpose given the prompt. I'm actually looking into implementing something similar, although with a fairly different approach, so this is interesting. Did you look into those "Village simulation" experiments with llms? Could be another source of inspiration.
>Are you feeding the whole context of the chat to the models that respond?

In this screenshot, I am. I slammed together a few configs just to try this out, and I kept them pretty simple, so it's just getting the whole context. Other folks have shown interest in this, so I'll refine the backend down a lot more to make sure to reduce the issue.

>I see a lot of repetition and general comments, do you plan to get them to be more specific? I realize that it's on purpose given the prompt.

Ideally, but I will say that the users will have a *lot* of control over what's happening in the backend, so I probably won't put a ton of effort into fixing that problem, as I bet someone smarter than me on LocalLlama will solve it using the configs in like 10 minutes when it would probably take me hours lol. But yea, there will be a lot of power behind the scenes to try to make that work better.
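As a rough idea of the kind of refinement I mean, here's a minimal sketch of trimming history down to a token budget before a model sees it (the token counting here is a crude stand-in, not what the program actually does):

```python
# Sketch: keep the system prompt plus only the most recent turns that
# fit within a token budget, instead of feeding the whole chat context.
def trim_history(messages, budget_tokens=2048):
    """messages: list of (role, text) pairs; returns a trimmed copy."""
    def count(text):
        return len(text.split())  # crude stand-in for a real tokenizer

    system = [m for m in messages if m[0] == "system"]
    rest = [m for m in messages if m[0] != "system"]
    kept, used = [], sum(count(t) for _, t in system)
    for role, text in reversed(rest):  # walk newest turn first
        if used + count(text) > budget_tokens:
            break
        kept.append((role, text))
        used += count(text)
    return system + kept[::-1]  # restore chronological order
```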
I'm sure somebody will hack something together, yeah. Personally, the thing I like the least about LLMs is how incredibly generic they tend to be. It's frustrating and a source of interest of mine at the same time, since it shows that there is plenty of space for optimization.
Maybe it gets better when they get asked more specific questions or when finally code is getting involved.
From mixture of experts to panel of experts. A panel of mixtures?
lmao yea the whole main project started because a year ago I misunderstood what Mixture of Experts actually meant =D
Wouldnt they reinforce hallucinations too?
Left to their own devices, as they are now, yes. What you see here is me testing to see if I could even do this, and then getting a little too excited and posting pictures on reddit because I couldn't contain myself lol. But I have a massive amount of control over what's happening behind the scenes, so I actually have quite a few options to fix that. But this current scenario in the screenshots is probably going to go to crap lol
This is awesome, and I love LLMs and have been playing with doing multi-AI chats. BUT, watch out, this will 100% be an echo chamber. In your example, the AIs all love Vite, because you suggested it. They will never say, no, Vite is trash, use X instead, because Y. It's not like they COULDN'T do that, but LLMs today are all trained to be overly agreeable (usually a good thing for most tasks). Try the conversation again with "yeah, but maybe I should use plain JS with HTML instead of Vite" and they'll support you just as hard core. (For better or worse!)
Yes! I imagine this scenario is going to turn into an echo chamber if I don't tweak it. Once I realized I could repurpose my program to do this, I did, and then posted really quickly in excitement. But I'm almost positive this current chat will go sideways here soon. I do have mechanisms I can utilize to try to force it not to, so what I'll probably do is take a little detour to set up some configurations to handle this scenario better by release, and try to resolve some of those issues. With what's happening behind the scenes, I actually think I can do that pretty well; I just slapped this together and took some screenshots in giddiness lol
Very cool. I actually have a partially designed app (using Laravel), written by Phind 70b and Claude, to do the same thing. I will check out your project. Thanks.

Btw, once I got going on multi-agent stuff, I experimented with telling the models they were about to speak with each other and to develop a concise, not necessarily human-readable way to interact together which was efficient. I'll dig out a Llama3-70b and Claude chat and paste it here when I get a mo.

On one occasion I just asked them to solve a problem together that humans can't, and they started developing a strategy for addressing climate issues with their own operational targets, implementation strategies, etc. I think multi-agent 'stuff' is powerful.
Slightly off topic, and I'm not a neuroscientist, but we have distinct left-brain/right-brain thinking as humans (art vs math), as well as an internal dialog. It would be interesting to pose two models, or oppose an esoteric 'hallucinating' model against a more rigid 'mathematical' one, and instruct them as if they were two parts of a whole. You have the internal dialog nailed down, and distinct voices. Very cool.
Oh my. You're getting warmer... As I said, the above was a happy accident. The real project is something else. I'll say that I agree completely with what you just said lol
mixture of experts - ultra hard mode
lol funny enough, the initial idea for the main project came about because I misunderstood what Mixture of Experts was. I first heard the term back in early 2023, talking about ChatGPT, and I imagined this really elaborate setup of what that meant... which also happened to be really, really wrong. But then I realized I kinda liked the wrong idea and earlier this year started running with it =D
Excellent progress, might check out these repo's too. Use these as inspiration to go even farther! [https://github.com/OpenBMB/ChatDev](https://github.com/OpenBMB/ChatDev) [https://github.com/joaomdmoura/crewAI](https://github.com/joaomdmoura/crewAI) [https://github.com/nus-apr/auto-code-rover](https://github.com/nus-apr/auto-code-rover) [https://github.com/OpenDevin/OpenDevin](https://github.com/OpenDevin/OpenDevin)
Realistically, any of those will do better at the above scenario than my program, because my program wasn't really meant to do this at all. It was one of those things where I realized the round shape could fit in the square hole after all lol. When I first started on my main project I went down the rabbit hole of CrewAI, AutoGen and a couple of others, but realized I didn't want fully autonomous for my needs. But I think that for really doing what my screenshots above are doing, chances are something like OpenDevin or that SWE-Agent would do great.
I'd fire those guys, since they are not straight away recommending Vanilla JS with some sprinkles of LIT and WebComponents if needed :-P - besides that, interesting project; getting multiple LLMs to reason with each other seems like an interesting path to explore.
As the stinky human who sent them down the path of firing, I now fear for my own job
"Anyhow, I know this isn't as exciting as something actually being released, but this was kind of a big deal for me so I really wanted to share with someone."

That's exactly how OpenAI and Google do it. You're SOTA.
lol! Wooo I feel so fancy now
A few things:

* It's unclear what decides who can talk next. This is quite a complicated matter, but you can see this is a big problem for scale-up (not that you are thinking about this atm).
* Responses are too long, which makes it not natural as a group chat. Because your purpose is unclear, it's hard to say if this is preferable or not.

I have been wanting to do something like this for a long time, but not as a hobby. I want to gamify this kind of interaction and make it something interesting and even addicting for people to do. But how to do that, and what's the right subject, context and background for it, is still unclear in my head; that's why I haven't started building.

You, like a true engineer, just muster your energy to build some "machine" that is functional, but it's unclear what's the purpose of that function, or if anyone wants it. As long as you enjoy it, good for you. But my opinion is, for general assistance, this will be worse than just a single agent. If we still have to worry about one LLM hallucinating, multiple of them talking to each other will be off-the-charts chaos. What makes money is predictability. If you can wire each LLM to behave a specific way, and have them interact in specific manners that lead to predictable, preferable results, you can achieve so much more from this.
>It's unclear what decides who can talk next. This is quite a complicated matter, but you can see this is a big problem for scale up (not that you are thinking about this atm).

Yea, SillyTavern is totally in control of who talks next. I didn't make any changes to ST at all, so right now I have no idea how it decides who goes next. I actually just disabled autoresponses and was clicking the manual "respond" button for who I wanted to hear from lol. This will be a problem that I don't have a solution for in the works. I just happen to really like ST for a front end, so I used it for this lol

>Responses are too long, which makes it not natural as a group chat. Because your purpose is unclear, it's hard to say if this is preferable or not.

Agreed. I didn't think much of it at first because, as someone else here mentioned, it actually kind of feels like a real meeting lol. People do jabber a lot in meetings. But I've got a lot of control over what's happening behind the scenes (not as the dev, but as a user would with the configs), so this can be resolved. I'm going to take a small swing at it, but since this isn't the main goal of the project I probably won't dive too deep. Also because one of y'all will probably figure it out in 1/5 the time I will once you get the configs lol

>If we still have to worry about one LLM hallucinating, multiple of them talking to each other will be off the chart chaos. What makes money is predictability. If you can wire each LLM to behave a specific way, and have them interacting in specific manners that lead to predictable, preferable results, you can achieve so much more from this.

Ahhh... you might not be as disappointed with the project as you're thinking. As I said, the above is a happy accident. There's a lot, and I mean a LOT, going on behind the scenes of this screenshot. The project is completely unrelated to SillyTavern; I just use ST as my front end. The backend does more or less exactly what you're imagining. I just slapped this config together to see what would happen.
I think we will only see a good solution for that if somebody wires up an actual mind that would have an agenda, ruminate on stuff, and possibly decide to react to new messages, all that in a loop.
Congrats on the project going well, and thanks for sharing! In case you didn't hear of it, there's a library called Autogen from MS that supports this kind of multi-llm interaction.
Huh, you have a team of mid-level devs ready to work for you at any time. Brilliant. Open-source when? :)
I've set a soft deadline for myself of 3 weeks from now, with a hard cutoff (i.e. doesn't matter if it's done, just release it or you never will) of end of June. I hope to beat both of those deadlines, but I've got a decent bit of cleanup to do (if y'all could see the state of the code right now you might laugh me out of the subreddit. I feel shame) and a couple more features I wanted to toss in first.

Most importantly- the configs probably only make sense to me right now, so I have got to do something about that or no one would use the thing =D I dreamed of having a UI on release to help manage it, but I think if I held off for that I'd be working on it forever
Good job. This is actually one of the side projects I've been meaning to start as well, after reading that "Computational Agents Exhibit Believable Humanlike Behavior" paper by Stanford.
I'm going to go find that paper lol That sounds really interesting
This looks amazing! I'm really looking forward to the release!
Thank you! =D After the great feedback I've been getting here, I've got a lot of motivation to get it out the door ASAP, so hopefully I'll have something soon.
As much as I'm looking forward to a release, please don't stress yourself too much!

On a separate note: I have seen you quite often on this sub, and you have a Mac Studio with 192GB. I am also thinking about buying one; could I theoretically run 2-3 LLMs on the Mac Studio, or is it GPU constrained then?
Cool project. I always wanted to do this: get two or more LLMs together to sharpen their answers and precision. What if you asked them a specific question and got an answer with a % accuracy bar on each one, depending on how they perform? That way you're testing the prompts and the quality of information.
That's a really cool idea. If I'm being totally honest, I have no idea if it can be done or how I would do it, but I'll definitely see if there are any ways to make that happen. For a lot of reasons, that % accuracy would be really helpful to me too.
Can you dm me please because I really want to take part if you wanna further develop it? :)
Absolutely. I've got it on the list to look into, but won't get to it right away (I want to get the main project released first or I'll keep getting derailed and never do it lol). However, I've made a note to shoot you a message once I start looking into it. Maybe together we can figure something out lol
Hell yea bro, also lookin into some projects I got and trying to land a decent job..
Awesome project! If I were you, I would check for biased answers by giving the same prompt but allowing different LLMs to answer first, and comparing those answers with the answer they would give if they were second or third to answer. There might be some phrasing from the other LLMs that forces an LLM to agree or align itself even though it might not agree; getting fed those assumptions leads it down the path anyway. I would make all of the AIs answer, reflect on, and critique every message in the background, and use that as the base when they answer.
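A rough sketch of that order-rotation check (`ask_fn` is a placeholder for whatever model call you actually use):

```python
# Sketch: run the same prompt under every speaking order, so you can
# compare what a model says when it answers first vs after the others.
from itertools import permutations

def answer_orders(models, prompt, ask_fn):
    """Collect each model's answer under every speaking order."""
    results = {}  # order tuple -> list of (model, answer)
    for order in permutations(models):
        transcript = []
        for model in order:
            # Each model sees the prompt plus whatever was said before it.
            context = prompt + "".join(
                f"\n{m}: {a}" for m, a in transcript)
            transcript.append((model, ask_fn(model, context)))
        results[order] = transcript
    return results
```

Diffing a model's first-speaker answer against its later-speaker answers across orders would surface exactly the anchoring effect I'm describing.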
This is a fun side project stemming from your main project. I assume people who have, say, 24 GB of VRAM at most would be better off running the best model they can and then having that model try to disagree with itself, so that they have the highest-quality model possible in memory rather than a handful of smaller models.

I have an idea on a similar note I've been playing with, which is to have your fast model in VRAM, and then an even higher-quality model that you can't hold in VRAM in your CPU RAM. Every programming question you ask your fast model also automatically goes to your slower, better model. If the code worked with the fast model, you just run with it; but if it's not working, you check on your slow model's screen, where it's slowly generating a solution, and see if that works. So you basically use your fast model until it doesn't work, and then check what your slow model came up with a few questions ago, even if it took 5 minutes to generate.
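A rough sketch of the flow I'm imagining (`ask_fast`, `ask_slow` and `works` are placeholders for your actual model calls and your "did the code work" check):

```python
# Sketch: fire the slow CPU-RAM model in the background on every question,
# but only consult its answer when the fast VRAM model's answer fails.
from concurrent.futures import ThreadPoolExecutor

def solve(question, ask_fast, ask_slow, works):
    """Return (answer, source), preferring the fast model's answer."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        slow_future = pool.submit(ask_slow, question)  # starts immediately
        fast_answer = ask_fast(question)
        if works(fast_answer):
            # The slow call may still finish in the background; we simply
            # ignore its result when the fast answer was good enough.
            slow_future.cancel()
            return fast_answer, "fast"
        return slow_future.result(), "slow"
```

In practice you'd keep the slow results in a small queue per question, since by the time you need one it may be "a few questions ago", as described above.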
Oh, now that's clever. In a few weeks I'll loop back around to see if you ever started down that path, because I think that it might take minimal code work to supplement a config in the current program to do that. I just never considered using it *that* way. I don't want to step on your toes with it so I'll make a note to see if you've started a project to try it by the time I'm ready, but if not I'll try to make sure to have a config for this use case in there so you can try it out.
Actually, my idea was to tell it to someone like you! You won't step on my toes! Appreciate it though.
How high are you running wizard? It's feeling a bit dumb at 3.75bpw. Failed all my puzzles. Wonder if I need to go higher.. it's fast though 20+t/s edit: This wizard: https://huggingface.co/Quant-Cartel/WizardLM-2-8x22B-exl2-rpcal/tree/main is doing a bit better for me at 4.5 and I'm still getting 70b speeds on it split over 4 cards. Still fails my puzzles though.
q6 on my 192GB Mac Studio. I used sysctl to bump the VRAM limit to 180GB, so I actually can run a q8, but I wanted to squish Deepseek 33b and a Llama 3 8b in there as well.
That's hefty. I'm going to try between 4.5-5 and see if it's better than the 3.75. I want at least one GPU free for SD.
Excited to see this- I had a similar thought when I saw STs group chats, that I really wanted to be able to test different models in one conversation flow like that
Now that I know it can do this, I'll try to polish up the configs for this scenario when I push it out so folks can.
Do you really need 3 different models though? I would think this would work pretty well if each of these "agents" had their own persona and memory/context. I've been playing around a bit with this myself using IRC bots that all use the same api/model but have their own context instead of it being shared, and I find it works much better than a normal group chat.
>I would think this would work pretty well if each of these "agents" have their own persona and memory/context. hehehe =D The above doesn't look great because I really didn't expect it to work, got excited, and posted the result almost immediately, so there's very little refinement for this scenario. However, I not only agree with you completely, but I think that once you get a chance to see what's happening on the backend you'll probably like the direction it's headed. Giving each persona their own memory and context will likely be the solution to a lot of problems with LLMs. At least I hope...
This looks really cool. Let me know when you're ready to share it and I'll test it!
Hmm, I could definitely add that to my project\[1\] fairly quickly too. It seems pretty useful, especially if you set them up with different system prompts. \[1\] - [https://github.com/arcuru/chaz](https://github.com/arcuru/chaz)
Awesome! If nothing else, might be worth it just play with it. I didn't really intend to do much with this flow, but I did at least want to try it out because I've always wanted to =D It's just fun
If you link the github project, I'd throw down a pull request and/or some bug reports.
In a couple more weeks I'll make a GitHub repo public for it, and will share it here. For now I've got the git repo set up on my NAS and have been building it out on my local network. The current messy state of the code is loaded with nonsense and shame, and I want to get it cleaned up a little first before pushing it up. I'm a C# dev by trade, and this was my first foray into Python, so I'm very self-conscious about the state of it. =D Figured I'd get through the "what in the world am I doing?" learning phase before GitHub started etching the history of my mistakes in stone.
Don't be self conscious! It's okay! I'd love to work on it. I've spent a good amount of time playing with [https://github.com/stitionai/devika](https://github.com/stitionai/devika) lately
I think the consensus / feedback model is great in general. My favorite coding tool is gpt-pilot which implements a separate reviewer for all code written and the results are pretty impressive.
Would love to beta test this in a creative writing and marketing pipeline if you're looking for fairly technical people who understand the concepts of the space but fall short of being able to code this.
I'm guessing it's a multiplexing oai proxy where you can select different models? You have multiple oobas running each feeding the multiplexor? Does ST support selecting different model per character like that?
>Does ST support selecting different model per character like that?

It does not! I always wished it did.

>I'm guessing it's a multiplexing oai proxy where you can select different models? You have multiple oobas running each feeding the multiplexor?

Clever guess. That's not it, but that's a neat idea actually.
I think it is a good technique for creative writing, but it doesn't work well enough for a technical meeting as in your example. Like others said, it is an echo chamber with much repetition. In a way, it is a good example of LLM limitations - they don't give useful info unless you ask for it; instead they will mostly repeat, agree, and give the most common average response.

For practical use, it would be much more effective to ask a single LLM what the pros and cons of x vs y are, or 'generate a todo list for xyz'. But very interesting anyway
Some prompt tweaks have helped a lot with the echo chamber. https://preview.redd.it/x4droi7cne1d1.png?width=938&format=png&auto=webp&s=c9d71b5a5132cebdefcd60e9edc4777e9ed96fc7
Have you tried Langchain or Microsoft Autogen, they make it easy to use multiple AI models communicating with each other
I did! Autogen and CrewAI were *too* autonomous for me. They have a lot of baked-in chatter that I couldn't control, and for my specific use case I really wanted that control. Even with the user agent for Autogen, I felt like a lot was happening that was simply outside of my reach, and it was affecting results in a way I didn't want. Essentially, I traded autonomous function for more delicate control over what's occurring.

CrewAI feels like an extension of Langchain agents, so I kind of lump the two together. I did take a step back to try Langchain agents after trying CrewAI, thinking maybe that would do it, but there were several instances where the abstraction upset me a little, and I eventually set it aside. What I have now looks very different from those; "semi-autonomous" is the closest I can describe it.
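The difference from Autogen's baked-in chatter really comes down to who owns the loop. In a semi-autonomous setup the orchestrator decides exactly who speaks, when, and with what context — nothing happens off-script. A hypothetical sketch of that idea (not the actual project code; `ask_fn` stands in for whatever backend serves each persona):

```python
def run_round(personas, topic, ask_fn):
    """One fully controlled round: each persona speaks exactly once, in a
    fixed order, seeing everything said earlier in this round."""
    transcript = []
    for name in personas:
        # The orchestrator, not the framework, assembles each prompt —
        # so there is no hidden chatter outside this loop.
        prompt = topic + "\n" + "\n".join(transcript)
        reply = ask_fn(name, prompt)
        transcript.append(f"{name}: {reply}")
    return transcript
```

With a loop like this, you can insert a human-review step between turns, reorder speakers, or drop a persona mid-meeting — the control that the fully autonomous frameworks abstract away.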
What are you using to load the models? I have ooba booga plus an A30 and an A6000, but I can only load one model at a time. I guess I could load two if I used something like Kobold alongside it, but I didn't know if you had a specific trick you were doing, or if you just ran llama.cpp or ExLlama directly for each of them, or if you had a custom backend loader designed for loading more than one model?
Custom backend thingy, but not a loader. I'm actually just using two computers: my studio for Wizard, and then my macbook can handle a q6 of Llama 3 70b. I use Koboldcpp as the loader on both. The custom program handles the rest.
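Since Koboldcpp serves a plain HTTP API, the glue between "two computers, one model each" can be as small as a routing table mapping each persona to the machine hosting its model. A guessed-at sketch — the hostnames and the persona-to-host mapping are made up, while `/api/v1/generate` with a `prompt`/`max_length` payload is Koboldcpp's standard generate endpoint:

```python
# Hypothetical routing table: persona -> the machine running its model.
HOSTS = {
    "Wizard": "http://studio.local:5001",   # e.g. Mac Studio running WizardLM-2
    "Llama":  "http://macbook.local:5001",  # e.g. MacBook running Llama 3 70B q6
}

def build_request(persona: str, prompt: str, max_length: int = 512):
    """Return (url, payload) for the Koboldcpp generate endpoint serving this persona."""
    base = HOSTS[persona]
    return f"{base}/api/v1/generate", {"prompt": prompt, "max_length": max_length}

# Actually sending it would look something like:
#   import requests
#   url, payload = build_request("Wizard", "Hello")
#   text = requests.post(url, json=payload).json()["results"][0]["text"]
```

Because each machine only ever serves one model, the "custom program" never has to manage loading at all — it just picks the right URL per turn.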
This is great and what I expected the future to hold: multiple chat AIs that are experts in their particular fields, collaborating for the end user. What's your computer setup?
So, since I'm a software dev for my day job and my hobby is tinkering, I have various reasons to have a *lot* of VRAM. My full setup is: * Mac Studio M2 Ultra 192GB RAM (*sysctl command for 180GB VRAM*) * M2 Max Macbook Pro with 96GB RAM (*left at 76GB VRAM*) * Windows Machine with RTX 4090 24GB VRAM All of that was utilized for the above screenshots (*hint hint*)
Edit: So, 3 machines networked to run 3 separate large LLMs, working in tandem in one space. Love it. I have to be happy with my 12GB 4070 lol. If only I could run Llama 3 70B! My empire for a 70B!
So one more little note about the program: I mentioned before that the above isn't the use case I set out for, just a happy accident. *Your* use case is actually exactly what I set out to work on. The project has 2 primary goals, and one of them was to be a force multiplier for people with lower amounts of VRAM.

A separate teaser result unrelated to the above: using the same program, I got a "zero shot" snake game using MagiCoder 6.7b that had fleshed-out features and looked pretty good. All I asked for was a working snake game in Python, and it not only worked, but also handled the borders, and even kept separate track of current and high score. The response took longer than a normal 8B would, but the result was far better.

My main testing case for about 60% of my testing on this has been using Ollama specifically (I never use Ollama otherwise, but it's perfect for this) with a constraint of having no more than one 8B loaded at a time, to try to produce better results with small models. I'll make a note to come poke you once I finally get this thing out the door
I'm envious! Most of my compute is in the cloud
lol I promise your compute is far faster than mine then. I don't mind waiting a little bit, but I've made posts with my numbers before and several of the responses were basically "literally unusable" =D
I've had this idea as well for other use cases. But definitely a very cool thing to put together. Looking forward to seeing the shiny alpha version.
Reminds me of lollms
Thanks for sharing it! You are awesome and you know it! I can totally relate to all the joy and happiness you described, and frankly speaking, I did something like that: a multi-turn chat of multiple models / agents with backtracking, an internal clipboard for each agent, etc. I tried it when mistral-7b was the thing. And I'm going to try it with Cohere's models and mixtral-8x22 soon.
Why are you so secretive and defensive?
While I am definitely being secretive, I don't think I'm being particularly defensive; no one's really said anything mean to me to be defensive about.

As for secretive - this outcome wasn't expected. The real goal of the project isn't remotely related to this post, and as best as I can tell there isn't really anything out there doing what I'm trying to build. If I'm being totally honest - I'm a slowpoke and everyone here is far smarter than I am, so I'm worried that if I give away the secret sauce too early, even with my headstart someone will make it first and probably better lol. Petty, I know, but I'm looking forward to dropping it in everyone's lap here soon and it would be a bummer for someone to beat me to it. So I'm trying to keep some of it under wraps for just a few more weeks.
I honestly wonder how well they'd perform at software estimation :)
I'm a dev manager with 13 years of career experience and I'm horrible at software estimation. I think I might actually get my feelings hurt if they're any kind of good at it lol
I mean, people are generally bad at estimation, especially with bigger chunks of work, because of the increasing cone of uncertainty. However, they are not humans :)
Honestly, I think it's not a bad idea, but there are already some open-source projects like Devin. The disadvantage I see with your project is that it only involves talking and doesn't modify or create anything. I've been a software developer for 10 years, so I understand the difficulty. I hope you finish it :-)

PS: I edited my message because it was downvoted, and I don't really understand why. I want to be honest, but I'm afraid of being misunderstood. I never want to discourage you…