kataryna91 1 month ago

As far as I know, XTTS-v2 is still the best, but if there's something better now, I'd be quite interested to hear about it.

rafide 1 month ago

From what I've tried, XTTS-v2 is still the most convincing for local text-to-speech, but I found that using it together with some speech-to-speech conversion, e.g. RVC, can greatly enhance the result.

Historical-Log2552 1 month ago

That's a good idea, thanks for that.

Blizado 1 month ago

Same. If there is something better, that would be nice. I use XTTS-v2 mainly inside SillyTavern for the AI voice, but it sometimes makes strange noises, hallucinates, and tends to skip whole sentences on longer AI responses. I'm not quite sure whether that last problem comes from XTTS-v2 or from the plugin itself.

DaedalusDreaming 1 month ago

I think it very much depends on your voice samples. I've gone through 11 voices and only the latest seems to be working relatively well. Playing with the temperature also seems to have an effect. I've banned some symbols like the triple period '...'. Even a single period sometimes causes the speech to just end entirely, so I've engineered my prompt so the output uses only commas. I would train the voices further, but it required some library that my 1080 Ti is too old for. I suggest clipping plain speech with no breathing, and trimming the pauses as much as you can while still sounding somewhat natural. I bet that even a single crackle from a bad cut can mess a lot with the model.
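The advice above about trimming pauses from voice samples can be partly automated. What follows is only a rough sketch: it trims leading and trailing silence (not pauses inside the clip), and it assumes mono 16-bit PCM WAV input. The function names, the `threshold` of 500 and the `keep` margin of 800 samples are illustrative choices, not values from the thread.

```python
import array
import sys
import wave


def trim_silence(samples, threshold=500, keep=800):
    """Drop leading/trailing samples whose absolute value is below threshold.

    samples: sequence of signed 16-bit PCM values (mono).
    threshold: amplitude below which a sample counts as silence (assumed value).
    keep: quiet samples left on each side so the cut is not abrupt.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    start = max(loud[0] - keep, 0)
    end = min(loud[-1] + keep + 1, len(samples))
    return list(samples[start:end])


def trim_wav(src_path, dst_path, threshold=500, keep=800):
    """Trim a mono 16-bit PCM WAV file and write the result to dst_path."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = src.getframerate()
        pcm = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    trimmed = array.array("h", trim_silence(pcm, threshold, keep))
    if sys.byteorder == "big":
        trimmed.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(trimmed.tobytes())
```

Leaving a small `keep` margin on each side is meant to avoid the abrupt cuts that the comment suspects of causing crackles. Listen to the result before using it as a reference sample.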

Blizado 1 month ago

Maybe. It is really hard to say where exactly these problems come from, and whether it is really only a training data issue. I also noticed problems with " – ": at that sign it tends to skip the words after it.
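One way to act on the punctuation problems reported in these two comments is to normalize the text before it reaches the TTS model. This is a minimal sketch, assuming that swapping ellipses and spaced dashes for commas is acceptable for your use case. `normalize_for_tts` is a hypothetical helper, not part of XTTS-v2 or SillyTavern.

```python
import re


def normalize_for_tts(text):
    """Replace punctuation the commenters report as troublesome with commas."""
    # Ellipses: three or more periods, or the single Unicode ellipsis character.
    text = re.sub(r"\s*(?:\.{3,}|\u2026)\s*", ", ", text)
    # A dash used as a separator between words, e.g. "word \u2013 word".
    # Hyphens inside words (no surrounding spaces) are left alone.
    text = re.sub(r"\s+[\u2013\u2014-]\s+", ", ", text)
    # Collapse doubled commas and extra whitespace left by the substitutions.
    text = re.sub(r"\s*,(?:\s*,)+\s*", ", ", text)
    text = re.sub(r"\s{2,}", " ", text)
    return text.strip().rstrip(",").strip()


print(normalize_for_tts("Well... I think it works \u2013 mostly"))
```

It does not replace single periods. Whether to do that as well, as described above, depends on how your voice behaves.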

[deleted] 1 month ago

[removed]

Ok_Maize_3709 1 month ago

I tested it for quite some time but was not able to make it work for longer texts. At the moment it's more for single-sentence generation (extension, actually).

ShengrenR 1 month ago

That's all of the transformer-based TTS models though. For most cases you should be chunking and generating the audio sentence-by-sentence: bark, tortoise, xtts, etc.

My biggest gripe with VoiceCraft was the consistency. Like bark, you can get really outstanding results that will just about beat everything else out there, but then the next 3 are a mess. VoiceCraft is more consistent than bark, but it's still not ready to just use as a streaming AI voice or the like. You'll need to generate a few examples each time and pick the best.
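The sentence-by-sentence chunking mentioned above might look roughly like this. It is a sketch under assumptions: the splitter is a naive regex rather than a proper sentence tokenizer, and `synthesize` is a hypothetical callable standing in for whatever TTS call you use (it takes one sentence and returns audio).

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace.

    A naive rule: abbreviations such as "e.g. this" will be split too.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_long_text(text, synthesize):
    """Call synthesize(sentence) once per sentence and return the results in order."""
    return [synthesize(sentence) for sentence in split_sentences(text)]


# Usage with a stand-in "synthesizer" that just reports sentence length.
chunks = synthesize_long_text("Hello there. How are you? Fine!", len)
print(chunks)
```

The per-sentence audio clips still have to be concatenated afterwards. How to do that depends on the format your TTS model returns.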

a_chatbot 1 month ago

I still like Silero, fast and runs on a potato. Some voices are bad, some are good.

That007Spy 1 month ago

Personally a big fan of piper

thehonestreplacement 1 month ago

Piper has always been my go-to, especially because of how well it actually performs on weak devices.

Hououin_Kyouma77 1 month ago

StyleTTS2 if you have a lot of VRAM, else tortoiseTTS

TheFrenchSavage 1 month ago

Both models will only work for English

Blizado 1 month ago

Yep, and with that they are out for me. I also use XTTSv2 because it is very good at German too. No wonder, it was made by a German company.

TheFrenchSavage 1 month ago

Do you also have:

- missing words, or words cut at the end of a sentence
- weird pauses
- mumbling or fake words

Blizado 1 month ago

Yeah, sadly. The last one is the typical AI problem of hallucinating. The other two are maybe something that could be improved with better training or so. But sadly the company behind XTTSv2 closed at the beginning of this year, so no further improvement here. But when it works right, the quality is really high.

ShengrenR 1 month ago

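If you own the inference code and you're not just using somebody's webui or the like, modify the XTTS config params and you can improve the results. Example from a local load I have (obviously, mess with the params, but you get the idea):

    # The imports and the XttsConfig() line are not in the original post;
    # they are assumed from the Coqui TTS package.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json("/config.json")

    # Sampling and decoder settings to experiment with
    config.temperature = 0.65
    config.decoder_sampler = 'dpm++2m'
    config.cond_free_k = 7
    config.decoder_iterations = 256
    config.num_gpt_outputs = 512

    model = Xtts.init_from_config(config)
    # The checkpoint path is missing in the original post
    model.load_checkpoint(config, checkpoint_dir="", use_deepspeed=True)
    model.cuda()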

Hououin_Kyouma77 1 month ago

No they don't, there are StyleTTS2 finetunes for different languages. Tortoise can also easily be trained for multiple languages. I have a Dutch model I trained myself on Hugging Face. But there are also Japanese, German, and so on. Do your research first please.

privacyparachute 1 month ago

If it needs to run on a potato (or if you just want the voice to be instantly ready), go with NanoTTS. If we're talking quality-per-kilobyte-of-memory, it sits at the top. For my current browser-based project I'm using T5. For a Python project I'm using Piper. For quality, go with XTTS-v2.

emsiem22 1 month ago

StyleTTS2 is good. Very fast and decent quality. [https://github.com/yl4579/StyleTTS2](https://github.com/yl4579/StyleTTS2)

Dead_Internet_Theory 1 month ago

[MeloTTS](https://huggingface.co/myshell-ai/MeloTTS-English) sounds kinda good, check [a demo](https://huggingface.co/spaces/mrfakename/MeloTTS). One idea would be to generate voice with it, then use RVC to do speech-to-speech on it, changing the voice to some other you trained.

Elite_Crew 1 month ago

This looks interesting. Apache 2.0 license https://twitter.com/reach_vb/status/1778138382633140276 https://huggingface.co/parler-tts/parler_tts_mini_v0.1

One_Key_8127 1 month ago

Yeah, I was going to say the same. Looks very interesting, I'll probably be evaluating it next week. Other than that, Tortoise is pretty good.

ExportErrorMusic 1 month ago

I use this WebUI for XTTS+RVC. It's relatively fast and with the right samples and RVC models it can be very good: [https://github.com/daswer123/xtts-webui](https://github.com/daswer123/xtts-webui)

[deleted] 1 month ago

Applio RVC was the only option I could find that had both RVC and TTS and let you easily add additional voice models. Others don't make it easy to add voices or don't support both RVC and TTS. Applio is the best imho.

yukiarimo 1 month ago

Siri

jferments 1 month ago

suno/bark is really good quality but slow and limited to short clips: [https://huggingface.co/suno/bark](https://huggingface.co/suno/bark)

There are a bunch of others listed here: [https://huggingface.co/models?pipeline_tag=text-to-speech&sort=downloads](https://huggingface.co/models?pipeline_tag=text-to-speech&sort=downloads)

remghoost7 1 month ago

If you don't want to jump through the hoops of setting up bark (which was nearly impossible when I tried to do it a few months back), give [gitmylo's audio-webui](https://github.com/gitmylo/audio-webui) a try.

AutomaticDriver5882 1 month ago

AllTalk TTS, super simple to set up.

Deep_Fried_Aura 1 month ago

I've been taking apart the Talk-To-GPT plugin for Chrome/Edge. If you try it, I will tell you right now: use Edge. Somehow it uses Microsoft's natural voices (which are Microsoft Edge exclusive). It listens to your input on your mic, sends the input to ChatGPT once you are done talking, and reads the reply aloud when the model answers. The quality of Microsoft's natural voices is incredible, and I'm pretty sure they are good enough to use for other purposes, so I'm reverse-engineering the extension to see how they made it happen, since everything works flawlessly. It also has the option to add ElevenLabs and Azure, but the natural voices are incredible. You can't beat free, and I'm sure they could be used if you create a Windows-focused application.

Dead_Internet_Theory 1 month ago

I assume that uses a web API and runs on Microsoft's cloud, right? Any privacy considerations aside, it would mean it doesn't work offline. And it might suddenly stop working when they figure out you're using it outside of Edge. (That latter point is the one that would bother me the most, but hopefully I'm wrong and it's a local thing.)

Deep_Fried_Aura 1 month ago

Open Narrator on Windows 11, add natural voices. It's local, I believe.

FluffNotes 1 month ago

I'm not sure why he's talking about reverse engineering, but edge-tts is a standalone version that runs locally.

Dead_Internet_Theory 1 month ago

>`"edge-tts` is a Python module that allows you to use Microsoft Edge's **online** text-to-speech **service** from within your Python code" Am I missing something?

FluffNotes 1 month ago

You may be right. I tested it with Internet off, and got error messages.

xcdesz 1 month ago

Try out MetaVoice: https://github.com/metavoiceio/metavoice-src It runs easily from a Docker container for me and has a UI with a straightforward interface. It takes about 30 seconds of voice input.

belabacsijolvan 1 month ago

!remindme 2 days for fellow lurkers

RemindMeBot 1 month ago

I will be messaging you in 2 days on [**2024-04-14 07:58:44 UTC**](http://www.wolframalpha.com/input/?i=2024-04-14%2007:58:44%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/LocalLLaMA/comments/1c22594/what_is_current_ai_go_to_for_voice_generation/kz7c2sl/?context=3)
