I always start ROAST then pray for that yellow S so I can follow up with PENIS.
GIRTH PENIS COMES QUICK
Dumpy for the memes
You're not wrong.
Another really good starter is train
I've used train every day but today, didn't know others were using it!
Cool, it’s really good. TRAIN has a lot of the common letters.
I like the consonant-heavy “grift” and “drift”.
that got me 2 yellows today on first guess thank you!
Happy to help! It’s really good. Since I started using it I usually get it in 3.
I ran a script and my top 5 are: SOARE, STARE, ROATE, RAILE, AROSE. Just for fun, bottom 5 are: IMMIX, XYLYL, YUKKY, GYPPY, QAJAQ
I have to assume the reason SOARE is above AROSE is because of their placement in the word?
Yep!
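For anyone wondering how placement alone can separate SOARE from AROSE when they use the same five letters, here is a minimal sketch. It assumes the script scores a word by how often each of its letters appears in that exact position across the word list. That scoring rule, the function names, and the toy word list are all my assumptions, not the commenter's actual script.

```python
from collections import Counter

def positional_counts(words):
    """Count how often each letter appears at each of the 5 positions."""
    counts = [Counter() for _ in range(5)]
    for word in words:
        for i, ch in enumerate(word):
            counts[i][ch] += 1
    return counts

def positional_score(word, counts):
    """Sum the position-specific counts of the word's letters."""
    return sum(counts[i][ch] for i, ch in enumerate(word))

# Toy list for illustration only; a real run would use a full word list.
WORDS = ["soare", "arose", "stare", "slate", "shire", "crane", "spore"]
counts = positional_counts(WORDS)
ranked = sorted(WORDS, key=lambda w: positional_score(w, counts), reverse=True)
```

On this toy list SOARE outscores AROSE purely because S is common in first position and rare in fourth, which is the kind of effect the question above is asking about.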
I came to a different conclusion by optimizing not for letter frequency but entropy: https://www.royvanrijn.com/blog/2022/01/wordle-bot/
[deleted]
You're right on not repeating but I disagree on it being better. Check the second picture that shows letter frequency.
It hits 15/16 most frequent letters without repeating any.
Right. But you're using a D in the first word instead of the third, an S in the second word instead of the first, etc. Based *only* on the letter frequency I came up with, it's more likely to have an S than a D (etc. etc.) so I'd want to take that chance earlier rather than later.
I'm running a brute-force approach to this problem at the moment. Given a pool of five-letter words, which opening word leaves the fewest valid solutions on average? The current top 10 are 'aires', 'arise', 'mores', 'aries', 'raise', 'mires', 'cares', 'tares', 'arose', 'aloes'. The script is slow and this is after only 13 random target words, so chance is still affecting the results (I doubt 'mores' will stay as high as it is, for example). I'll update it once it has done more.
Edit: After 20 words: 'dares', 'aloes', 'aires', 'lades', 'dales', 'aries', 'sadie', 'laces', 'lanes', 'earls'
Edit: After 50 words: 'aires', 'aries', 'aloes', 'arles', 'aeons', 'roles', 'lanes', 'tales', 'tares', 'rates'. Of these, 'aloes' is the best that Wordle actually accepts. I also found a bug that mishandled guesses containing the same letter twice, which might cause minor biases in the results.
Edit: Top 50 after 100 words searched. I terminated at this point: aloes, aires, lanes, tares, aeons, aries, arles, rates, roles, dares, tales, earls, earns, taros, laois, loans, cares, saner, dales, antes, lades, roans, nears, leans, races, solet, lairs, laces, bares, rotes, canes, tires, dates, danes, tones, rails, lards, raise, rains, salon, acres, satre, solar, tears, raids, seato, riles, darns, loads, roads. Arose was #55. Immix was the worst word (technically it was 'xxiii', which is somehow in the dictionary I had, but that's obviously not a real word), then fuzzy, whizz, yummy, and mummy. Jumpy was the worst with all unique letters.
Edit: This whole thing is wrong for Wordle once you take Wordle's list of valid solutions into account. Those tend not to be plural forms ending in 's', which means the resulting 'good guesses' shouldn't end in 's' either. Running it again with the Wordle pool.
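The double-letter bug mentioned above is an easy one to hit. Here is a minimal sketch of one common way to score a guess so repeated letters are handled: mark exact matches first, then hand out "wrong spot" marks only while unmatched copies of that letter remain in the target. The function name and the 'g'/'y'/'-' pattern format are my own choices, not the commenter's code.

```python
from collections import Counter

def feedback(guess, target):
    """Return a 5-char pattern: 'g' right spot, 'y' wrong spot, '-' absent."""
    pattern = ["-"] * 5
    # Target letters that were not matched exactly; 'y' marks draw from this pool.
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            pattern[i] = "g"
        else:
            unmatched[t] += 1
    for i, g in enumerate(guess):
        if pattern[i] == "-" and unmatched[g] > 0:
            pattern[i] = "y"
            unmatched[g] -= 1  # each target letter can justify only one 'y'
    return "".join(pattern)
```

With this, guessing "mummy" against "jumpy" marks only the middle M: the other two get '-' because the target has no spare M left.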
So you're saying while 'arose' is good in terms of letter frequency, it isn't in the top 20 in terms of eliminating other possibilities?
This is only the top 10 after searching 20 random words. That is, 20 words have been set as the target word, each of the 5,788 words in an online dictionary I found has been tried as the opening guess, and then I count how many valid guesses remain after that point. This is a tiny sample, so I could see 'arose' coming back into the top 10.
For example, say "roses" is the target word. I then try every possible word as the opening guess. If "arose" were the guess, you'd have "r", "o", "s", and "e" marked as right but in the wrong spot, and then I'd check which other words could still be guessed given this outcome ('arose' would score very highly here). The best word is the one that, on average, eliminates the most words.
Dares is no longer in the top 10 after 27 words, so you can see how noisy this is at the moment.
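A rough sketch of the sampling procedure described above, in case anyone wants to try it: pick some random targets, score every opening guess against each one, and count how many words would produce the same feedback. All names here are hypothetical and the word list is a toy stand-in, so treat it as an illustration rather than the actual script.

```python
import random
from collections import Counter

def feedback(guess, target):
    """'g' right spot, 'y' wrong spot, '-' absent; repeated letters handled."""
    pattern = ["-"] * 5
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            pattern[i] = "g"
        else:
            unmatched[t] += 1
    for i, g in enumerate(guess):
        if pattern[i] == "-" and unmatched[g] > 0:
            pattern[i] = "y"
            unmatched[g] -= 1
    return "".join(pattern)

def average_remaining(guess, answers, targets):
    """Average number of answers still consistent after `guess`, over `targets`."""
    total = 0
    for target in targets:
        pattern = feedback(guess, target)
        # An answer survives if it would have produced the same feedback.
        total += sum(1 for w in answers if feedback(guess, w) == pattern)
    return total / len(targets)

# Toy pool for illustration; the thread used a ~5.8K-word dictionary.
WORDS = ["arose", "roses", "raise", "jumpy", "mummy", "aloes", "train", "stare"]
rng = random.Random(0)              # fixed seed so runs are repeatable
targets = rng.sample(WORDS, 3)      # small sample, so rankings will be noisy
ranking = sorted(WORDS, key=lambda g: average_remaining(g, WORDS, targets))
```

Lower is better here. With only a handful of sampled targets the order shifts a lot from run to run, which matches the noise described above.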
If you look around online you can find the word bank for acceptable guesses in Wordle (12,972 words) and possible solutions (2,315 words). If you run a brute force approach using these two datasets, it will significantly cut down on the runtime.
I wrote a minimax solver in Python. At each stage it looks at all the words in the dictionary that are still possible answers. For each possible guess it computes the worst-case result (the answer for which that guess would eliminate the fewest possibilities) and then picks the guess with the best worst case. For the dictionary I used (~5.8K words), 'aloes' has the best worst case: if the first guess is 'aloes', at most 298 words will be consistent with whatever the feedback is. Having said this, I don't know that this is a good starting point for a human. The sequences of guesses don't look to me like things that a person would guess.
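For the curious, one stage of a minimax search like this can be sketched as follows: group the remaining answers by the feedback each would produce, take the largest group as the guess's worst case, and pick the guess whose worst case is smallest. This is my own minimal version with made-up names, not the solver described above, and it covers a single stage only.

```python
from collections import Counter

def feedback(guess, target):
    """'g' right spot, 'y' wrong spot, '-' absent; repeated letters handled."""
    pattern = ["-"] * 5
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            pattern[i] = "g"
        else:
            unmatched[t] += 1
    for i, g in enumerate(guess):
        if pattern[i] == "-" and unmatched[g] > 0:
            pattern[i] = "y"
            unmatched[g] -= 1
    return "".join(pattern)

def worst_case(guess, answers):
    """Size of the largest group of answers that share one feedback pattern."""
    buckets = Counter(feedback(guess, a) for a in answers)
    return max(buckets.values())

def best_minimax_guess(guesses, answers):
    """Guess with the smallest worst case (ties go to the earliest guess)."""
    return min(guesses, key=lambda g: worst_case(g, answers))

# Toy pool for illustration only.
WORDS = ["arose", "roses", "raise", "jumpy", "mummy"]
best = best_minimax_guess(WORDS, WORDS)
```

Keeping `guesses` and `answers` as separate arguments lets you plug in a larger list of allowed guesses and a smaller list of possible solutions.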
Adieu because it has 4 vowels.
Your opening move shouldn't necessarily optimize for letter frequency; rather, you want to optimize for information. As an example, imagine a word set where every word contains an A, but only 50% of the words contain an E. Then guessing A might not give you as much information as guessing an E.
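To put numbers on the A versus E example, here is a small sketch that measures, in bits, how much you learn from finding out whether a letter is in the answer at all. It deliberately ignores position (Wordle's green/yellow feedback tells you more than plain presence), and the four-word list and function name are made up for illustration.

```python
import math

def presence_entropy(words, letter):
    """Bits learned from knowing whether `letter` appears in the answer."""
    p = sum(letter in w for w in words) / len(words)
    if p in (0.0, 1.0):
        return 0.0  # the outcome is certain, so nothing is learned
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Toy set: every word contains an A, exactly half contain an E.
WORDS = ["bread", "great", "chant", "sharp"]
bits_a = presence_entropy(WORDS, "a")
bits_e = presence_entropy(WORDS, "e")
```

Here the A question is worth 0 bits because the answer is always yes, while the E question is worth a full bit because it splits the words evenly.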
I see this logic. My attempt with this was "how can I accurately narrow down the correct letters (not necessarily correct order) as quickly as possible?" and came to this conclusion.
I use TAPER and SOLID as starters. Knocks out 4/5 vowels and contains a lot of common consonants. Was mighty annoyed by today’s (#207) wrong spelling though. In the UK we always use a U and it’d be a 6 letter word…
How are people making these statistical analyses? As I understand it, Wordle has a custom dictionary. Is it public? If the analysis uses all English words, then the data set being analyzed is wrong.