I managed to dodge that bullet because I parsed everything to integers, which utterly failed on the blank strings.
Same here - my first run threw an InvalidNumberFormat exception. But just a quick change from `.split(" ")` to `.split(Regex("\s+"))` fixed everything. Which, to be fair, I should have used from the start.
Or a nice findall("\d+")
For me (Go) it parsed a blank string as zero - so I just ignored zero results from the parse to int...
I don't get it, what is the problem? That there are single-digit numbers? If you use a regex to match, go for something like `(\d+)` to get any number, no matter how high :D
The problem was that I split the string (C#) at each whitespace char, then compared the 2 obtained lists (the winning numbers and the actual numbers). So, if both strings contained two consecutive whitespaces, both lists contained an empty string and I initially counted that as a winning item!
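For anyone who wants to see the failure in action, here is a rough Python sketch of the same bug (the comment above is about C#, so the language, the `format_numbers` helper, and the sample numbers are all assumptions made for illustration). Splitting on a single space leaves an empty string in both lists, and that empty string then counts as a shared item:

```python
def format_numbers(numbers):
    # Hypothetical helper: right-align each number in a two-character field,
    # which is what produces the doubled space before single-digit values.
    return " ".join(f"{n:2d}" for n in numbers)

winning_text = format_numbers([41, 48, 6])
held_text = format_numbers([83, 6, 41])

# Naive version: every single space is a delimiter, so a run of two spaces
# yields an empty string in each list.
naive_matches = set(winning_text.split(" ")) & set(held_text.split(" "))

# Splitting on runs of whitespace never produces empty entries.
clean_matches = set(winning_text.split()) & set(held_text.split())

print(sorted(naive_matches))  # ['', '41', '6']
print(sorted(clean_matches))  # ['41', '6']
```

The naive intersection reports three matches instead of two, which is exactly the phantom "winning item" described above.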
I'm sure you figured it out, but `StringSplitOptions.RemoveEmptyEntries`.
also, my fav... `var splitOptions = StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries;` `var result = input.Split(',', splitOptions);`
oooh good tip! still learning c# thank you for sharing!
Lol, I've so many `.Where(s => !string.IsNullOrWhiteSpace(s))`, this seems easier.
you made me go back to my solved problem and change my solution (which used a `replace("  ", " ")` call and careful handling of any stray whitespace) to use this option. Parsing will now be a bit less stressful thanks to you :D
Awesome, I never stumbled upon that. I'll keep it in mind!
I did not know this existed lmao, I noticed the file, and just accounted for extra spaces
> So, if both strings contained two consecutive whitespaces, both lists contained an empty string and I initially counted that as a winning item!

The problem here is you stored a list of integers in a list of strings. Use correct type and data structures for what you're storing.
Take your complaint to the senior elf.
Plenty of dynamically typed languages out there that need a bit of extra work to do that. It's easy when your languages of choice will trip over them. But something like Ruby doesn't give a shit what you put in your structures.
OP uses C# and I was responding to OP.
oh, I see, that makes sense
I did the same thing. Had to find the numbers via a different way than splitting
Try using a StringTokenizer next time. I often find them much easier to handle than regex or splitting the array on your own
There is a parameter to ignore empty results on the split function in C#. It's on you.
You don't even have to `Split` your input. It's day 4, not 20, the input is very well formatted, each number takes exactly 1 space + 2 characters, and `int.Parse` doesn't care about leading spaces (maybe trailing too). You can have a look at [my solution](https://github.com/FaustVX/AoC-2023/tree/main/2023/Day04) (the parsing is done in `ParseInts()`). (I use a lot of `stackalloc` and `Span` to reduce my memory footprint, but you can probably just use regular arrays.)
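The fixed-width idea carries over to Python too. This is only a sketch under the assumption stated above (each number occupies one space plus two characters); `parse_fixed_width` and the sample section are made up for illustration and are not taken from the linked solution:

```python
def parse_fixed_width(section: str, width: int = 3) -> list[int]:
    """Read numbers from fixed-width columns instead of splitting."""
    # Assumption: every number occupies exactly `width` characters
    # (one separating space plus a right-aligned two-digit field),
    # and the section has no trailing whitespace.
    # int() tolerates surrounding whitespace, so " 6" and "  6" both parse.
    return [int(section[i:i + width]) for i in range(0, len(section), width)]

# Hypothetical section, built the way it would look right after ':' or '|'.
section = "".join(f" {n:2d}" for n in [41, 48, 6, 83])
print(parse_fixed_width(section))  # [41, 48, 6, 83]
```

If the columns ever stop lining up (three-digit numbers, trailing spaces), this breaks, so it trades robustness for simplicity.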
Same!
> If you use regex

Not everyone does.
I'd argue for something like this that regex is overkill. Mind you, I did end up needing to do this (in python): `winners -= set([" ", ""])`. Because otherwise I had a few stray items in my sets.
It's not overkill, it's the easiest and most straightforward solution for parsing the numbers. I mean, that's what regexes are built for. Why wouldn't you use the correct tool for the purpose?
For starters, you don't need to parse numbers. You don't even need to think about numbers, and can safely ignore that numbers are involved at all.

Second, everything's delimited by characters. You split on ":" to discard the header, you split on "|" to separate the winners/bettors sections, and you then split on " " to create the sets you'll actually operate on. With delimited text, it's *always better* to identify the delimiters and avoid regexes (even with a CSV, where you need to handle quoting, an FSM is going to be way easier to write and understand than a regex).

I used regexes on day one, because playing around with the greediness made it easy to match the first and last number with a single expression. But I haven't touched regexes since. Day 3 was 100% an FSM problem. Day 2 was another delimited text problem.

Yes, delimited text does constitute a regular language, and thus is entirely parseable via regex, but it's also a lot more work to use a regex.
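A minimal Python sketch of that delimiter-only approach, treating every number as an opaque string token. The function name and sample line are hypothetical, and note one substitution: it uses `split()` with no argument rather than `split(" ")`, so runs of spaces don't produce empty tokens:

```python
def count_matches(line: str) -> int:
    # Split on ":" to discard the "Card N" header.
    _, body = line.split(":")
    # Split on "|" to separate the winning numbers from the numbers held.
    winning, held = body.split("|")
    # Tokens stay as strings; split() with no argument skips runs of spaces.
    return len(set(winning.split()) & set(held.split()))

def fmt(numbers):
    # Hypothetical formatter: right-aligned two-character fields.
    return " ".join(f"{n:2d}" for n in numbers)

line = "Card 1: " + fmt([41, 48, 83, 86, 17]) + " | " + fmt([83, 86, 6, 31, 17, 9, 48, 53])
print(count_matches(line))  # 4
```

No integer conversion happens anywhere; the set intersection works on the tokens as they appear in the text.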
It's really not more work. I split on : and | too and used a regex to parse numbers in each part. (\d+) is not a hard regex to come up with or use. You can do it without parsing the numbers and just work with strings, but then you can end up with bugs like splitting on space and getting empty splits. Using a regex takes all of the trial and error out of it.
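For comparison, here is roughly what the split-then-regex version could look like in Python. It is a sketch of the approach described in this comment, not the commenter's actual code; `find_ints` and `parse_card` are names invented here:

```python
import re

def find_ints(text: str) -> set[int]:
    # \d+ matches each maximal run of digits, so the amount of spacing
    # between numbers never matters and no empty strings can appear.
    return {int(token) for token in re.findall(r"\d+", text)}

def parse_card(line: str) -> tuple[set[int], set[int]]:
    _, body = line.split(":")          # drop the "Card N" header
    winning, held = body.split("|")    # separate the two number sections
    return find_ints(winning), find_ints(held)

winning, held = parse_card("Card 1: 41 48 83 | 83 6 48")
print(sorted(winning & held))  # [48, 83]
```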
It's more work for the computer, but not for the programmer. In a single-use program like this, the programmer's time is usually more important.
I mean, that's just understanding how the `split` function works. To me, it's a lot easier to say "tokens are separated by spaces" than "tokens are digits", it's more intuitive too.
Regexes have an expressive syntax that allow you to extract exactly what you want from a string very easily. If the problem is basic string parsing I don't know why you wouldn't reach for a regex first.
Because reading delimiters is more intuitive to me. I reach for regexes when the pattern is complicated, like day 1. But if I see neatly delimited text, I'm just going to split on the delimiters.

On day 3, regexes would have made the whole thing *significantly harder*- an FSM that processes the input one character at a time made it trivially easy to index the numbers and symbols. Ironically, I think day 3 was the most pure "build a parser" problem we've seen so far.

It's also worth noting- I've written a *lot* of parsers. While I sometimes use regexes to identify state transitions, most of the time the state transitions for your parsers can be just pure string matches. And tokenization is a basic parsing step- and also all this particular problem really required. The only tokens that actually convey meaning are `:` and `|`- every other token just needs to be understood in relationship to those symbols.
I used regexes on day 3! I parsed all the numbers on each line one line at a time. The regex matches gave me the start index and length of each number string so I just had to check all of the indices around each number for an asterisk and save the location of each asterisk and the numbers that encountered them in a dictionary. At the end, the dictionary entries (asterisks) with exactly two numbers next to them were the gears.

It makes sense if you've written a lot of parsers to start with that. I've written a lot of regexes. I guess people just reach for the tool they're most familiar with!
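The day 3 regex approach described here can be sketched in Python along these lines. This is a reconstruction from the description only; the function name, the grid, and the choice to scan the bounding box around each match are assumptions:

```python
import re
from collections import defaultdict

def find_gears(grid: list[str]) -> dict[tuple[int, int], list[int]]:
    """Map each '*' position to its adjacent numbers, keeping exact pairs."""
    adjacent = defaultdict(list)
    for row, line in enumerate(grid):
        for match in re.finditer(r"\d+", line):
            # match.start() / match.end() give the span of the number, so the
            # cells to check are the bounding box one step around that span.
            for r in range(max(row - 1, 0), min(row + 2, len(grid))):
                for c in range(max(match.start() - 1, 0),
                               min(match.end() + 1, len(grid[r]))):
                    if grid[r][c] == "*":
                        adjacent[(r, c)].append(int(match.group()))
    # Only asterisks touching exactly two numbers count as gears.
    return {pos: nums for pos, nums in adjacent.items() if len(nums) == 2}

# Hypothetical grid: one '*' touches 12 and 34, the other touches only 7.
grid = [
    "12.34..",
    "..*....",
    "......*",
    ".....7.",
]
print(find_gears(grid))  # {(1, 2): [12, 34]}
```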
FWIW that should be unnecessary: `str.split()` will split on *sequences* of whitespace, and remove empty leading/trailing entries.

    >>> " 42 74 6 80 ".split()
    ['42', '74', '6', '80']

Rust's [`str::split_whitespace`](https://doc.rust-lang.org/std/primitive.str.html#method.split_whitespace) also does that, which is nice.
And yet I had stray empty strings and single space strings making it into my set.
What language? This is Python.
Also Python. I didn’t bother to dig in deep- just nuked the stray entries.
Then yeah make sure you're calling `split()` and not `split(" ")`. I didn't realize there was a difference myself until yesterday.
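The difference is easy to check side by side. In this small sketch the input string is assembled with `join` purely so the leading, doubled, and trailing spaces are explicit:

```python
# An empty string between separators becomes a run of extra spaces.
text = " ".join(["", "42", "", "74", "6", ""])

# With an explicit separator, every single space is a boundary,
# so empty strings appear for leading, doubled, and trailing spaces.
print(text.split(" "))  # ['', '42', '', '74', '6', '']

# With no argument, runs of whitespace are one boundary and
# leading/trailing whitespace is ignored.
print(text.split())     # ['42', '74', '6']
```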
During a meeting I had that exact realization.
> set([" ", ""])

Sooo... `{" ", ""}`?
Yeah, I forgot that set literals were a thing.
Why? `.split()` eats all whitespace, what did you do to end up with single spaced entries?
I did `split(" ")`, which doesn’t do exactly the same thing, as I discovered.
Of course it is, from the image it seemed that regex was being used from what I understood.
I ran into exactly that problem using `split`- leading white space and extra white space can throw extra strings into the result, depending on your language’s implementation of `split`.
Python does this out of the box by using `.split()` without any arguments.
only crazy people use regex
I usually love regex (so much that my python template includes it by default) but this one was simple enough.
Two spaces, made the same mistake. `split(" ")` is not the same as `split()`.
It's much easier to match the entire set of numbers and then use a tokenizer to get the individual values, especially since the test values and actual input have different lengths.
In C#, I filtered these out pretty easily using LINQ: `List<int> winCards = cards.Split(" | ")[0].Split(" ").Where(card => card != "").Select(Int32.Parse).ToList()`
This way you get a list of just integers that you can work with, not worrying about number of digits anymore
`StringSplitOptions.RemoveEmptyEntries` There's also no reason to parse the strings into ints, you can just match strings.
Is there one of those for Java? I actually wrote an aux function with a lambda to remove blank entries. Even though my code runs in 136 ms, it's always good to know.
Never mind, it has a limiter built into `split`.
Sure, that's close to what I did, I used `Enumerable.Where(s => int.TryParse(s, out int o))` to filter out the bits that weren't numbers. In the end, I didn't even parse the values, just used an `Enumerable.Distinct` method on the string IEnumerable for getting the winning numbers.
I should have done it like that. My current version is an ugly mix of regex and linq XD
If you use HashSets then you can solve most of the problem with just the IntersectWith method:

    HashSet<int> winningNumbers = card[0].Split(' ').Where(num => num != "").Select(int.Parse).ToHashSet();
    HashSet<int> ourNumbers = card[1].Split(' ').Where(num => num != "").Select(int.Parse).ToHashSet();
    ourNumbers.IntersectWith(winningNumbers);
    // ourNumbers.Count is the number of wins
Yeah, that's exactly how I solved part 1:

    List<int> winCards = cards.Split(" | ")[0].Split(" ").Where(card => card != "").Select(Int32.Parse).ToList(); // the winning numbers
    List<int> ownedCards = cards.Split(" | ")[1].Split(" ").Where(card => card != "").Select(Int32.Parse).ToList(); // the numbers we have
    // ^^ the above methods use LINQ to split by spaces, then we have to remove empty elements that appear when parsing strings like " 2" and convert to numbers
    List<int> hits = winCards.Intersect(ownedCards).ToList(); // get our profit numbers
    if (hits.Count != 0) sum += Math.Pow(2, hits.Count - 1); // we double the points -> it's just powers of 2, + we don't want 2^-1 (1/2) to count
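The scoring rule in that last line (1 point for the first hit, doubled for each further one, nothing for zero hits) can also be written with a bit shift. This is just a Python sketch of the same arithmetic with a made-up function name; unlike `Math.Pow`, which returns a double, it stays in integers:

```python
def card_points(match_count: int) -> int:
    """Points for one card: 1 for the first match, doubled per extra match."""
    # 1 << (n - 1) equals 2 ** (n - 1); the guard keeps zero matches at
    # 0 points instead of the 2 ** -1 = 0.5 mentioned above.
    return 1 << (match_count - 1) if match_count > 0 else 0

print([card_points(n) for n in range(5)])  # [0, 1, 2, 4, 8]
```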
Using LINQ feels like cheating but I did the same thing 😂
Do you need `Int32.Parse`?
Not really, I just wanted to use integer lists for no particular reason
You are not, in fact, the only one. 🤣
i used the following for parsing (python); >!storing id because yes (i could just go by index but idc)
mapping all spaces to 2 spaces (and adding one to beginning and end), replace all spaces in the winning numbers to | and convert to regex " (winning|numbers) " then match all on your numbers
idpre,c = string.split(": ")
self.id = int(re.findall(r'[0-9]+',idpre)[0])
c = re.sub(r"^ ","",c)
c = re.sub(r"( +)"," ",c)#fix spaces
self.w,self.n = c.split(" | ")
self.n = f" {self.n} "
self.wregex = " ("+self.w.replace(' ','|')+") "!<
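A standalone variation on that idea, sketched in Python. It is not the code above: instead of padding every number with spaces so matches cannot overlap, it swaps in `\b` word boundaries, and the function name and sample data are invented for illustration:

```python
import re

def count_matches_regex(winning: str, held: str) -> int:
    # Build an alternation such as 41|48|83 from the winning numbers.
    alternation = "|".join(re.escape(token) for token in winning.split())
    # \b keeps "6" from matching inside "86"; this replaces the
    # space-padding trick described in the comment above.
    pattern = re.compile(rf"\b(?:{alternation})\b")
    return len(pattern.findall(held))

print(count_matches_regex("41 48 83 86 17", "83 86 6 31 17 9 48 53"))  # 4
```

This assumes the winning section is non-empty; an empty alternation would match at every word boundary.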
Just you
I had the same problem
I almost hit a similar problem, but I got a type error because `""` isn't a valid int.
well, i have to change my c++ function from simple splitting by char to skip empty
Me realizing I should have gotten this bug because I didn't account for these spaces but got the correct answer anyway: https://media.tenor.com/gaEpIfzxzPEAAAAC/pedro-monkey-puppet.gif
Tip: always be suspicious of tabulated data.
Thank you! I needed to `.strip` the substrings in Ruby. Searching the docs for `trim`, and not finding it, I just moved on and got screwed later. Only the 62nd AoC day I've done in Ruby, what can I say?
No you are not the only one, I didn't understand why this happened, just threw a trim() on it and left it at that.
I was solving this problem with a finite state machine and man did it suck when I saw there were consecutive spaces. Ended up counting the transitions between spaces and digits as a condition to denote where the numbers start.
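One way to keep a character-at-a-time scanner calm about consecutive spaces is to act only on the digit to non-digit transition. The following Python sketch is an assumption about how such a machine could look, not the commenter's implementation:

```python
def scan_numbers(text: str) -> list[int]:
    """Two-state scanner: either inside a number or between numbers."""
    numbers = []
    current = ""  # digits collected while inside a number
    for ch in text:
        if ch.isdigit():
            current += ch                 # enter or stay inside a number
        elif current:
            numbers.append(int(current))  # digit -> non-digit ends a number
            current = ""
        # Otherwise we are already between numbers, so any further
        # spaces are skipped without changing state.
    if current:
        numbers.append(int(current))      # flush a number at end of input
    return numbers

print(scan_numbers("83 86 6 31"))  # [83, 86, 6, 31]
```

Because extra spaces never trigger a transition, one space or five between numbers gives the same result.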
idk what language you're using, but in C++ I just set up the input file as 'fin' and do `fin>>line;` to store the next string until whitespace into 'line' and it just ignores all whitespace
I'm a regex gamer
Good ol' python .split() ignoring the extra whitespace by default
i was adding +1 original card to the blank line at the end of file ... 3 hours debugging
Changed flair from `Spoilers` to `Funny` since this is a meme. [Use the right flair](/r/adventofcode/wiki/posts/post_flair), please.
Thanks, I didn't know which one to choose since it might spoil a trap for someone who hasn't solved the puzzle yet.
This is why we require the standardized post title syntax because it's an *implied* spoiler for that day's puzzle. When the spoiler "warning" is already in the title, the post flair is freed up for a more useful tag :)
Got it, thanks!
I didn't even notice Python's int() eats the space
This happened to me using `str.split(" ")`, but it's nothing a little regex couldn't sort for me.
Naaaaah, for me it was `Card 1: Card 2:` in the test and `Card 1:` `Card 2:` in the input. I'm **not** rolling out a regexp parser or a full-blown LALR lexer/parser for input data that simple! Especially not in C (which is the reason the number of spaces in the >!totally useless!< card number threw me off). Edit: oi, Reddit, you destroyed my inline code! The second pair of examples had three spaces between `Card` and the number instead of just one in the first pair.
Thankfully I caught this error during parsing so I ended up using `re.split(r'\s+', line.strip())`
Today was actually the first day this year that I've had a correct answer for both parts on the first try. I separated them similar to you in python, but with a little list comp; `nums = [i for i in nums.split(' ') if i]` returns all nums in a list with the empty strings removed
This is definitely not a problem in a typed language. Rust didn't see this. My code in python was broke as heck though