Advent of Code - Year 2015, Day 12

Problem statement: http://adventofcode.com/2015/day/12

Part A

Given that the data provided is in canonical JSON format, we could use a JSON parser to extract the numbers and sum them together. However, the JSON parser would convert the string into a hierarchy, and we would then have to write an algorithm to touch each element in the hierarchy to find all the numbers.

Instead, we can use regular expressions to extract all of the numbers directly out of the string, and then sum them together. For Part A, since all we're doing is extracting the numbers, the regex is not complicated. The full regex for extracting numbers is: [,:[](-?\d+).

Regex Explanation
[,:[] The [] syntax allows defining a custom character class that is used to match a group of characters. In this case, we are stating that we want to match a ,, :, or [ at the beginning of the match. This would be the equivalent of writing (,|:|[) using tools already described in this blog. There is no quantifier (*, etc) after the [], so we want to match exactly one of these characters.
(...) In this case, the most important thing for us to match is the number. We don't care about the items preceding the - or the digits, we are just using them as a marker to ensure we don't include any numbers that may be part of an identifier. Wrapping the number in a () allows us to extract the number as a string in the code.
-? The ? quantifier says that we want 0 or 1 of the previous item, a -. When matching numbers, we may or may not see a negative sign prefixing the number, and we need to make sure that we include the negative sign, or we will be adding the wrong numbers together. However, since we may not see the negative sign, we use the ? to specify that it is optional, but include it if you see it.
\d+ We want to match one or more digits (\d). Matches would include 1 or 4326342434.

Putting all this together, we are looking for numbers that are prefixed by a JSON separator. These will then be complete numbers that are not identifiers, and are important for summing. Once we have the numbers, we can convert them from string to integer and add them together to get the final solution.

Part B

This is where things get tricky. Since we're trying to find objects that have "red" as a value in the object, we could give up and go to a JSON parser to find these objects in the hierarchy. That's the easy way out though; we've started with regex, so we're going to finish with Regex!

The way I implemented Part B is to find the entirety of objects that contain "red" as a value and replace them with empty strings. Then the string that remains can be parsed using the same regex for Part A.

So then how do we find a JSON object that contains "red" as a value, but not an object that contains "red" as one of it's descendant's values? Thankfully, the .NET Regex engine allows us to use a technique called "balancing" to ensure that we match balancing pairs of items, specifically { and }.

This means that we can make sure that we match {"qwerty":{"yup":"nope"},"asdf":"red"}, but not {"qwerty":{"yup":"nope","asdf":"red"}}. The first item has balanced the pairs of {} before finding a value of "red", but the second only has opening { before the "red", so it won't be matched at the top level. Instead, the inner object will be matched and removed.

If you want to read the official docs on how Regex does this, please feel free to see here. I'm just going to give you the brief overview.

When we match a {, we do so in a named group, before. This is done using the syntax (?<before>{), which we've used before. Then we look to find any matching } in the named group -before ((?<-before>})). The - tells the engine that any time we match a }, take the last item out of the before group. We are done with this when we no longer have any items in the before group; we specify this by failing the regex if there are any items in the before group: (?(before)(?!)). The (?(before)...) says that we want to match this group if there are any items in the before group. The (?!) is an expression that always fails.

If we surround this group of expressions with [^{}]*, saying that we do not want to match any { or }, then the only way we can match a { or } is with the balancing construct above. This means that we only get a match if the Regex matches a :"red" at the top level of the JSON object.

I encourage you to read the Regex and to send different JSON strings to it to see how it matches the JSON object. In any case, here we use the .Replace() to remove any JSON object that has "red" as a value. Once this is done, the existing code for finding numbers and summing them works again to calculate the sum for Part B.

source