Advent of Code - Year 2015, Day 08

Problem statement: http://adventofcode.com/2015/day/8

Part A

String escaping is a common problem, has a variety of solutions. Situational awareness can be key to developing the right method of dealing with string escaping.

For example, the easiest way to decode string escaping in C# is to call string.Replace('\\\\', '\\'), or similar for " and other special characters. However, this will require a full pass over the string for each character you wish to replace.

Alternatively, you can set up a character by character replacement:

var sb = new StringBuilder()  
for (int i = 0; i < str.Length; i++))  
    if (i == '\\') { i++; sb.Add(str[i]); }
    else { sb.Add(str[i]); }

You could do the same thing as an IEnumerable<char> to reduce overall memory usage for longer strings.

The complete way to do string escaping would be to set up a state-machine, where you evaluate character by character what you are going to do based on a current state, i.e. whether you are currently after an escape character, etc.

In the case of this problem statement, the actual decoded string is not relevant. All that it is asking for is the difference between the original string and the decoded string. As such, we don't need to worry about keeping track of the decoded string, and only need to make sure we know how many characters to skip after each backslash. We can do this with a fairly basic state machine, which will be shown in the F# solution.

Because we don't need to transform the string, instead of trying to decode the string, we can also find a way to simply count the number of distinct "units". We already know of a good way to identify character units: regular expressions. The C# solution demonstrates this way of solving the problem.

Part B

Again, a simplistic answer would actually generate the encoded string. However, with a little intellect, we can generate a cheap algorithm to calculate the variance between the normal string and the encoded one. First, the encoded string will have two extra ", so we start with a variance of 2. Then, recognize that the only characters that are different between a regular string and an encoded string are \ and ", and each of these adds exactly 1 to the variance, because they will transform from \ to \\ and " to \" (2 - 1 = 1).

The final variance, then, is the count of all \ and " in the string, plus 2. This is easily coded in both the F# and C# solutions.

F#

source
I am really beginning to appreciate the match statement for it's terse syntax. I am able to describe the entire state machine for the string decoder in 18 lines, which is remarkably short for a moderately complex state machine. Even a basic one using switch statements will take 30 lines or more in C#, between the { and the break;, and adding spaces between cases for clean looking code.

The bulk of the work is done in the getDecodedLength function. It takes in a string and returns the length of it's decoded equivalent. You may notice that we are not building or creating the decoded string itself. As discussed before, Seq.fold operates on an enumeration, but carries a State value from item to item. In this case, we are defining State to be (DecodeState, int), which allows us to carry both the current context (DecodeState) and the current length (int).

For each character in the string, we first identify which context we are in. If we are in InitialState, then we are expecting that we will see a " as the first character, otherwise it is a failure. Thus we then set the current context to Normal and keep a count of 0. Similarly, for Normal, we check that we can find a \, in which case we go to the Escaped state, which would allow us to evaluate which kind of escape we have and how many characters we need to process. Finding a " will indicate that we are at the end of the string, indicated by End. Otherwise, we remain in the Normal state and increment the count.

Once we are done with Seq.fold, we have a Tuple, namely (End, <count>). Now, we don't care about the state, we know what it is and we aren't going to use it. The syntax let (_, cnt) allows us to specify that we don't care about the first object in the Tuple, and to assign the count to the cnt variable. Then we can return it.

To get the total decoded length, we simply need to map each line of the input to getDecodedLength, which gives us an enumeration of ints, which we can then sum.

getEncodedLength is the F# coding of the algorithm described in Part B above.

C#

source
While a state machine is useful for encoding and decoding all sorts of data, in this case, we don't care about actually decoding the data. We just need to know how many characters there are in the decoded string. If you've read any of my other posts on Regex, you'll have figured out that regex is really good at identifying blocks of characters. So, we can make the Regex engine take care of finding each "character" unit for us.

Regex

@"""(?<char>\\x.{2}|\\\\|\\\""|\w)*""" (the @ is critical for using this Regex; otherwise it would be twice as long of a string to input to C#.

Disregarding the @" to open and " to close, you'll notice "" on each end of the string. Under a "verbatim" string (@), "" is required to specify a " in the middle of the string. This will tell regex to expect a " at the beginning and end of each string. I have explained how (?<char>...) operates: it defines a group that gets extracted into the group named char. Similarly, I have described that * specifies that we want 0 or more instances of the expression inside the (). This means that we are allowed to match "" as a string, because there are 0 units in this string.

The clever part comes in specifying that each unit matches one of three options: \x.{2}, \\, \", or any identifier character (\w). The extra \ in the string tell the regex that we want to match literal \, and not just use it as a modifier as in the case of \w. So, to indicate a literal \\, we need to escape the \ twice, giving \\\\.

So, now that we have told the engine that we want to match a string that contains some combination of these four options, we need to actually count them. Thankfully, the .NET Regex engine provides us a way to do that. (Fair warning: not all regex engines provide the following information.) Once we have collected a series of units into the Group char, the engine provides us a .Captures property. This property exposes each distinct time that the char group was captured. Since we now have a list of each of these captures, we can then simply Count how many items are on the list, and we know how long the decoded string is.

The rest of the code for the C# version should be straight forward: we use the GetDecodedLength() function to map each line to a decoded length and sum it, just as we do in the F# version. Similarly, we use the algorithm described for Part B to calculate the encoded length.