I can’t link to part 2 of today’s problem, because you have to solve part 1 to unlock the second half of the web page. I’m sure it’s been reposted countless times by now, so I’ll summarize it:
Oops, some of the digits that we want to add up have been written as English words instead of numerals. Recognize the words one, two, three, four, five, six, seven, eight, and nine as digits also. And we have new sample input:
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
Which yields 29 + 83 + 13 + 24 + 42 + 14 + 76 = 281.
So … we need to convert number-words into digits. But:
- Not zero.
- Only need to convert the first and last number-words of a line, not every occurrence.
- Only need to convert number words that are closer to the start or end of a line than any digits.
- Not specifically stated in the problem: Watch out for lines ending in things like oneight, where two number-words overlap by sharing a letter.
Tricksy hobbitses!
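To make the overlap hazard concrete, here is a minimal sketch of what goes wrong with the most obvious approach: replacing every number-word from left to right. The helper name replace_all_words is made up for this illustration and is not part of my solution.

```perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical helper (illustration only): replace every number-word,
# scanning the line from left to right.
sub replace_all_words {
    my ($line) = @_;
    $line =~ s/($wordre)/$value{$1}/g;
    return $line;
}

# "two" is replaced first and takes the "o" that "one" needed.
print replace_all_words("4twone"), "\n";    # prints 42ne
```

The last number-word on that line is one, but no 1 ever shows up in the result, so extracting the first and last digits afterwards would give 42 rather than 41.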
So I cloned my part 1 solution and added to it:
my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);
I create an associative array / hash that will let me look up the value of every number-word that this problem asks us to replace. I then extract the keys of that hash (the number-words) and join them into a single string separated by | pipe (regular expression “or”) characters. Note that the keys are extracted in arbitrary order and I don’t sort them because I don’t actually care about the order: no number-word is a prefix of another, so at any given position at most one alternative can match.
This gives me a string like three|one|four|five|nine|... that has all of the number-words listed as alternatives.
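If you want to see the alternation string for yourself, here is a small sketch. The sort is an addition for this illustration only, so that the printed string is the same on every run; the variable name $sorted_wordre is likewise made up.

```perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);

# keys returns the words in no particular order; sorting here is only
# so that the output is reproducible (an addition for this sketch).
my $sorted_wordre = join("|", sort keys %value);

print "$sorted_wordre\n";    # prints eight|five|four|nine|one|seven|six|three|two
print "$value{seven}\n";     # prints 7
```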
Then in the loop to process each line, we need to do some conversions:
# Substitute the first number word if no digits before it.
s/^([^\d]*?)($wordre)/$1$value{$2}/;
If a line starts ^ with things that aren’t digits [^\d], zero or more of them *, and is then followed by ($wordre) a number-word, preserve everything that was before the number-word and replace the number-word with the value of that word: the digit.
Note that for this problem, I don’t actually need the text $1 that was before the number-word so I don’t have to preserve that; but it’s habit. If this were time-sensitive (computationally-heavy or big-data input), I’d be more careful about extracting only needed information with minimal manipulation.
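The same substitution can be spelled out piece by piece with Perl’s /x modifier, which ignores whitespace in the pattern and allows comments. This is a sketch for reading purposes; the wrapper name replace_first_word is made up for the illustration.

```perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical wrapper around the first substitution, written with /x
# so that each piece of the pattern can carry a comment.
sub replace_first_word {
    my ($line) = @_;
    $line =~ s{
        ^               # start of the line
        ( [^\d]*? )     # as few non-digit characters as possible
        ( $wordre )     # one of the number-words
    }{$1$value{$2}}x;
    return $line;
}

print replace_first_word("two1nine"), "\n";         # prints 21nine
print replace_first_word("abcone2threexyz"), "\n";  # prints abc12threexyz
print replace_first_word("7pqrstsixteen"), "\n";    # prints 7pqrstsixteen
```

The last line is unchanged because a digit comes before any number-word, so the pattern has nothing to match.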
Note also that the ? operator after the * says to match the fewest non-digit characters possible. Without that, a line like onetwothreeboohoohoo7 would be converted to onetwo3boohoohoo7 (because one and two are non-digits that we’re accepting as many of as possible) instead of the correct 1twothreeboohoohoo7. A normal * is called a “greedy” operator, eating as much of the string as possible; adding ? after it makes it non-greedy, eating as little of the string as possible.
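Here is a side-by-side sketch of the two behaviors on that example line. The two sub names are made up for this comparison; the only difference between them is the ? after the *.

```perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Non-greedy prefix, as in my solution: stops at the first number-word.
sub first_word_lazy {
    my ($line) = @_;
    $line =~ s/^([^\d]*?)($wordre)/$1$value{$2}/;
    return $line;
}

# Greedy prefix (for comparison only): swallows earlier number-words.
sub first_word_greedy {
    my ($line) = @_;
    $line =~ s/^([^\d]*)($wordre)/$1$value{$2}/;
    return $line;
}

print first_word_lazy("onetwothreeboohoohoo7"), "\n";    # prints 1twothreeboohoohoo7
print first_word_greedy("onetwothreeboohoohoo7"), "\n";  # prints onetwo3boohoohoo7
```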
# Substitute the last number word if no digits after it.
s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;
Here, we eat up as much of the line as possible (.*) and remember it for later; then match a number-word ($wordre) and remember it for later; then require that it’s followed only by things that aren’t digits ([^\d]*) (that we’ll remember for later) and then the end of the line $. If all of that matches, we rebuild the line, substituting the value of the number in place of the word. Because the leading .* is greedy, the number-word that gets matched is the last one on the line.
There should be a cleaner way to write that regular expression than starting with .* to eat as much as possible, but simply making the final [^\d]* non-greedy is not it (that gives a different output). I’m lazy enough to have stopped once I had correct output and a regular expression whose behavior I understand.
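To illustrate why the leading .* matters, here is a sketch comparing the substitution above with one possible variant: drop the .* and make the trailing [^\d]* non-greedy. That variant is my reading of the alternative mentioned above, so treat it as an assumption; both sub names are made up for the comparison.

```perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# The substitution from the post: the greedy .* pushes the match to the
# last number-word on the line.
sub last_word_original {
    my ($line) = @_;
    $line =~ s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;
    return $line;
}

# Assumed variant (comparison only): with no .* in front, the regex
# engine takes the leftmost number-word that has no digits after it.
sub last_word_no_prefix {
    my ($line) = @_;
    $line =~ s/($wordre)([^\d]*?)$/$value{$1}$2/;
    return $line;
}

print last_word_original("7onetwo"), "\n";   # prints 7one2
print last_word_no_prefix("7onetwo"), "\n";  # prints 71two
```

A match always starts at the leftmost position that can succeed, so without something greedy in front, the substitution lands on the first number-word of the trailing digit-free stretch rather than the last.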
With these conversions done, the remainder of my part 1 program does the remainder of the work.
Note that if I were writing part 2 without already having written part 1, I’d be much more likely to write something that extracts the first digit or number-word from the string and the last digit or number-word from the string. But when you already have tested code that extracts the first and last digits from a string, it’s super-convenient to convert number-words to digits in place and then reuse the existing extraction code. Again, subject to performance constraints.
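For comparison, here is a minimal sketch of that direct-extraction approach. It is not what my program does: it uses a lookahead so that overlapping number-words are each found, and the names digit_of and line_value are made up for the illustration.

```perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical helper: map a token (digit or number-word) to its digit.
sub digit_of {
    my ($token) = @_;
    return exists $value{$token} ? $value{$token} : $token;
}

# Hypothetical helper: value of one line, from its first and last tokens.
sub line_value {
    my ($line) = @_;
    # The lookahead consumes no characters, so overlapping words such as
    # the "two" and "one" in "twone" are both captured.
    my @hits = $line =~ /(?=(\d|$wordre))/g;
    return 0 unless @hits;
    return 10 * digit_of($hits[0]) + digit_of($hits[-1]);
}

print line_value("xtwone3four"), "\n";  # prints 24
print line_value("twone"), "\n";        # prints 21
```

One difference worth knowing about: on a line such as twone with no digits at all, the in-place approach first rewrites the line to 2ne, which loses the one and scores the line as 22, whereas this sketch scores it as 21. I don’t know whether any puzzle input actually contains such a line.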
The Whole Program
#!/usr/bin/perl
use warnings;
use strict;
my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);
my $sum = 0;
while (<>) {
    # Substitute the first number word if no digits before it.
    s/^([^\d]*?)($wordre)/$1$value{$2}/;
    # Substitute the last number word if no digits after it.
    s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;
    # Extract first and last digits (if separate) and add to running total.
    /^[^\d]*(\d).*(\d)[^\d]*$/ and $sum += 10 * $1 + $2;
    # Extract lone digit and add value to running total.
    /^[^\d]*(\d)[^\d]*$/ and $sum += 10 * $1 + $1;
}
print "sum is $sum\n";
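As a quick sanity check, the loop body can be wrapped in a sub and run against the sample input from the puzzle. This is a sketch of a test harness, not part of the program above; the name line_value_in_place is made up for it.

```perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical wrapper: the same four regular expressions as the loop
# body, applied to a single line passed in as an argument.
sub line_value_in_place {
    my ($line) = @_;
    $line =~ s/^([^\d]*?)($wordre)/$1$value{$2}/;
    $line =~ s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;
    return 10 * $1 + $2 if $line =~ /^[^\d]*(\d).*(\d)[^\d]*$/;
    return 10 * $1 + $1 if $line =~ /^[^\d]*(\d)[^\d]*$/;
    return 0;
}

my @sample = qw(
    two1nine eightwothree abcone2threexyz xtwone3four
    4nineeightseven2 zoneight234 7pqrstsixteen
);
my $total = 0;
$total += line_value_in_place($_) for @sample;
print "sum is $total\n";    # prints sum is 281
```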