Computers today are really fast. In fact, they’re fast
enough that some people don’t bother to optimize a lot of code. Sometimes
that’s okay – after all, premature optimization isn’t a positive thing and can
see you spending a great deal of time in exchange for little or no gain. Other
times, however, you know there’s going to be a difference of an order of
magnitude or more on a computation intensive part of a system.
As an example, there is a coding challenge on CodeEval to
find the character in a string at a specific location. You aren’t given the
string ahead of time, just the first character and the pattern to use in order
to build the string (up to a string length of 3,000,000,000).
I’ll summarize here since it’s pretty simple. The character
set is 0, 1, and 2. The first character is 0. You build out the string by using
the previous string as the first half of the new string and add one to each
digit. This means that 0 => 01 => 0112 => 01121220 and so on.
You only want one character in the string (we’ll call it N).
The rest, as far as you are concerned, is garbage.
The knee jerk reaction for a lot of people would be to build
the string to the length that you need and just access the character directly.
That might even be okay if your upper bound for length wasn’t 3 BILLION. Given
that, however, you need a better way (I’ll show you why here in a little
while).
If you take a minute to look at it, you’ll realize that, by
following the pattern they’ve defined for the number, that you will always end
up with a string that has a length of a power of 2 (1, 2, 4, 8, 16, and so on).
That means that you can determine how long the shortest string of that pattern
will be by taking x = Ceiling of log2(N) and then calculating 2^x.
That doesn’t help us much by itself, but with that
information, the pattern, and the fact that we only need one character, it
makes this problem a prime candidate for recursion.
Method 1 –
Recursion
Recursion can be kind of a scary word for some people
because initially it can be sort of difficult to wrap your brain around.
However, given a little practice, the hardest parts of the exercise are
structuring the problem in a way that you can use recursion and making sure
that you actually write it properly (in fact, it took me a little while to get
rid of a bug because I had a line of code in the wrong place).
I’ll link to the code here so I don’t have to walk through
everything, but to put it succinctly, after calculating the needed string
length, the recursive approach essentially keeps track of where the target
character would be during each string transformation, uses that to calculate
how many times the character would have to change (C) and then calculates our N
= C % 3 (each character is essentially a one digit base 3 number, so we can get
away with that).
It’s probably as close to doing the problem with pure math
as you can get without actually having an equation. As a result, it’s quick.
REALLY quick. Fast enough, in fact, that I had to measure runtime in ticks
(according to MSDN documentation, it takes 10,000 ticks to make up a
millisecond).
The chart below shows the runtime (in both ticks and
milliseconds) of one calculation for each given string length (2^0 through
2^32) – not counting file I/O or output to screen (which the original project
required). This is just the time from being handed the number to getting the
answer to what N is.
Power
of 2
|
Recursive
Runtime (in Ticks)
|
Recursive
Runtime (in Milliseconds)
|
0
|
4
|
0
|
1
|
10
|
0
|
2
|
4
|
0
|
3
|
4
|
0
|
4
|
5
|
0
|
5
|
5
|
0
|
6
|
5
|
0
|
7
|
4
|
0
|
8
|
5
|
0
|
9
|
5
|
0
|
10
|
4
|
0
|
11
|
5
|
0
|
12
|
4
|
0
|
13
|
5
|
0
|
14
|
5
|
0
|
15
|
5
|
0
|
16
|
5
|
0
|
17
|
5
|
0
|
18
|
5
|
0
|
19
|
5
|
0
|
20
|
5
|
0
|
21
|
5
|
0
|
22
|
6
|
0
|
23
|
6
|
0
|
24
|
6
|
0
|
25
|
6
|
0
|
26
|
6
|
0
|
27
|
32
|
0
|
28
|
6
|
0
|
29
|
6
|
0
|
30
|
6
|
0
|
31
|
7
|
0
|
32
|
7
|
0
|
Note that the 32 tick runtime for the 2^27 length string is
an aberration (probably caused by time slicing in the operating system).
Running it multiple times, every test case ran at 20 ticks or below (generally
10 ticks or less). The only reason it looks like a big difference is because
ticks are really REALLY tiny.
So, to sum up the pros and cons for the recursive approach
to this problem
Pros: Really
fast. Takes up less memory because there’s no actual string being built.
Cons: More
difficult to write and debug
Method 2 –
Iterative (Don’t do it this way. Please)
Again, I’ll give you the code for this method.
This time, we’re actually building the string in order to
find the character. There are a few problems with this.
First is that it’s SLOW. I initially tried doing it two
different ways. The first way ran for over HALF AN HOUR trying to calculate the
2^32 length string before I stopped it. The way which is in the code linked
above runs much faster because it uses String.Replace() but seems to use more
memory.
Second, it takes up A LOT of memory because it’s actually
building the string and keeping it in memory. That’s not a big deal when the
string is small, but according to the formula here, the size of
just the string at its maximum length is 512MB.
Below is the benchmark data for this approach using the same
constraints (no file or screen I/O) and the same numbers as the recursive
approach.
Power
of 2
|
Iterative
Runtime (in Ticks)
|
Iterative
Runtime (in Milliseconds)
|
0
|
4
|
0
|
1
|
5
|
0
|
2
|
4
|
0
|
3
|
6
|
0
|
4
|
7
|
0
|
5
|
9
|
0
|
6
|
21
|
0
|
7
|
17
|
0
|
8
|
35
|
0
|
9
|
56
|
0
|
10
|
111
|
0
|
11
|
281
|
0
|
12
|
386
|
0
|
13
|
2011
|
0
|
14
|
6360
|
3
|
15
|
12657
|
6
|
16
|
12528
|
6
|
17
|
125246
|
61
|
18
|
57912
|
28
|
19
|
53481
|
26
|
20
|
101440
|
49
|
21
|
204253
|
99
|
22
|
400705
|
195
|
23
|
1250113
|
611
|
24
|
1778822
|
869
|
25
|
3158791
|
1544
|
26
|
6980949
|
3412
|
27
|
OME
|
OME
|
28
|
OME
|
OME
|
29
|
OME
|
OME
|
30
|
OME
|
OME
|
31
|
OME
|
OME
|
32
|
NA
|
NA
|
OME stands for Out Of Memory Exception. These test cases
literally caused the program to exceed its allocated memory (it was compiled
for “any processor” since NUnit’s test runner didn’t want to play nicely with
forced x64 solutions).
The NA result at 2^32 reveals a special problem. In C#, a
character at a specific index of a string is accessed by using stringName[int].
The catch is that 2^32 is larger than an Int (you need at least a Long if you
want a number that big), so you’d need to compensate by doing a substring to
chop off the first Int32.MaxValue and reducing the index by the same amount.
I did that in the first attempt at this problem, but since I
found out it was going to throw memory exceptions, I decided that I’d refrain
from re-implementing it for the sake of simplicity and more readable code.
As you can see, this approach is MUCH slower than the
recursive one. In fact, the general trend is to roughly double the runtime for
each larger string. This means that a string of length 2^32 would take roughly 384,000,000
ticks (38,400,000 times the length of time for the recursive solution assuming
the recursive solution took 10 ticks).
This is, of course, assuming you made the string from
scratch for every test case. You could get around that to an extent by storing
the string and just building it out longer as needed. However, you’d still be
eating up a lot of memory to store the string. At its longest, the string would
be over 4 billion characters long.
That’s longer than War And Peace. This is why,
sometimes, optimization really matters.
I’m not sure I really need to sum up the pros and cons for
this approach, but
Pros: Stupidly
easy to write
Cons: Slow and
memory intensive.
That’s not to say that recursion is the right answer to
every problem, because it isn’t. There are times, however, when the extra work
in designing a more efficient algorithm in order to tackle a problem can really
pay off.
Just some food for thought.
Current mood: thoughtful
Current music: Shawn Mullins – Lullaby
No comments:
Post a Comment