Code Poet, Swordsman, Eternal Wanderer
Please feel free to comment
Email: james AT jameshollingshead DOT com
Friday, October 31, 2014
Tuesday, October 14, 2014
Thursday, May 01, 2014
Sometimes Optimization Matters (Recursion vs Iterative OR "A 4 Billion Character String is REALLY Long")
Computers today are really fast. In fact, they’re fast
enough that some people don’t bother to optimize a lot of code. Sometimes
that’s okay – after all, premature optimization isn’t a positive thing and can
see you spending a great deal of time in exchange for little or no gain. Other
times, however, you know there’s going to be a difference of an order of
magnitude or more on a computation intensive part of a system.
As an example, there is a coding challenge on CodeEval to
find the character in a string at a specific location. You aren’t given the
string ahead of time, just the first character and the pattern to use in order
to build the string (up to a string length of 3,000,000,000).
I’ll summarize here since it’s pretty simple. The character
set is 0, 1, and 2. The first character is 0. You build out the string by using
the previous string as the first half of the new string and add one to each
digit. This means that 0 => 01 => 0112 => 01121220 and so on.
You only want one character in the string (we’ll call it N).
The rest, as far as you are concerned, is garbage.
The knee jerk reaction for a lot of people would be to build
the string to the length that you need and just access the character directly.
That might even be okay if your upper bound for length wasn’t 3 BILLION. Given
that, however, you need a better way (I’ll show you why here in a little
while).
If you take a minute to look at it, you’ll realize that, by
following the pattern they’ve defined for the number, that you will always end
up with a string that has a length of a power of 2 (1, 2, 4, 8, 16, and so on).
That means that you can determine how long the shortest string of that pattern
will be by taking x = Ceiling of log2(N) and then calculating 2^x.
That doesn’t help us much by itself, but with that
information, the pattern, and the fact that we only need one character, it
makes this problem a prime candidate for recursion.
Method 1 –
Recursion
Recursion can be kind of a scary word for some people
because initially it can be sort of difficult to wrap your brain around.
However, given a little practice, the hardest parts of the exercise are
structuring the problem in a way that you can use recursion and making sure
that you actually write it properly (in fact, it took me a little while to get
rid of a bug because I had a line of code in the wrong place).
I’ll link to the code here so I don’t have to walk through
everything, but to put it succinctly, after calculating the needed string
length, the recursive approach essentially keeps track of where the target
character would be during each string transformation, uses that to calculate
how many times the character would have to change (C) and then calculates our N
= C % 3 (each character is essentially a one digit base 3 number, so we can get
away with that).
It’s probably as close to doing the problem with pure math
as you can get without actually having an equation. As a result, it’s quick.
REALLY quick. Fast enough, in fact, that I had to measure runtime in ticks
(according to MSDN documentation, it takes 10,000 ticks to make up a
millisecond).
The chart below shows the runtime (in both ticks and
milliseconds) of one calculation for each given string length (2^0 through
2^32) – not counting file I/O or output to screen (which the original project
required). This is just the time from being handed the number to getting the
answer to what N is.
Power
of 2
|
Recursive
Runtime (in Ticks)
|
Recursive
Runtime (in Milliseconds)
|
0
|
4
|
0
|
1
|
10
|
0
|
2
|
4
|
0
|
3
|
4
|
0
|
4
|
5
|
0
|
5
|
5
|
0
|
6
|
5
|
0
|
7
|
4
|
0
|
8
|
5
|
0
|
9
|
5
|
0
|
10
|
4
|
0
|
11
|
5
|
0
|
12
|
4
|
0
|
13
|
5
|
0
|
14
|
5
|
0
|
15
|
5
|
0
|
16
|
5
|
0
|
17
|
5
|
0
|
18
|
5
|
0
|
19
|
5
|
0
|
20
|
5
|
0
|
21
|
5
|
0
|
22
|
6
|
0
|
23
|
6
|
0
|
24
|
6
|
0
|
25
|
6
|
0
|
26
|
6
|
0
|
27
|
32
|
0
|
28
|
6
|
0
|
29
|
6
|
0
|
30
|
6
|
0
|
31
|
7
|
0
|
32
|
7
|
0
|
Note that the 32 tick runtime for the 2^27 length string is
an aberration (probably caused by time slicing in the operating system).
Running it multiple times, every test case ran at 20 ticks or below (generally
10 ticks or less). The only reason it looks like a big difference is because
ticks are really REALLY tiny.
So, to sum up the pros and cons for the recursive approach
to this problem
Pros: Really
fast. Takes up less memory because there’s no actual string being built.
Cons: More
difficult to write and debug
Method 2 –
Iterative (Don’t do it this way. Please)
Again, I’ll give you the code for this method.
This time, we’re actually building the string in order to
find the character. There are a few problems with this.
First is that it’s SLOW. I initially tried doing it two
different ways. The first way ran for over HALF AN HOUR trying to calculate the
2^32 length string before I stopped it. The way which is in the code linked
above runs much faster because it uses String.Replace() but seems to use more
memory.
Second, it takes up A LOT of memory because it’s actually
building the string and keeping it in memory. That’s not a big deal when the
string is small, but according to the formula here, the size of
just the string at its maximum length is 512MB.
Below is the benchmark data for this approach using the same
constraints (no file or screen I/O) and the same numbers as the recursive
approach.
Power
of 2
|
Iterative
Runtime (in Ticks)
|
Iterative
Runtime (in Milliseconds)
|
0
|
4
|
0
|
1
|
5
|
0
|
2
|
4
|
0
|
3
|
6
|
0
|
4
|
7
|
0
|
5
|
9
|
0
|
6
|
21
|
0
|
7
|
17
|
0
|
8
|
35
|
0
|
9
|
56
|
0
|
10
|
111
|
0
|
11
|
281
|
0
|
12
|
386
|
0
|
13
|
2011
|
0
|
14
|
6360
|
3
|
15
|
12657
|
6
|
16
|
12528
|
6
|
17
|
125246
|
61
|
18
|
57912
|
28
|
19
|
53481
|
26
|
20
|
101440
|
49
|
21
|
204253
|
99
|
22
|
400705
|
195
|
23
|
1250113
|
611
|
24
|
1778822
|
869
|
25
|
3158791
|
1544
|
26
|
6980949
|
3412
|
27
|
OME
|
OME
|
28
|
OME
|
OME
|
29
|
OME
|
OME
|
30
|
OME
|
OME
|
31
|
OME
|
OME
|
32
|
NA
|
NA
|
OME stands for Out Of Memory Exception. These test cases
literally caused the program to exceed its allocated memory (it was compiled
for “any processor” since NUnit’s test runner didn’t want to play nicely with
forced x64 solutions).
The NA result at 2^32 reveals a special problem. In C#, a
character at a specific index of a string is accessed by using stringName[int].
The catch is that 2^32 is larger than an Int (you need at least a Long if you
want a number that big), so you’d need to compensate by doing a substring to
chop off the first Int32.MaxValue and reducing the index by the same amount.
I did that in the first attempt at this problem, but since I
found out it was going to throw memory exceptions, I decided that I’d refrain
from re-implementing it for the sake of simplicity and more readable code.
As you can see, this approach is MUCH slower than the
recursive one. In fact, the general trend is to roughly double the runtime for
each larger string. This means that a string of length 2^32 would take roughly 384,000,000
ticks (38,400,000 times the length of time for the recursive solution assuming
the recursive solution took 10 ticks).
This is, of course, assuming you made the string from
scratch for every test case. You could get around that to an extent by storing
the string and just building it out longer as needed. However, you’d still be
eating up a lot of memory to store the string. At its longest, the string would
be over 4 billion characters long.
That’s longer than War And Peace. This is why,
sometimes, optimization really matters.
I’m not sure I really need to sum up the pros and cons for
this approach, but
Pros: Stupidly
easy to write
Cons: Slow and
memory intensive.
That’s not to say that recursion is the right answer to
every problem, because it isn’t. There are times, however, when the extra work
in designing a more efficient algorithm in order to tackle a problem can really
pay off.
Just some food for thought.
Current mood: thoughtful
Current music: Shawn Mullins – Lullaby
Wednesday, April 23, 2014
The Evils of Perl (or Logic Should Make Sense)
The people who know me professionally (even a little) have
probably picked up on the fact that I hate Perl. I spent several years as a
sysadmin, so I come by my Perl aversion honestly.
Don’t get me wrong, it’s a useful language and I found it to
be a very handy tool, but it’s so very easy to make code that even the person
who wrote it isn’t sure what it does. The next day.
Or it could be that I’m a crappy developer. Either’s
possible. Maybe a little from Column A and a little from Column B.
It’s the only language that I’ve seen where your cat can
walk across the keyboard and come up with a valid program. What does it do? We
may never know. It may parse a file for a list of addresses, trigger an ICBM
launch, or create animated gifs of dancing cats, but by gum it’ll run.
Personally, I’m hoping for the dancing cat gifs. Everyone
loves dancing kitties.
My standard quip for why I finally stopped coding Perl is
that in any sane language, to get the length of an array, you do something
along the lines of myArray.length. That may not be exactly how it’s done, but
it will be something similar.
In Perl, you do the following: $myArray = @myArray
You assign the array variable to a scalar variable (single piece of information. Int, char,
etc.). This to me makes zero sense from a logical standpoint.
I made that comment about a week or so ago, and the person I
was talking to said that it makes perfect sense because it’s an implicit type
conversion. (I’m not sure if he was joking or not. I hope, and think, that he
was).
He had a sort of point about implicit type conversion, but that
sort of thing is supposed to follow a logical convention of some kind. For
example, an implicit conversion from int to long makes sense (it’s basically
just shoving existing data into a bigger container).
You could even make the case for implicit conversion between
string and character array data types. I wouldn’t do it, because it’s probably
better done explicitly so you don’t shoot yourself in the foot, but the
argument could be made.
However, using implicit conversion to cast an array to (effectively)
an integer in order to get the length of the array doesn’t make any logical
sense. It’s like buying a chicken because you want a renewable source of
pineapples.
In what non-Euclidian universe does this make sense?
“Well, I just need to figure out how long this array is, so
I’ll assign it to a scalar and HOLY CRAP! THERE’S CTHULHU!”
Ia! Ia! ArrayLength! Fhtagn!
Any language that will summon Cthulhu by simply using it is
probably a bad choice…
Now, if you don’t mind, I’ve got to Great Old One proof my
office.
Current mood: Amused
Current music: Charlie Daniels Band – The Devil Went Down to
Georgia
Tuesday, April 08, 2014
In The Beginning
In the beginning, there was a television. To the television
was attached a Tandy TRS-80 CoCo. Then he said, let there be code. And there
was code, and it was in BASIC. He was a child and found the fact that he could
make a computer do things to be really cool.
As a teenager, the internet was laid open to him. HTML was
discovered and there were many horrible mistakes made, including the dreaded
animated gifs of flaming skulls and spinning things. The land was known as
GeoCities, and it was eventually cast into perdition for its sins.
Then was Java made known unto him, and, though he tried, he
found it confusing. This was the first age of Java, so he was not alone in
this.
Lo then did he graduate high school and make his way to
college in an effort to learn to do this shit properly.
There did he learn of many things – Pascal and Prolog, C and
C++, Scheme and Assembly, Socket programming, the creation of data packets, and
stack traces among many other skills and pieces of arcane lore.
He paid the campus ferryman with his toils as a sysadmin
where he applied logic to a place of chaos known as a non-profit. All the
while, using his position as an excuse to do things programmatically so as not
to have to run around the hilltop known as The Ridges as often.
Then did he return to Java, and he found it…okay. After all,
with a background in object oriented programming, data structures, and the
like, Java now made much more sense.
Upon his path, however, he decided that Java was not the
right tool for him, so he endeavored to learn C#. Additionally, in order to
wipe away, or at least atone for, his past sins of using Perl, he also learned
Ruby to an extent that would allow himself to get into (and hopefully out of)
trouble on occasion.
He walked the paths of desktop development and eventually
began to work in order to master the arts of web development as well using the
MVC framework. So, then, did his past of using HTML and CSS come back to him
(though, thankfully, the flaming skulls and spinning things did not), and that
is where he stands as of this day.
Current Mood - Tired
Current Music - Shades Apart - Stranger By The Day
Labels:
humor,
past,
professional,
professionalDevelopment,
Programming,
Random,
software,
tech
Wednesday, February 26, 2014
TDD Saves Me From Myself
I’ve gotten into TDD at least a bit over the last few
months. It has a learning curve, but I’m finding that it sometimes saves me
from myself (and saves me time in debugging).
Part of my problem, like a lot of people, is that I
sometimes over think things or fail to see something as being multi-purpose.
A perfect example was a code kata I started yesterday. It’s
a fairly simple exercise – given a file with a list of numbers, take each
number, reverse it, add it to the original number, and check to see if the
result is a palindrome. Repeat until the number is a palindrome.
Simple stuff, really, but it’s a nice thought exercise.
It started off well. Decided to tackle the number reversal
first. Since we’re dealing with data being snagged from a file, I decide to do
the reversal with strings (since the data is already in string format, I can do
most of the things required by the problem without having to typecast between
string and int).
I name it ReverseNumberAsString since that name says what it
does fairly well (it’s not the greatest name, but it’s the first function I’ve
written in the solution, and we can refactor later).
Fast forward a bit to my writing a function to test if a
number is a palindrome. This is where I start to over think things (and where
TDD tells me I’m an idiot).
It’s been a long day and I’m a little fried, so the first
solution that comes to mind is to set pointers at the front and end of the
string and just walk them toward the center, comparing, until they reach the
middle or find something they don’t match.
(Hey, I spent years doing C and C++ code. Give me a break)
I test a negative case first, and the test fails. I’d set
the tail pointer to string.Lengtrh instead of string.Length-1 (it happens. I
was fried.) I fix it, run the test, and it passes. Yay
Try to test for a positive case and the test fails. I go to
look at the code to figure out why and, after a minute or two, I realize that I’m
stupid, because I’ve already written the answer somewhere else in the code.
ReverseNumberAsString reverses a number that’s in the form
of a string. No it doesn’t. It reverses a string. Period. I’m a dummy.
Rename the function to ReverseString like it should have
been from the beginning and use that. Compare forward to backward. If it’s a palindrome,
they should be the same thing.
Look at that. It works. Who would have guessed.
Thank you, Nunit, for making me stop and see that I’m being
dumb.
Current mood - calm
Current music - Dirty Vegas – Thursday, February 06, 2014
My Take on Professional Codes of Conduct at Conferences
Preface: Please
keep in mind that I am not trivializing sexual harassment or sexual assault. I
have loved people who have been victims of sexual harassment, sexual assault
and even rape. I myself have been a victim of sexual harassment and sexual
assault.
I am offering an open, honest
opinion from someone who has not only the right to an opinion as a person, but
a sense of perspective as the victim of and friend of victims of sexual
harassment/assault. You are allowed to agree or disagree with me as your
conscience so dictates.
Recently, there’s been a lot of
push to force software development conferences to have a code of conduct.
Honestly, any code of conduct that
could be adopted for a professional conference could be limited to “Be a
professional” or “Don’t be an ass”. We’re all supposed to be there to learn new
things, maybe to meet new people, see old friends, and generally grow as
professionals and individuals (male, female, gay, straight, bi, black, white,
red, yellow, green, polka dotted and everything in between).
A lot of what sparked off the
recent debate was an incident that happened at a regional conference a year or
so ago (and I will admit was completely unacceptable. In fact, it was
horrifying, honestly).
However, a conference code of
conduct wouldn’t have done anything to prevent the incident for a few
reasons:
1) It
only came to light *months* after it happened (if you don’t say anything,
nothing can be done by the organizers. I know some of the organizers and you’d
be hard pressed to find a better, more caring group of people. If they had
known, they would have done what they could.).
2)It
happened before the conference even started, and not at the venue (it occurred
at a bar at the resort the conference was held).
3)
Having a code of conduct doesn't stop people from doing bad things. People
stepping in and saying something while it happens or before it has a chance to
happen does.
Having a Code of Conduct that
prohibits unprofessional behavior at a professional venue is the conference
version of security theater. The only thing it does is make people “feel” safer
(and that’s, honestly, a disservice, because *life* isn’t safe and we need to
be aware of that. It lets us better prepare for when things do go wrong – and
given enough time, chances are something will happen to all of us).
It’s the equivalent of a store
having a sign up that says “Don’t steal stuff or we’ll call the cops.” It
should be common sense and, code of conduct or not, most conference staff that
I have known will deal with any problems that arise.
Is there a solution?
Yes, well more or less, but like
most real solutions, it’s not easy. The best solution is to be responsible for
ourselves, to be decent people, and to look out for each other.
That last item is the big one. So
many people now “don’t want to get involved” when they see something bad
happening. There are a lot of reasons for this – Sometimes people don’t feel
like their stepping in will solve anything. Sometimes they are afraid to say
anything. Some of them even think that someone in a position of power should be
taking care of it.
I have news for you: You need to
swallow your fear and speak up when you see it happening anyway. It’s your duty
as a member of society to help each other and that includes stepping in and
asking if there’s a problem if something looks fishy.
We’ve become far too isolationist
as individuals and have become conditioned to “let someone else deal with it”.
That needs to change.
I’m not saying to lynch someone
you see doing something questionable. Step in, make sure things are okay and
then respond appropriately.
Yes, I’m a fairly large (some may
say frightening looking) guy. Yes, I’ve been a victim of harassment and
assault. No, I don’t think a Code of Conduct will change anything. Changing
ourselves will. The demands for a Code of Conduct are just another cry to make
it someone else's problem.
I do my part. If I see something
“off”, I see if there is a problem and work to resolve it. I ask that
others do the same. It’s the only way things will get better.
Current mood: contemplative
Current music: Coldplay – Charlie
Brown
Labels:
Life,
professional,
professionalDevelopment,
software,
tech
Subscribe to:
Posts (Atom)