Thursday, May 01, 2014

Sometimes Optimization Matters (Recursion vs Iterative OR "A 4 Billion Character String is REALLY Long")



Computers today are really fast. In fact, they’re fast enough that some people don’t bother to optimize a lot of code. Sometimes that’s okay – after all, premature optimization isn’t a positive thing and can see you spending a great deal of time in exchange for little or no gain. Other times, however, you know there’s going to be a difference of an order of magnitude or more on a computation intensive part of a system.

As an example, there is a coding challenge on CodeEval to find the character in a string at a specific location. You aren’t given the string ahead of time, just the first character and the pattern to use in order to build the string (up to a string length of 3,000,000,000).

I’ll summarize here since it’s pretty simple. The character set is 0, 1, and 2. The first character is 0. You build out the string by using the previous string as the first half of the new string and add one to each digit. This means that 0 => 01 => 0112 => 01121220 and so on.

You only want one character in the string (we’ll call it N). The rest, as far as you are concerned, is garbage.

The knee jerk reaction for a lot of people would be to build the string to the length that you need and just access the character directly. That might even be okay if your upper bound for length wasn’t 3 BILLION. Given that, however, you need a better way (I’ll show you why here in a little while).

If you take a minute to look at it, you’ll realize that, by following the pattern they’ve defined for the number, that you will always end up with a string that has a length of a power of 2 (1, 2, 4, 8, 16, and so on). That means that you can determine how long the shortest string of that pattern will be by taking x = Ceiling of log2(N) and then calculating 2^x.

That doesn’t help us much by itself, but with that information, the pattern, and the fact that we only need one character, it makes this problem a prime candidate for recursion.

Method 1 – Recursion

Recursion can be kind of a scary word for some people because initially it can be sort of difficult to wrap your brain around. However, given a little practice, the hardest parts of the exercise are structuring the problem in a way that you can use recursion and making sure that you actually write it properly (in fact, it took me a little while to get rid of a bug because I had a line of code in the wrong place).

I’ll link to the code here so I don’t have to walk through everything, but to put it succinctly, after calculating the needed string length, the recursive approach essentially keeps track of where the target character would be during each string transformation, uses that to calculate how many times the character would have to change (C) and then calculates our N = C % 3 (each character is essentially a one digit base 3 number, so we can get away with that).

It’s probably as close to doing the problem with pure math as you can get without actually having an equation. As a result, it’s quick. REALLY quick. Fast enough, in fact, that I had to measure runtime in ticks (according to MSDN documentation, it takes 10,000 ticks to make up a millisecond).

The chart below shows the runtime (in both ticks and milliseconds) of one calculation for each given string length (2^0 through 2^32) – not counting file I/O or output to screen (which the original project required). This is just the time from being handed the number to getting the answer to what N is.

Power of 2
Recursive Runtime (in Ticks)
Recursive Runtime (in Milliseconds)
0
4
0
1
10
0
2
4
0
3
4
0
4
5
0
5
5
0
6
5
0
7
4
0
8
5
0
9
5
0
10
4
0
11
5
0
12
4
0
13
5
0
14
5
0
15
5
0
16
5
0
17
5
0
18
5
0
19
5
0
20
5
0
21
5
0
22
6
0
23
6
0
24
6
0
25
6
0
26
6
0
27
32
0
28
6
0
29
6
0
30
6
0
31
7
0
32
7
0

Note that the 32 tick runtime for the 2^27 length string is an aberration (probably caused by time slicing in the operating system). Running it multiple times, every test case ran at 20 ticks or below (generally 10 ticks or less). The only reason it looks like a big difference is because ticks are really REALLY tiny.

So, to sum up the pros and cons for the recursive approach to this problem

Pros: Really fast. Takes up less memory because there’s no actual string being built.
Cons: More difficult to write and debug


Method 2 – Iterative (Don’t do it this way. Please)

Again, I’ll give you the code for this method.

This time, we’re actually building the string in order to find the character. There are a few problems with this.

First is that it’s SLOW. I initially tried doing it two different ways. The first way ran for over HALF AN HOUR trying to calculate the 2^32 length string before I stopped it. The way which is in the code linked above runs much faster because it uses String.Replace() but seems to use more memory.

Second, it takes up A LOT of memory because it’s actually building the string and keeping it in memory. That’s not a big deal when the string is small, but according to the formula here, the size of just the string at its maximum length is 512MB.

Below is the benchmark data for this approach using the same constraints (no file or screen I/O) and the same numbers as the recursive approach.

Power of 2
Iterative Runtime (in Ticks)
Iterative Runtime (in Milliseconds)
0
4
0
1
5
0
2
4
0
3
6
0
4
7
0
5
9
0
6
21
0
7
17
0
8
35
0
9
56
0
10
111
0
11
281
0
12
386
0
13
2011
0
14
6360
3
15
12657
6
16
12528
6
17
125246
61
18
57912
28
19
53481
26
20
101440
49
21
204253
99
22
400705
195
23
1250113
611
24
1778822
869
25
3158791
1544
26
6980949
3412
27
OME
OME
28
OME
OME
29
OME
OME
30
OME
OME
31
OME
OME
32
NA
NA

OME stands for Out Of Memory Exception. These test cases literally caused the program to exceed its allocated memory (it was compiled for “any processor” since NUnit’s test runner didn’t want to play nicely with forced x64 solutions).

The NA result at 2^32 reveals a special problem. In C#, a character at a specific index of a string is accessed by using stringName[int]. The catch is that 2^32 is larger than an Int (you need at least a Long if you want a number that big), so you’d need to compensate by doing a substring to chop off the first Int32.MaxValue and reducing the index by the same amount.

I did that in the first attempt at this problem, but since I found out it was going to throw memory exceptions, I decided that I’d refrain from re-implementing it for the sake of simplicity and more readable code.

As you can see, this approach is MUCH slower than the recursive one. In fact, the general trend is to roughly double the runtime for each larger string. This means that a string of length 2^32 would take roughly 384,000,000 ticks (38,400,000 times the length of time for the recursive solution assuming the recursive solution took 10 ticks).

This is, of course, assuming you made the string from scratch for every test case. You could get around that to an extent by storing the string and just building it out longer as needed. However, you’d still be eating up a lot of memory to store the string. At its longest, the string would be over 4 billion characters long.

That’s longer than War And Peace. This is why, sometimes, optimization really matters.

I’m not sure I really need to sum up the pros and cons for this approach, but

Pros: Stupidly easy to write
Cons: Slow and memory intensive.

That’s not to say that recursion is the right answer to every problem, because it isn’t. There are times, however, when the extra work in designing a more efficient algorithm in order to tackle a problem can really pay off.

Just some food for thought.

Current mood: thoughtful
Current music: Shawn Mullins – Lullaby

Wednesday, April 23, 2014

The Evils of Perl (or Logic Should Make Sense)


The people who know me professionally (even a little) have probably picked up on the fact that I hate Perl. I spent several years as a sysadmin, so I come by my Perl aversion honestly.

Don’t get me wrong, it’s a useful language and I found it to be a very handy tool, but it’s so very easy to make code that even the person who wrote it isn’t sure what it does. The next day.

Or it could be that I’m a crappy developer. Either’s possible. Maybe a little from Column A and a little from Column B.

It’s the only language that I’ve seen where your cat can walk across the keyboard and come up with a valid program. What does it do? We may never know. It may parse a file for a list of addresses, trigger an ICBM launch, or create animated gifs of dancing cats, but by gum it’ll run.

Personally, I’m hoping for the dancing cat gifs. Everyone loves dancing kitties.

My standard quip for why I finally stopped coding Perl is that in any sane language, to get the length of an array, you do something along the lines of myArray.length. That may not be exactly how it’s done, but it will be something similar.

In Perl, you do the following: $myArray = @myArray

You assign the array variable to a scalar variable (single piece of information. Int, char, etc.). This to me makes zero sense from a logical standpoint.

I made that comment about a week or so ago, and the person I was talking to said that it makes perfect sense because it’s an implicit type conversion. (I’m not sure if he was joking or not. I hope, and think, that he was).

He had a sort of point about implicit type conversion, but that sort of thing is supposed to follow a logical convention of some kind. For example, an implicit conversion from int to long makes sense (it’s basically just shoving existing data into a bigger container).

You could even make the case for implicit conversion between string and character array data types. I wouldn’t do it, because it’s probably better done explicitly so you don’t shoot yourself in the foot, but the argument could be made.

However, using implicit conversion to cast an array to (effectively) an integer in order to get the length of the array doesn’t make any logical sense. It’s like buying a chicken because you want a renewable source of pineapples.

In what non-Euclidian universe does this make sense?

“Well, I just need to figure out how long this array is, so I’ll assign it to a scalar and HOLY CRAP! THERE’S CTHULHU!”

Ia! Ia! ArrayLength! Fhtagn!

Any language that will summon Cthulhu by simply using it is probably a bad choice…

Now, if you don’t mind, I’ve got to Great Old One proof my office.

Current mood: Amused

Current music: Charlie Daniels Band – The Devil Went Down to Georgia

Tuesday, April 08, 2014

In The Beginning



In the beginning, there was a television. To the television was attached a Tandy TRS-80 CoCo. Then he said, let there be code. And there was code, and it was in BASIC. He was a child and found the fact that he could make a computer do things to be really cool.

As a teenager, the internet was laid open to him. HTML was discovered and there were many horrible mistakes made, including the dreaded animated gifs of flaming skulls and spinning things. The land was known as GeoCities, and it was eventually cast into perdition for its sins.

Then was Java made known unto him, and, though he tried, he found it confusing. This was the first age of Java, so he was not alone in this.

Lo then did he graduate high school and make his way to college in an effort to learn to do this shit properly.

There did he learn of many things – Pascal and Prolog, C and C++, Scheme and Assembly, Socket programming, the creation of data packets, and stack traces among many other skills and pieces of arcane lore.

He paid the campus ferryman with his toils as a sysadmin where he applied logic to a place of chaos known as a non-profit. All the while, using his position as an excuse to do things programmatically so as not to have to run around the hilltop known as The Ridges as often.

Then did he return to Java, and he found it…okay. After all, with a background in object oriented programming, data structures, and the like, Java now made much more sense.

Upon his path, however, he decided that Java was not the right tool for him, so he endeavored to learn C#. Additionally, in order to wipe away, or at least atone for, his past sins of using Perl, he also learned Ruby to an extent that would allow himself to get into (and hopefully out of) trouble on occasion.

He walked the paths of desktop development and eventually began to work in order to master the arts of web development as well using the MVC framework. So, then, did his past of using HTML and CSS come back to him (though, thankfully, the flaming skulls and spinning things did not), and that is where he stands as of this day.

Current Mood - Tired
Current Music - Shades Apart - Stranger By The Day

Wednesday, February 26, 2014

TDD Saves Me From Myself



I’ve gotten into TDD at least a bit over the last few months. It has a learning curve, but I’m finding that it sometimes saves me from myself (and saves me time in debugging).

Part of my problem, like a lot of people, is that I sometimes over think things or fail to see something as being multi-purpose.

A perfect example was a code kata I started yesterday. It’s a fairly simple exercise – given a file with a list of numbers, take each number, reverse it, add it to the original number, and check to see if the result is a palindrome. Repeat until the number is a palindrome.

Simple stuff, really, but it’s a nice thought exercise.

It started off well. Decided to tackle the number reversal first. Since we’re dealing with data being snagged from a file, I decide to do the reversal with strings (since the data is already in string format, I can do most of the things required by the problem without having to typecast between string and int).

I name it ReverseNumberAsString since that name says what it does fairly well (it’s not the greatest name, but it’s the first function I’ve written in the solution, and we can refactor later).

Fast forward a bit to my writing a function to test if a number is a palindrome. This is where I start to over think things (and where TDD tells me I’m an idiot).

It’s been a long day and I’m a little fried, so the first solution that comes to mind is to set pointers at the front and end of the string and just walk them toward the center, comparing, until they reach the middle or find something they don’t match.

(Hey, I spent years doing C and C++ code. Give me a break)

I test a negative case first, and the test fails. I’d set the tail pointer to string.Lengtrh instead of string.Length-1 (it happens. I was fried.) I fix it, run the test, and it passes. Yay

Try to test for a positive case and the test fails. I go to look at the code to figure out why and, after a minute or two, I realize that I’m stupid, because I’ve already written the answer somewhere else in the code.

ReverseNumberAsString reverses a number that’s in the form of a string. No it doesn’t. It reverses a string. Period. I’m a dummy.

Rename the function to ReverseString like it should have been from the beginning and use that. Compare forward to backward. If it’s a palindrome, they should be the same thing.

Look at that. It works. Who would have guessed.

Thank you, Nunit, for making me stop and see that I’m being dumb.

Current mood - calm
Current music - Dirty Vegas – A Million Ways

Thursday, February 06, 2014

My Take on Professional Codes of Conduct at Conferences


 

Preface: Please keep in mind that I am not trivializing sexual harassment or sexual assault. I have loved people who have been victims of sexual harassment, sexual assault and even rape. I myself have been a victim of sexual harassment and sexual assault.

I am offering an open, honest opinion from someone who has not only the right to an opinion as a person, but a sense of perspective as the victim of and friend of victims of sexual harassment/assault. You are allowed to agree or disagree with me as your conscience so dictates.

 

Recently, there’s been a lot of push to force software development conferences to have a code of conduct.

Honestly, any code of conduct that could be adopted for a professional conference could be limited to “Be a professional” or “Don’t be an ass”. We’re all supposed to be there to learn new things, maybe to meet new people, see old friends, and generally grow as professionals and individuals (male, female, gay, straight, bi, black, white, red, yellow, green, polka dotted and everything in between).

A lot of what sparked off the recent debate was an incident that happened at a regional conference a year or so ago (and I will admit was completely unacceptable. In fact, it was horrifying, honestly).

However, a conference code of conduct wouldn’t have done anything to prevent the incident for a few reasons:

 

1) It only came to light *months* after it happened (if you don’t say anything, nothing can be done by the organizers. I know some of the organizers and you’d be hard pressed to find a better, more caring group of people. If they had known, they would have done what they could.).

2)It happened before the conference even started, and not at the venue (it occurred at a bar at the resort the conference was held).

3) Having a code of conduct doesn't stop people from doing bad things. People stepping in and saying something while it happens or before it has a chance to happen does.

 

Having a Code of Conduct that prohibits unprofessional behavior at a professional venue is the conference version of security theater. The only thing it does is make people “feel” safer (and that’s, honestly, a disservice, because *life* isn’t safe and we need to be aware of that. It lets us better prepare for when things do go wrong – and given enough time, chances are something will happen to all of us).

It’s the equivalent of a store having a sign up that says “Don’t steal stuff or we’ll call the cops.” It should be common sense and, code of conduct or not, most conference staff that I have known will deal with any problems that arise.

 

Is there a solution?

 

Yes, well more or less, but like most real solutions, it’s not easy. The best solution is to be responsible for ourselves, to be decent people, and to look out for each other.

That last item is the big one. So many people now “don’t want to get involved” when they see something bad happening. There are a lot of reasons for this – Sometimes people don’t feel like their stepping in will solve anything. Sometimes they are afraid to say anything. Some of them even think that someone in a position of power should be taking care of it.

I have news for you: You need to swallow your fear and speak up when you see it happening anyway. It’s your duty as a member of society to help each other and that includes stepping in and asking if there’s a problem if something looks fishy.

We’ve become far too isolationist as individuals and have become conditioned to “let someone else deal with it”. That needs to change.

I’m not saying to lynch someone you see doing something questionable. Step in, make sure things are okay and then respond appropriately.

Yes, I’m a fairly large (some may say frightening looking) guy. Yes, I’ve been a victim of harassment and assault. No, I don’t think a Code of Conduct will change anything. Changing ourselves will. The demands for a Code of Conduct are just another cry to make it someone else's problem.

I do my part. If I see something “off”, I see if there is a problem and work to resolve it. I ask that others do the same. It’s the only way things will get better.

 

Current mood: contemplative

Current music: Coldplay – Charlie Brown