Flawed log blog

Battling the snake

December 27, 2019

Battlesnake logo

I got sucked into the world of Battlesnake about two years ago. A few of my friends were participating so I figured I’d try to throw together a snake as well - it sounded like it could be fun.

My first attempt was the result of tinkering for a couple of weeks and worked OK. It was written in TypeScript since that’s a language I’m familar with and enjoy using. It wasn’t designed to be incredibly complex nor performant. Although this snake wasn’t the most ambitious effort, it served as a gateway drug into the world of Battlesnake.

I never competed with this snake and just let it run on a free Heroku server for a few months before shutting it down.

The road to my current snake

A few of my friends participated in the 2018 event and after hearing about how much fun it was I decided I wanted to try creating a snake that could compete in the professional tier - after all, I work on software full-time so I should be able to call myself a professional, right?

One of my primary concerns about starting a new snake was how I could make it as efficient as possible. As every Battlesnake move has three possible directions (you can’t go backwards!) and these can be computed without any information being needed from the other scored directions, there is the opportunity to allow each direction to be evaluated in an isolated fashion. With this information in mind I figured that finding a language that prioritized making concurrency or parallelism easy was a good place to start.

Try #1 - Elixir

My first try at a new snake started with Elixir. A few things steered me in this direction:

  • A lot of new services at change.org are being written in it and I occasionally need to work with them
  • It’s built from the ground up to allow for easy and efficient parallelism
  • In my experiences functional languages lead to less bugs
  • Lots of people had been telling me how great it is
  • It’s built off of Erlang, which always sounded cool to me

However, my time with the Elixir snake didn’t go too well. After a day or so of tinkering I was getting abysmal performance from a snake that worked similar to my TypeScript one.

I don’t blame Elixir for this - I’m pretty sure I was using the language wrong. I spammed tons of Task.async()/Task.await() calls for scanning each sector of the board and I structured my board data in a way that would require a full state copy each time a sector was scored. (One of the downsides of immutable state when combined with lazy thinking.) This made for a snake that was very good at chewing up a lot of CPU and memory but never really managed to get much done in the 500ms cutoff.

Now that I have a better idea of how I can develop a snake that handles parallelism well, I’d like to return to Elixir someday. However, that day is not today, as I have become way too invested in my next attempt: the Go snake.

It’s Go time

Go was another language that I had heard tons of buzz about but hadn’t worked with much. I read a few articles that hyped up how good it was with concurrent tasks and figured that it would be a good springboard for my next attempt.

My time with Go has been pretty great - not that I’m saying that this makes it a better language than Elixir. Many people that I trust and respect sing the praises of Elixir and I have been able to see the benefits myself. However, at the end of the day programming languages are a tool like any other - you need to choose the right tool for the job. Like I said above, Elixir might work if you’re thinking about the problem in the correct way, but my way of thinking and structuring the data was more in line with what works well with Go.

A typical Battlesnake game in action

A typical BattleSnake game in action

Lessons learned with Go

While working on my latest and greatest snake there were a few notable things I learned. Some are Go specific and some are applicable to almost any development process.

Embrace WaitGroups and Channels

When I started to really prioritize making my code scale to use as many processors/cores as possible I used WaitGroups. These are a good way to ensure that multiple tasks can be accomplished at the same time. However, I was passing a shared array from function to function rather than using a Channel to send data back to my main function.

Channels provide an efficient way to redirect data from a task to a centralized area for later evaluation. Sure, you can try to just pass the data around and do your own checks to ensure everything flows smoothly - but why do that when Go provides you with the ability to do this in with high performance and a minimum of cognitive overhead?

When I implemented channels my performance improved and I found they made it easier to send more complex data back to my main function for number crunching.

The moral of this story is to use the tools that a language gives you before trying to implement your own approach. (This is something I’m always continually relearning thanks to my own stubbornness.)

Questionable practices + mutable state = oh no

One downside of Go having mutable state opposed to Elixir’s immutable state is that there is a lot of opportunity for strange bugs to creep in when you’ve got a lot of different processes accessing the same data. Thankfully this is something that can also be minimized by using the tools that Go provides.

Dealing with parallel and concurrent processing using shared data is a very difficult problem to solve and there’s little chance that you’ll manage to do it better than the people that have designed and iterated on the language that you’re using.

What I learned here is similar to what I learned in the above point - use the tools that the language gives you!

Tests, Tests, Tests

This applies to any programming language and almost any project that you can undertake. After a while I found that I was having trouble telling whether changes helped or hurt my snake. It’s incredibly hard to tell if you’re improving anything if you don’t have any concrete metrics to test against. Even though I’d never write something at work that completely lacked tests I ended up with a side project that had none.

Thankfully Go provides go test right out of the box and it’s a great test framework. By structuring my snake’s output to be verbose and running a command like go test | grep -B 100 -A 1 "FAIL:" I was able to see where and why changes could cause previous strengths to regress. (Yes, that is 100 lines of output before the FAIL: - my output is verbose!)

I’d run my snake in the arena and watch for games where it made ‘bad’ moves. I’d then ssh into my server and cat/grep the log files for that move. I’d copy/paste the output into my test suite, add it as a new case, and then tweak my code until my snake behaved in the way I wanted it to without breaking any of the previously added tests. This approach definitely helped me climb up the leaderboard!

At the time of this writing I have 41 different test cases and will continue to add more. I’ve even had to remove a few tests as I’ve improved my snake - turns out I don’t always know what the best move is. 🙈

Benchmarks are good

Speaking of being able to measure whether changes are good or not, if you’re designing something that strives for great performance you need to be able to measure how long transactions take.

Using Go’s built in testing library you can run a command similar to go test -cpuprofile cpu.prof -memprofile mem.prof -bench . to generate parsable benchmark files that are readable by pprof.

Go’s pprof is a built in package that provides detailed benchmarks. Using go tool pprof --pdf after creating my benchmark data I was able to see detailed flowcharts that showed me exactly how much time was being spent in each function and also gave me a visual representation of how often particular functions were being called by others. This helped me remove some redundant calls and improve performance.

Also, if you feel like you’ve made a change that may have helped or hurt performance this will give you the ability to measure that! Granted I wasn’t able to easily bench on the 32 vCPU machine that I used for the event in 2018 (since it would have been expensive to run it more than the day of the event), but this will give you a good sense of how well your code is performing.

The 2019 Event

At the 2019 event with my frequent collaborators Cory and Griffin

Where I need to go from here

Despite all of the work and thought that I’ve put into my snake it still doesn’t perform as well as I’d like. When I have it active in the arena it usually gravitates towards the top three rankings, but I find that in competitions it doesn’t do nearly as well as I’d hoped. I think this is because I’ve designed it to be too cautious - none of my logic really rewards it for taking other snakes out. Its primary purpose is to keep itself alive for the next 6-7 moves rather than removing other snakes from the board.

I intend to iterate on my snake and compete in Battlesnake 2020, but before that I want to do the following and make my snake better:

  • Increase aggression: reward my snake for removing other snakes from the board
  • Implement a MiniMax style scoring system again: My previous iterations of the snake used MiniMax to great effect, but I’ve gradually moved away from that scoring system in favour of flood filling the board with other snakes moves and just avoiding those areas if it’s risky. Using MiniMax will allow for my snake to be ‘smarter’ around risky snakes and will also assist in making my snake more aggressive.
  • Clean up my pathfinding algorithm: currently there can be moves a few moves ahead that score highly but they don’t consider the fact that an earlier move could have removed what caused that move to score so highly

After reading that above list it seems like I still have a lot of work to do - guess I better get back to it! 😅