Friday, April 28, 2006

Regarding Pong

As you may have read, I was awarded first place in the Smalltalk Solutions 2006's Smalltalk Coding Contest. I would like to thank the organizers for giving contestants an opportunity to truly enjoy ourselves. I would also like to thank the kind words of others, including those of Blaine Buxton, Michael Lucas-Smith, James Robertson and Eliot Miranda.

It's interesting that the two finalists this year were mentored by John Sarkela. As far as I am concerned, the evolution of my programming skills show a rather large discontinuity due to him. The amount of stuff I both learned and unlearned because of his careful instruction is enormous. Thanks so much for melting my brain over and over again for the last 6 years John, it is truly a matter of night and day.

Ok... so how was it? The first phase of the contest was about implementing an autonomous player for 4 different versions of Pong. The games ran in a server, which would provide the status of the game in ascii, and accept commands via an URL with arguments.

I thought about the structure of the program in terms of how I play video games. I felt that I would take a snapshot of what I perceived in terms of game objects, run some strategy in the background, and then let my hands execute as told by the strategy in the background as well. This allows for hands to continue going after whatever objectives while the strategy runs on a new snapshot of the game. It also permits surprising circumstances to stop the strategy and the hands and quickly change behavior.

When relaxed, this allows deeper strategies to run and build up a long queue of commands, based on high probability predictions, for the hands to execute. When under stress, the queue shortens and the strategies become more and more like a knee-jerk reaction. Experience pays off here, as it can cache the results of expensive evaluations for quick access via lookup table.

All this stuff suggests a collaboration of multiple Smalltalk processes. I chose to implement it in a diagram in the shape of a diamond. At the bottom is the interface, which encapsulates the perception process (which retrieves new raw status from the server by means of pulling) and the action process (which sends raw commands to the server by means of pushing). These two processes are independent of each other.

Up and to the left of the interface are the eyes, which take the latest raw status from the perception process and reify objects like balls, bats, the field, the scorecard, etc. The eyes only do this if they are asked for the latest status, and they detect that the perception process has perceived new information.

Above the interface, and higher than the eyes, is the strategy. It pulls the objects reified by the eyes and produces objectives based on the kind of game and whatever relationships it can perceive between the objects in play. The strategy then pushes objectives for the hands to meet.

Above and to the right of the interface are the hands, which take the objectives and by means of double dispatching lets them take control as they decide how to traverse an information field implied by the objects perceived by the eyes. As a result, raw commands are queued by the hands at the interface by means of pushing. The action process pulls these raw commands out of the queue and pushes them to the game server.

Since the game would run at 30hz, I added frequencies and delays to all these processes so they would yield and be nice to each other.

All this is great except that the server was designed to not answer a status request until the next frame would start. This means that VW would lock until there was a new status, and if the perception process woke up at the wrong time it would keep locking VW all the time. Since it caused me quite some grief, I went to sleep at 2am on Sunday (very early) morning thinking I would have no time to recover. I only had 22 hours to go. No games worked at that point.

At 10am on Sunday I got up and decided that I had to come up with anything that could possibly work. I was already running rather high risks because I had been feeling time would be short, so I had decided to not implement any SUnit tests. But I was very determined to provide any sort of submission, even one of bad quality if need be. One of the things that made it possible is that roughly from the time I learned Smalltalk, I forced myself to refactor code as I write it. After many years of practice, it has become somewhat natural. The other thing that made it possible was that the design was spot-on. And so, the last hours of the first phase began for me.

Since the process scheduler and the processes themselves were nicely written, I changed the inner workings from 5 concurrent Smalltalk threads into 1 by reimplementing one method. Immediately, things started working much better. This raised my confidence and I focused on making games play.

I decided to have one strategy class per game, and one objective class for each thing the strategy wanted to do. Overall, the strategies just try to Keep It Very Simple and just answer as quickly as possible, with emphasis on answering the ball.

Early on I had decided that since the bats were quite slow compared to the balls, it would probably not pay off to move the bats horizontally. From a coverage point of view, it seemed that leaving the bats closest to the "goal line" would make the most sense. It also made the game model simpler.

In Tennis Pong, I adopted the strategy of moving the bat to answer the ball. Then, to make it less likely to be caught off guard, the bat would be moved back to the center while the ball traveled towards the opponent. Several things came up.

First, I decided to predict the trajectory of the incoming balls to find out where the bat should be at what time to accomplish an interception. To do this, I subclassed the ball class and added a predictedETA instance name. I wrote predictedETA as ^0 in the superclass and I was in business because now I could see how far the bat would have to move, divide that by the bat's speed, and I would have the time needed for the bat to get to intercept the ball. I wrote a (rather inelegant) prediction routine that would plot the position of the ball for each integer horizontal position by creating predicted balls with predictedETA > 0. This means the strategies would have access to subhertz precision predictions, and eventually this allowed bats to catch balls that would otherwise appear to be unreachable as the speed of the ball increased.

For each frame, this means that prediction would create several hundred balls that would become garbage rather quickly. At 30hz, this means up to several tens of thousands of predicted balls per second. Would VisualWorks be able to deal with this on my rather ancient Pentium III/600? I ran some profiling and just to be sure I improved the execution speed. When I saw that VW could easily predict the full trajectory of more than 2600 balls per second, I thought I was covered as the contest machine would surely be a much faster one.

All things considered, VW was very fast and thus allowed the players to run realtime without any further optimization.

Bounces off the edges of the field did not always match the predicted trajectory, but as the eyes would perceive new balls, eventually the strategy would pick up on it and produce updated objectives anyway, so I didn't worry about it.

I also neglected to consider collisions with opponent bats as they could move, as well as aiming the answers away from the opponent bats. This simplified things considerably allowing me to focus on answering the ball (i.e., not losing).

And what were the objectives? For Tennis Pong, there were two objectives. One was an interception objective, and as such it would know the predicted position of the ball it needed to intercept. This allowed it to calculate an ideal position for the bat to be placed at, and thus it could decide which command was necessary to issue through the hands.

The other objective was a preparation for incoming balls objective. It didn't need to know a prediction because the ideal position was fixed at the vertical center of the field. In this way, the Tennis Pong strategy would essentially think along the lines of "if there is an incoming ball, predict its trajectory and if the bat can reach it, then issue an interception objective --- otherwise, issue a preparation for incoming balls objective". The last part is important because it says that if the strategy sees it cannot answer, it moves to a location that makes it most likely to answer the server's first serve.

Because of continuous and merciless refactoring, the actual implementation was quite smaller than the italicized blob.

Since VW was quite fast even on this slow machine, I could bring up the server and fork off two players. Thus, I could see them play against themselves. I debugged the implementation by looking at the server display. Not the most time efficient manner, but it had to do.

Once Tennis was running I decided to go for Football Pong. In this variant, the game had two bats that were linked vertically. If you moved the first bat up and down, the second bat would do the same. Hence it didn't seem much different than Tennis at first...

... but I was wrong! There were two things I had to consider further. First, the offensive bat further down the field could play against yourself, so the player should be able to move it out of the way when needed. Second, the player should be able to use the offensive bat to intercept balls as well. What to do?...

For incoming balls, the strategy could easily use the predicted trajectory and issue an second bat interception objective if possible. If the ball were unreachable by the second bat, it could try to issue a first bat interception objective if possible. If no such thing, then it would just issue a first bat preparation for incoming balls objective (since bats are locked, moving the first bat also moves the second one). For outgoing balls, it could check the predicted trajectory and issue a second bat avoidance objective in which the prediction would be used to determine a zone for the bat to be out of. Thus was the strategy written.

The objectives themselves became rather easy to write (except the avoidance one, which is still a mess). Just determine an ideal position and issue the right command via the hands. I felt I had found an expressive way for strategies to specify what needed to be done. The class count was going higher and higher, but there were very few occurrences of ifTrue:ifFalse:, which made the code quite clear.

Time for Hockey Pong. In this case, bats could move independently of each other, so I just refined the strategy for Football Pong to take advantage of this. Precise control of the bats was accomplished by specialized objectives that knew how to move the second bat independently of the first bat.

I refactored objectives so they would understand moveFirstBatByDeltaY: and moveSecondBatByDeltaY: messages. To handle the Football Pong case where moving the second bat was done by moving the first one, I added a layer of indirection: the move* messages would send privateMove* messages. So, for example, the Football Pong second bat objectives would implement moveSecondBatByDeltaY: by sending privateMoveFirstBatByDeltaY:.

With just 5 hours to go I still had to implement Multiball Pong, or Tennis Pong with 6 balls. By that time I wasn't up to implementing a bucketing strategy, so after a few experiments I decided to refine the Tennis Pong strategy to sort objectives for reachable incoming balls by predictedETA to interception, and go for the one happening sooner.

I submitted my first phase code with 2 hours left. Eventually I learned I was a finalist and that the contest had been extended. I decided to do heavier refactoring so that the program would stand a better chance to survive change, as the second stage involved an unknown variant of Pong. The only thing I didn't refactor was the ball predictor, which I should have moved to the object describing the field. I submitted the new version and hoped for the best.

The day of the contest came. I had mixed feelings about it. On one hand, I felt quite confident in my abilities. On the other hand, I did not know how my first phase program had fared against Blaine's. I suspected that mine was better because after the first phase Blaine and I had exchanged some notes, but I was aware that I could not predict how they would interact together. My worst nightmare was that Michael Lucas-Smith would have us play Squash Pong, a variant in which both players play towards the same side. Even then I could think of ways in which I could get the thing to play this, but only after major changes that would have to go in without SUnit...

But to make it even more interesting, Michael just gave us a new server that played a new game that was not described to us. In other words, we had to guess how to play by looking at it! I ran my player against the server and immediately it complained it didn't know how to play a game called 'super'. Ok, so I had to make a strategy subclass and implement supportedGame on the class side as ^'super'. I played the game a bit and noticed that both bats were locked as in Football Pong but moved in opposite directions. This was solved by creating a second bat objective with a specially crafted moveSecondBatByDeltaY: which not only moved the first bat but also reversed the vertical direction. In addition, Super Pong was multiball. But I had a multiball strategy already. After pushing up a method, I wrote the strategy in terms of producing all possible interception objectives for both bats, and then choosing the one that would be accomplished first. Just in case, if there were no incoming balls I had the strategy issue a preparation objective.

Since I assumed that there would be plenty of incoming balls, I did not care for the second bat avoiding outgoing balls. I put all the emphasis on answering balls as quickly as possible.

Balls seemed to collide with each other. I ignored this as I knew I would not be able to safely implement an improved trajectory predictor in time. I trusted that VW would be so fast on the contest machine that it could handle trajectory "surprises" immediately.

Also, I couldn't tell exactly why but balls would sometimes move in parabolic trajectories instead of linear ones. It seemed that this happened because of bounces, but not always. Their movement would go back to linear after 3 bounces or so. Again, I knew I would not be able to account for this, so I ignored it and again trusted the strategy to choose quick interceptions instead. I thought that a ball with a parabolic trajectory would be either intercepted quickly because linear prediction would be good enough for the bat to catch up, or be missed quickly and then the strategy would pick another ball to answer. There would be no time consuming interceptions because the strategy was going for the quickest one. I pretty much thought it was good enough and let it go.

I missed the fact that the speed for the bats was changed from 2 to 4 for the first bat and 5 for the second bat. Because I did not perceive a difference in this value, the strategies ended up being less effective than possible. When I saw them play against each other, I did not notice issues either, so I thought everything was fine. Fortunately this did not turn into a major blunder.

We had 2 hours to implement this, but I had two Super Pong players competing against each other in about 30 minutes. Overall, I had created a strategy class with a few methods, an objective class with one method, and I had also pushed up a method in the strategy hierarchy. I spent quite a bit of time making sure they would not crash. And the final phase of the contest came to an end.

Since I had seen Blaine spend much more time with his players, I suspected he was having a hard time. I also learned a few statistics. My original weekend project first phase player had beaten Blaine's in every game. It had also beaten the players of Michael Lucas-Smith's two beta testers, who had had the luxury of 1 month of time to work on their stuff (with the exception of multiball, which lost to a bucketing strategy).

After the contest was extended, Blaine had actually improved his players. While we were implementing Super Pong, Michael pitted my refactored player against Blaine's improved players. He started with Tennis Pong, and I immediately saw Blaine's player was better than mine. It lost the first game, and I thought I was in trouble. My player came back and tied Tennis, but I knew it had probably done that thanks to server bias, luck, and minor details. My player won all the other games, but I did not have a chance to see them do so.

Since multiball appeared to be a strength of my player compared to Blaine's, I left the contest thinking I had a good chance of winning. My biggest concern was that even if what I had written was better overall, the interaction of the two could very easily result in Blaine's player winning by emphasizing one of the many shortcomings in my player. I knew I had assumed a lot of stuff, and that all the assumptions could easily amount to taking a leap of faith into the void. I had also seen Blaine's more than cool trajectory prediction display. I had no choice other than to be patient and wait.

Then Tuesday came, and our Super Pong players were brought up on the projector. We would play 6 games, 3 on each side. The first game started, and I saw how my player would slowly but steadily pick up gains until it won. Good stuff, but I knew it might had been too easy: during testing I had noticed the server was biased towards the side I had just played on. The win had seemed to require quite a bit of work. It could have just been a chance success.

And indeed, after switching sides, the second game started with Blaine's player picking up a 10 point advantage. Oh man, it was going to be a mess. I thought it was the server bias up to some extent, yet I was concerned. Was it all server bias, or was there a strategy issue? I could not help thinking about winning ratio probabilities, but then my player stopped losing ground. After oscillating for a bit, it started coming back. Slowly, it erased the deficit and started building up a lead. It took what seemed a long time, but it won.

At this point I thought my players were somewhat immune to server bias, and they seemed to be barely able to deal with Blaine's players. Apparently, I would win after suffering quite a bit. But was a 2-0 lead a fluke, a consequence of lucky random numbers in the server? It appeared unlikely, but I had to wait. While I was thinking about these issues, my player won game 3. I could not lose the contest at this point.

Or could I? I refused to let go because one never knows. What if my player crashed? The rules said that in that case I would have one minute to fix it, or I would lose. I had tested it, and it seemed fine, but what if I had missed something? I could have easily done that without SUnit and with all the assumptions I had made. What if?...

The following games were a blur for me, a mixture of design discussion, white hot mental debugging just in case, watching the games, and way too much other stuff. All of a sudden the contest ended and I had won 6-0. I remembered the weekend of the first phase, when I was so concerned I would not make it, when I had to literally rip code out of my mind and into Smalltalk because I was tired of working on it and I had badly needed a break I could not take. I felt I had been determined to win and that through perseverance I had done it. A great accomplishment, good stuff!

And there was Blaine next to me. We competed and threw our best at each other, but we are also close friends. Of course I wanted to win, yes, but I didn't want him to lose! Even now I have mixed feelings about it. On the bright side, as far as problems go, I guess it's an interesting one to have.

This post was heavily edited, make sure you have the latest version.

No comments: