UX in Code, and Mental RAM

For a long time now, we’ve had a very insidious problem with our animation system. Every time we tried to set up animation for a tile or object in the game, it would sometimes flail about wildly in different directions, and other times seem to pause randomly. Sometimes we even got both at once! Keep in mind, this is basic frame-by-frame 2D animation, nothing fancy or complicated. Except for me, apparently. I had let it flounder for almost a year while working on the AI, and not only was it pretty broken to begin with, the bitrot was getting to it; in fact, I still have pieces of the animation code from before we went isometric lying around! I revisited it this week, however, since there was a new animated title screen Will wanted integrated ASAP, and at the time Govind was working on replacing our pathfinding code with his own (the AI had been out of commission for the past month because strange pathfinding behavior made navigating enclosed spaces impossible). I did a lot of digging and found three things:

  1. That the animation frames were tied to the draw frames
  2. That the animation frames were displaying “in order”
  3. And that, apparently, dividing the current (cumulative) frame count by the delay is not a reliable way of delaying animation frames.

The first item on that list doesn’t seem like a problem at first: it’s a natural way of understanding how animation works. However, it leads to unexpected behavior. You see (and this foreshadows a later section!), when you read through the update code for the game, which is honestly about 60% of it, you’re not really thinking about the cumulative time it takes to do all these thousands of lines of things. You might, if you’re like me, think about how long a piece of code will take to execute, but only if you’re changing it or writing it for the first time. At best, reading it start-to-finish as a newcomer to the codebase, you have it vaguely in your head that, yeah, this would take a bit of time to run overall. If you’re already working on something specific, you’re not gonna think like that.

I think the reason for this is that as programmers we build up an expectation of what’s important enough to actually be something we remember, and what’s more, an expectation of how much we can actually remember. These expectations are usually right— the problem comes when we forget that we even needed to know something. When we forget that there is “something” there, that’s normally a piece of information other things depend on, not the details of it.

In the specific case from above, I not only forgot how much time updating would take (even in general terms), but that I needed to worry about it at all. Since I was using frames to time everything (even AI movement, which is, I know, bad), I wasn’t using the delta-time part of my update code. I thought frames would come consistently, so all I needed to do was latch on to them to time animations. As it turns out, that’s not viable… frame times aren’t that inconsistent, but when they are, we want the animation to stay on the right timetable.
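A minimal sketch of what the fix looks like, assuming a per-update delta time is available (all names here are hypothetical, not Astra Terra’s actual code): accumulate elapsed seconds and only advance the animation frame once a full delay has passed, independent of how many draw frames went by.

```cpp
#include <cstddef>

// Hypothetical animation clock driven by real elapsed time
// rather than by counting draw frames.
struct Animation {
    std::size_t current = 0;      // index of the frame being shown
    double accumulator = 0.0;     // seconds since the last frame change
    double delay = 0.1;           // seconds each frame stays on screen
    std::size_t frameCount = 4;   // length of the frame list

    // Call once per update with the real elapsed time, however long
    // the rest of the update loop happened to take.
    void update(double dt) {
        accumulator += dt;
        while (accumulator >= delay) {  // catch up if an update ran long
            accumulator -= delay;
            current = (current + 1) % frameCount;
        }
    }
};
```

A slow frame just advances the animation further in one step, so the animation stays on its own timetable even when draw frames don’t.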

The second item on the list was a little more insidious. Astra Terra’s asset loading mechanism doesn’t have a texture packer or anything right now, because we have minimal assets and aren’t likely to have a huge amount overall. At the moment, we just run a directory tree walker, which walks through and assigns a vector of textures to each texture name. A single image becomes a one-texture vector; a directory becomes a larger vector of textures built from the images inside it, with the directory’s name as the texture name. Simple enough. The problem was that I was just pushing new textures onto the end of the vector as they were encountered. When I wrote the code, I assumed the walker would sort the files as they were read in. The library didn’t provide a method for sorting or for specifying a sort order, so surely there had to be some sensible default behavior! I then promptly forgot that this was even a concern. After all, there’s only so much I can remember.

Imagine my surprise when it turned out the files were not ordered at all! Sure enough, frames were being loaded completely out of order, so even though the texture drawing code was playing them in the right order (according to it), it was showing the wrong textures! Because the library didn’t take the time to do something you’d reasonably expect, because I’d stepped off the beaten path even a little, this practically undebuggable problem took me weeks to fix.
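For the curious, here’s a sketch of the workaround in C++ (our actual walker library and loader names differ; this assumes standard `std::filesystem`, which, notably, also documents that `directory_iterator` visits entries in an unspecified order). The fix is simply to collect the paths first and sort them yourself before loading anything:

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

// Gather every entry in a frame directory and impose a deterministic
// (lexicographic) order ourselves, since the walker guarantees none.
std::vector<std::string> sortedFramePaths(const std::filesystem::path& dir) {
    std::vector<std::string> paths;
    for (const auto& entry : std::filesystem::directory_iterator(dir))
        paths.push_back(entry.path().string());
    // Lexicographic sort: "frame_01.png" < "frame_02.png", as long as
    // the frame numbers are zero-padded.
    std::sort(paths.begin(), paths.end());
    return paths;
}
```

One line of `std::sort` is all the insurance it takes; the lesson is remembering that you need it.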

The general point in both of these cases is that when writing code, you should do one of two things. Either make it obvious, at the point of use on the other side of a system boundary, that there is something there to think about, and constantly remind the user of the code about it (in my update example, I should have had all the update sub-functions and draw functions ask for the time and delta time). Or make it something that doesn’t need to be worried about, because the code does a superset of what is usually needed (the walker library should have sorted the files).
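As a sketch of that first option (all names here are hypothetical, not the game’s real API): if every update function takes the clock as a parameter instead of reaching for a global frame counter, the signature itself keeps reminding every caller, and every reader, that time matters here.

```cpp
// Hypothetical: the current time and delta time travel together,
// passed explicitly across every update boundary.
struct Clock {
    double now = 0.0;  // seconds since startup
    double dt = 0.0;   // seconds since the previous update
};

struct Game {
    double animTimer = 0.0;

    // Taking Clock as a parameter (rather than reading a hidden
    // global) makes the time dependency visible at every call site.
    void updateAnimations(const Clock& c) { animTimer += c.dt; }
};
```

You can’t forget that updates consume time when every call site has to hand the time over.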

The third and final item on that list was another of these things I didn’t think I even needed to think about. Surely, dividing the system time by the delay would work? It’s just the number of delays since the beginning of time! Modulo that by the length of the vector of… Intuitively, it should work, but it again forgets an important piece of information: we don’t want to measure animation delays since the beginning of time. Moreover, that division might not line up on the right boundary between delays, making the delay inconsistent or increasingly out of sync over time. Even worse, if we do this for every frame in the draw-field, they’ll all be slightly out of sync with each other, and the farther we go, the worse it gets.
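To make the phase problem concrete, here’s a small sketch (hypothetical names, not the game’s code) contrasting the two approaches. Indexing off the global clock means an animation that starts mid-game begins on whatever frame the division happens to land on; anchoring to the animation’s own start time makes it begin on frame 0.

```cpp
#include <cstddef>

// The naive approach: divide the global time by the delay and wrap.
// Every animation shares the global clock's phase, regardless of
// when it actually started.
std::size_t frameByDivision(double now, double delay, std::size_t frames) {
    return static_cast<std::size_t>(now / delay) % frames;
}

// Anchoring to the animation's own start time: the elapsed time since
// *this* animation began is what gets divided, so it starts on frame 0.
std::size_t frameByStart(double now, double start, double delay,
                         std::size_t frames) {
    return static_cast<std::size_t>((now - start) / delay) % frames;
}
```

With a 0.5-second delay and a start time of 1.0s, the naive version says an animation queried at 1.3s should already be on frame 2, even though it has only been alive for 0.6 of one delay.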

If you’re noticing a pattern here, it’s about unknown unknowns. Two of these are my fault, and one is a library’s fault. The two that are mine point to the inevitable complexity of code, and the unknowns that morph into unknown unknowns. But the worst one was the library’s fault. Unknown unknowns are a huge, recurring problem that we will always have in programming. It just is. We work with big systems (and before you scoff at me, I’m a single programmer in a 10k-line codebase; that’s no laughing matter) and there’s a lot of complexity. Across system boundaries, there’s even more complexity, but it’s hidden. We can’t possibly learn about it except by stumbling into it. So inside a project, the unknown unknowns should be kept to a minimum, but for a library? Having unknown unknowns is a huge issue. To solve this, we should expect library writers to do what other product designers do: test every aspect, and eliminate the outward-facing unknowns, the things that morph, because of our limited mental RAM, into unknown unknowns.

What I mean is, if there’s a set of behaviors that everyone is likely to need, you’re obviously going to implement those. But for every problem, there usually seems to be a set of behaviors that some people will need, and that don’t interfere with or change the base behaviors. If those extra behaviors are ones the less-needing users could expect the software to have, and the more-needing users will almost certainly expect it to have, implement them. To use a new example, because you’re surely getting tired of the same two, imagine we had a dishwasher. Most people only use the basic features: put in the dishes, turn on the dishwasher, wait, take them out. But if the dishwasher provides a feature that changes the length of the cycle based on weight, that’s something a lot of power-conscious and environmentally conscious people are going to use. Would you leave it so that the done-chime only goes off at the normal length of time, so that if the dishwasher finishes earlier or later, the chime is wrong? Of course not! Sure, not a lot of users will see that the chime is wrong, but those who do will care, and fixing it doesn’t change the use of the technology for the lesser-needing users. Most products are designed this way, even user-facing software. Not a lot of developer-facing code is written this way. If you go off the beaten path, you will find dragons. Even if it’s only a little.

Before I close, I want to acknowledge that designing libraries like this is hard. Treating them like a product, thinking through what the user expects, what the user will use, and all the code paths: that’s hard. It’s why products take a long time. In my example, it wasn’t a big deal. But in the bigger software industry, I wonder what we’d gain by doing this? By making our libraries and frameworks work as expected, with well-rounded, complete, and simple exteriors?
