Drawing Blanks

Premature Optimization is a Prerequisite for Success

Applied group theory

with 2 comments

Imagine a simple image editor program that runs under strict resource constraints – on a mobile phone. (A friend of mine has been writing one). The user edits the picture in the preview mode, and after they are done, they click “apply”, and the program applies all the edits to the actual full-resolution image. The same approach is used in MS Photo Gallery (even though the Gallery is not resource-constrained, but rather resource-demanding).

Among other things, the app allows to perform the usual set of rotate/flip transforms. Namely, rotate by 90, 180, 270 degrees, flip horizontally and flip vertically. Obviously, if the user has performed a series of such transforms, you don’t want to apply them one-by-one. If they’ve made two 90 degree rotations, you want to apply them as one 180 degree rotation, and if they’ve done 2 horizontal flips you don’t want to do anything. So the problem is clear: how to simplify a series of rotations and flips? It’s obvious how to simplify rotations: you simply add the degrees modulo 360. Flips are even easier. But if you mix them, it becomes harder. Mirrors don’t commute with rotations. E.g. a 90 degree rotation followed by a vertical flip is not the same as a vertical flip followed by a 90 degree rotation.

My first thought was “No matter which group this is,  we can represent these transforms by matrices”. Those would be really nice matrices that contain only 0s, 1s and –1s. You multiply them and obtain the resulting transformation. There are two problems with this. First, because of resource constrains, the program does not perform these transforms as coordinate transforms. It works by pushing arrays of bytes, or something like that. So knowing the matrix is not enough. We would need to know how to decompose that matrix into well-known fast transforms, namely rotations by multiples of Pi/2 and vertical/horizontal mirrors. And second, we don’t really need matrices to solve this problem.

Let’s see how many different transforms we can possibly get. The X axis can point in any of the 4 directions. For each position of the X axis, there are exactly 2 positions of the Y axis. Which gives us the total of 8 transformations. Yes, it’s a group of 8 elements, which is known as the group of symmetries of the square or D4. In addition to the 6 elements that I already mentioned, it also contains 2 diagonal flips. Those flips can be represented as a 90/270 degree rotate followed by a vertical flip. So all we need is to store the multiplication table for this group.

As a matter of fact, performance wise, the matrix approach is not worse that the multiplication table approach. I don’t really care to recommend one over the other in this particular case. What’s interesting is that this problem can be viewed as finding a representation of the group that has the least computational complexity, in some sense. A multiplication table of order n can be stored in n^2*log(n) bits. And that will allow to perform multiplication in O(1) time. In the particular case of the D4 group the table would require 192 bits. And the matrix representation would require to store 8 matrices each of which can be encoded in 8 bits. And multiplication will require some arithmetic (or bitwise manipulation), versus simple lookup.

I’m pretty sure that the problem of finding the “least complex” representation of the group has been studied, but I’ve never read anything on the subject.


Written by bbzippo

02/19/2010 at 6:50 am

Posted in math, programming

2 Responses

Subscribe to comments with RSS.

  1. This is an interesting idea. I’d be interested in how this algorithm performs compared to the classical ‘one-at-a-time’ approach.


    02/20/2010 at 4:36 am

    • Good question, luckytoilet. I think the practical benefit of this approach would depend on how exactly the transformation algorithms are implemented and how big our image is. If we are talking about moving around millions of bytes, then of course replacing the intermediate operations with something as simple as 4×4 matrix multiplication or 8×8 table lookup, would save quite a bit of time. Mirroring a 3 megapixel image on a Blackberry phone takes at least a second.
      BTW, you’ve got an interesting blog.


      02/20/2010 at 9:15 am

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: