Welcome to the personal website of Adrianus Kleemans.

You can navigate through older stuff on the left in the archive. Here are some interactive posts:

Or, check out the most recent posts below:


Comparing images

7 days ago

Some months back I stumbled across this revealing blog post by Silviu Tantos of iconfinder.com. It’s about how to compare two images and condense the difference into a single number that shows how much one image looks like the other.

For example: How much do these two images, a surprised koala and the same image but only with a fancy red hat, really look like one another? 80%? 90%? More?

And how would you calculate such a difference without having to iterate over all the pixels?

I remember being intrigued back then when Google released its ‘search by image’ feature, by which I was equally impressed at the time. How was it possible to determine (with such accuracy!) if two images “look” like each other, or even to search for them?

The simple idea behind the whole algorithm described in the post really is fascinating and made it feel like being handed a well-kept secret :-) So I’d like to share some aspects of it accompanied by a simple Python script.

Scaling and removing colors

First of all, we want to compare all sorts of images in all kinds of sizes, so we have to scale them down to a common size. If we choose a small size, the calculation will be easier later on. For example 9×8 pixels (one pixel wider than it is tall, so that each row yields 8 neighbor comparisons for the dhash function, see below).

Our two koalas then look as follows (enlarged for better visibility):

After greyscaling, only the “intensity values” remain: each R-G-B triplet is reduced to one single value. Technically, this step is done before scaling, but to illustrate its effect, here it is now :-):

Not much of a difference now, eh?
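To make the greyscaling step concrete, here is a minimal pure-Python sketch of how one R-G-B triplet can be reduced to a single intensity value. It assumes the ITU-R 601 luma weights that the Pillow documentation gives for its `convert('L')` mode; the function name `to_grey` is made up for this illustration, and the rounding may differ slightly from what the library does internally.

```python
def to_grey(pixel):
    """Reduce an (R, G, B) triplet to one intensity value in 0..255."""
    r, g, b = pixel
    # ITU-R 601 luma weights (assumed here); integer maths keeps the range 0..255.
    return (r * 299 + g * 587 + b * 114) // 1000


row = [(255, 255, 255), (255, 0, 0), (0, 0, 0)]
grey_row = [to_grey(p) for p in row]
print(grey_row)  # [255, 76, 0]
```

White stays at full intensity, black at zero, and a pure red ends up as a fairly dark grey, which is why the red hat barely stands out afterwards.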

Hashing: dhash

Now we have some scaled down pictures – but we need to transform this image data somehow into numeric values, suitable for fast comparison. Generating hashes comes to mind.

So, which hashing algorithm to choose? dhash, an algorithm which checks whether neighboring pixels get darker or lighter, seems to be very accurate and also fast.

To apply it, every pixel (represented as an “intensity value”) of our shrunken, greyscaled images is compared to its right neighbor, so that some kind of gradient is established:

  • If the right neighbor is lighter (or the same), write a 0
  • If the right neighbor is darker, write 1

You can see that in the first row, the pixel first lightens up (0), then stays the same (0) and the fourth pixel in row is darker than the third => 1.
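The rule above can be sketched in a few lines of plain Python. This is only an illustration on a hand-made row of intensity values (higher means lighter); the name `difference_bits` and the sample numbers are assumptions, not taken from the koala images.

```python
def difference_bits(rows):
    """Return one bit per pixel pair: 1 if the right neighbor is darker, else 0."""
    bits = []
    for row in rows:
        # Pair every pixel with its right neighbor; n pixels give n - 1 bits.
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


sample_row = [120, 135, 135, 90]  # lightens, stays the same, then gets darker
print(difference_bits([sample_row]))  # [0, 0, 1]
```

Because each row of n pixels only yields n - 1 comparisons, a row of 9 pixels is needed to get 8 bits out of it.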

After this, we will end up with 8*8 values (for hash length 8) from which we can build a hex-hash. For the two koalas we end up with the following hashes:

= 2e75c5a3c7cd4d4e
= 2e67c5a3c7cd4d4e

You can see straight away that they look nearly the same, and for an algorithm it is easy to compare bitwise how much of a difference there really is:

= 2 bits difference

So the difference of the two hashes is 2 out of 64 bits in total, which makes the second image 96.875% similar to the first one from this point of view :-)
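The arithmetic can be checked with a short sketch that XORs the two hashes and counts the set bits. The helper names `hamming_distance` and `similarity` are chosen for this illustration; the sketch assumes both hashes are hex strings of equal length.

```python
def hamming_distance(hex1, hex2):
    """Count the bits that differ between two equal-length hex strings."""
    if len(hex1) != len(hex2):
        raise ValueError("hashes must have the same length")
    return bin(int(hex1, 16) ^ int(hex2, 16)).count("1")


def similarity(hex1, hex2):
    """Percentage of bits that are identical in both hashes."""
    total_bits = 4 * len(hex1)  # every hex digit carries 4 bits
    return 100.0 * (total_bits - hamming_distance(hex1, hex2)) / total_bits


a = "2e75c5a3c7cd4d4e"
b = "2e67c5a3c7cd4d4e"
print(hamming_distance(a, b))  # 2
print(similarity(a, b))        # 96.875
```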

I changed the dhash function (in comparison to the blog post mentioned above) here so that it displays exactly the given bits at the right position, which also simplifies the code.

00101110 | 2e
01110101 | 75
11000101 | c5

For example, as in the picture above, the difference map starts with 0010 1110, which is “2e” in hex, so the hex strings are an adequate representation which can be immediately reproduced by looking at the image. On a bitwise level this doesn’t change anything, as long as all the hex strings are generated the same way.
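As a small illustration of that mapping, the following sketch packs a list of bits into a hex string, most significant bit first, so that every group of four bits becomes one hex digit. The function name `bits_to_hex` is an assumption for this example, and so is the zero-padding to a fixed width.

```python
def bits_to_hex(bits):
    """Pack a list of 0/1 values (most significant bit first) into a hex string."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    width = (len(bits) + 3) // 4  # one hex digit per 4 bits
    return format(value, "0{}x".format(width))


rows = ["00101110", "01110101", "11000101"]
bits = [int(c) for row in rows for c in row]
print(bits_to_hex(bits))                      # 2e75c5
print(bits_to_hex([0, 0, 0, 0, 1, 1, 1, 1]))  # 0f
```

Padding to a fixed width keeps a hash that starts with zero bits at the same length as the others, so two hashes stay aligned when they are compared.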

Comparing koalas

As a sample, I compared some other koala images. Three are modifications of the original image: one with an additional light source in the upper left corner, one repainted as dots, and one skewed version. The fourth is a reference image with a completely different koala.

koala_light koala_dots koala_skew other_koala

They scored as follows against the “original” koala image from the beginning, where a value of 0 means no difference at all between the images:

We see that the algorithm is robust against scaling and even against redrawing with other techniques, and that the hash is affected the most by lighting changes, for example re-lighting the image with another light source. But even in this case the differences are reasonably small (5 bits difference).


The code checks for all .jpg-images in the current working directory, picks the first one (in alphabetical order) and compares it to the others. Hashes are calculated with dhash, differences between hashes with diff, and then a horizontal bar plot is drawn with use of matplotlib.

Python Code (Python 2.7, using PIL and matplotlib/pylab and some koala images :-)

# -*- coding: utf-8 -*-
'''Image comparison script with the help of PIL.'''
__author__  = "Adrianus Kleemans"
__date__    = "30.11.2014"
import os
import math, operator
from PIL import Image
import pylab

def diff(h1, h2):
    return sum([bin(int(a, 16) ^ int(b, 16)).count('1') for a, b in zip(h1, h2)])

def dhash(image, hash_size = 8):
    # scaling and grayscaling
    image = image.convert('L').resize((hash_size + 1, hash_size), Image.ANTIALIAS)
    pixels = list(image.getdata())

    # calculate differences
    diff_map = []
    for row in range(hash_size):
        for col in range(hash_size):
            diff_map.append(image.getpixel((col, row)) > image.getpixel((col + 1, row)))
    # build hex string
    return hex(sum(2**i*b for i, b in enumerate(reversed(diff_map))))[2:-1]

def main():
    # detect all pictures
    pictures = []
    os.chdir(".")
    for f in os.listdir("."):
        if f.endswith('.jpg'):
            pictures.append(f)

    # compare with first picture
    image1 = Image.open(pictures[0])
    h1 = dhash(image1)
    print 'Checking picture', pictures[0], '(hash:', h1, ')'

    data = []
    xlabels = []
    for j in range(1, len(pictures)):
        image2 = Image.open(pictures[j])
        h2 = dhash(image2)
        print 'Hash of', pictures[j], 'is', h2
        xlabels.append(pictures[j])
        data.append(diff(h1, h2))

    # plot results
    fig, ax = pylab.plt.subplots(facecolor='white')
    pos = pylab.arange(len(data))+.5

    ax.set_xlabel('difference in bits')
    ax.set_title('Bitwise difference of picture hashes')
    barlist = pylab.plt.barh(pos, data, align='center', color='#E44424')
    pylab.yticks(pos, xlabels)
    pylab.grid(True)
    pylab.plt.show()

if __name__ == '__main__':
    main()



Dolphin Olympics

11 days ago

Last week I had a conversation with a friend about addictive games, and it reminded me of a game I often used to play at school. I even made a video back then, in which I wanted to reveal the “secrets of the pros” :-D
(I was 19 back then, so I guess that’s okay…)

But what I also realized is that it hit 50k views this week, hooray!

The game is basically a skater game where your goal is to score as many points as possible by gaining speed, doing some crazy flips and tricks and trying to land safely. Just with a dolphin :-)

And to make it even more surreal, you can jump up to certain planets and stars, and do a “starslide”. There’s even a nice reference to The Restaurant at the End of the Universe from The Hitchhiker’s Guide to the Galaxy.

Here’s the video if you want to take a look:

You should give it a try at Rawkins Games.




Bye Wordpress

14 days ago

After some years of using Wordpress, I’m leaving as of today. I sometimes really liked writing in it, the choice of plugins and visual themes is amazing, and also the administrative interface and statistics were pretty.

On the other hand, writing in it always felt a bit constrained. I didn’t write a single post without glancing at yet another code highlighting/importing/executing plugin, only to waste hours trying to find the perfect match, which often never came along.
I also always felt like it was a bit bloated: not on the interface, which is quite nice and clean, but underneath, when looking through themes to modify them and searching through PHP code. There is so much code and stuff in the background which I never fully saw through.

And one more thing: WP is pretty much only designed for blogging, that is, writing posts, posting them on the front page and waiting for them to vanish in the archive. Sure, you can categorize them, but what I want is some mix of fixed, static pages that can be accessed really easily and always stay updated, and not only articles which fade away. I also had the feeling that “Pages”, the static sites of Wordpress, are always overshadowed by the main actors, the almighty Posts: they can’t be shown on the front page by default, you have to create a link somewhere so they can eventually be found, and there’s no concept of maintaining a pool of (maybe hierarchical!) static pages which is getting bigger and wants to be presented in a neat way.

Nonetheless Wordpress is a great blogging engine, and I’m sure a big part of the people I know will be happily ever after with it.
It just – isn’t the right thing for me anymore.

In general, I want more freedom, more possibilities to hack around in the code. And without having to learn a whole PHP framework :-) I’m leaving for Textpattern, a more minimalistic approach to the whole blogging/writing thing.





697 days ago

I dug out a script which I wrote some months ago for finding the best words in Boggle / Scramble / Ruzzle etc.
With a dictionary of ca. 210k words, each word is checked to see whether it can be represented on the board. For all possible words, their values are calculated and then printed in descending order.
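The board check can be sketched as a small depth-first search. This is not the original script, just a minimal illustration under the usual Boggle rules, which are assumed here: a word is traced through horizontally, vertically or diagonally adjacent cells, and each cell may be used only once. The names `neighbors` and `can_trace` are made up, and the board is given as one string of 16 letters, row by row.

```python
def neighbors(index, size=4):
    """Yield the indices of all cells adjacent to `index` (diagonals included)."""
    row, col = divmod(index, size)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr or dc) and 0 <= r < size and 0 <= c < size:
                yield r * size + c


def can_trace(word, board, size=4):
    """True if `word` can be spelled along adjacent cells, using each cell once."""
    def search(index, pos, used):
        if board[index] != word[pos]:
            return False
        if pos == len(word) - 1:
            return True
        used.add(index)  # block this cell for the rest of the path
        found = any(search(n, pos + 1, used)
                    for n in neighbors(index, size) if n not in used)
        used.discard(index)  # free it again for other paths
        return found

    return bool(word) and any(search(i, 0, set()) for i in range(len(board)))


board = "ABCDEFGHIJKLMNOP"
print(can_trace("KNIFE", board))  # True
print(can_trace("BEAD", board))   # False
```

Running such a check over every dictionary word and sorting the hits by their value would give a ranking like the one described above; the letter values themselves are not covered by this sketch.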

Sample grid:

My script:

Any multipliers (double/triple word/letter) are not taken into account.
But at least you should manage to get the “ultimate move” achievement in Ruzzle.

Try it yourself

Enter letters:

(all 16 letters in a row, e.g. ABCDEFGHIJKLMNOP)




Bejeweled Bot

701 days ago

Some weeks ago I got really annoyed playing Bejeweled Blitz on Facebook.

I’m really bad at playing, so I decided to cheat and write a simple bot in Python which plays for me.
Here you can see it in action:

It basically consists of 3 parts (like every bot) which will then be repeated:

  • Analyze board
  • Calculate moves
  • Execute moves on board (by simulating user input)

Autopy was a great help for simulating user input (it’s like Java’s Robot class, but in Python). The bot’s really stupid and makes every move it sees.
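The “calculate moves” part could look roughly like the following sketch. It is not the bot’s actual code: it assumes the board has already been analyzed into a grid of one-letter gem colors, and the helpers `has_run` and `find_moves` are hypothetical names. It simply tries every swap of two adjacent gems and keeps those that create a row or column of at least three equal gems, without ranking them.

```python
def has_run(board, row, col):
    """True if the gem at (row, col) is part of a horizontal or vertical run of 3+."""
    gem = board[row][col]
    for dr, dc in ((0, 1), (1, 0)):
        length = 1
        for sign in (1, -1):  # walk in both directions along the axis
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < len(board) and 0 <= c < len(board[0]) and board[r][c] == gem:
                length += 1
                r, c = r + sign * dr, c + sign * dc
        if length >= 3:
            return True
    return False


def find_moves(board):
    """List swaps ((r1, c1), (r2, c2)) of adjacent gems that create a run."""
    moves = []
    for row in range(len(board)):
        for col in range(len(board[0])):
            for dr, dc in ((0, 1), (1, 0)):  # right and lower neighbor
                r2, c2 = row + dr, col + dc
                if r2 >= len(board) or c2 >= len(board[0]):
                    continue
                # Swap, test, and swap back so the board stays unchanged.
                board[row][col], board[r2][c2] = board[r2][c2], board[row][col]
                if has_run(board, row, col) or has_run(board, r2, c2):
                    moves.append(((row, col), (r2, c2)))
                board[row][col], board[r2][c2] = board[r2][c2], board[row][col]
    return moves


board = [list("RGB"),
         list("GRR"),
         list("BGY")]
print(find_moves(board))  # [((0, 0), (1, 0)), ((1, 0), (1, 1))]
```

A bot that makes every move it sees would just execute the first entry of that list and then analyze the board again.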

Repo (Github): bejeweled-bot.



