*Part of the table at Wikipedia*

I immediately wondered whether it would be possible to identify the language of some (preferably small) portion of text solely by looking at its letter frequencies. Obviously there are far more elaborate methods, such as word recognition, but the letter-frequency approach has two advantages:

- no language knowledge necessary (e.g. dictionaries or grammar rules) except for letter frequencies
- simple to implement: little code and measurable results

(Code is on GitHub)

The basic idea is to first analyze the text and then calculate the mean squared error for each language, something like this:

- loop through text and calculate relative letter frequency (a: 3.3%, b: 2.1%, …)
- for each language:
  - load letter frequency for language
  - compare frequency of each letter to frequency in text and add difference^2 to score for that language
- sort and output score for each language
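The steps above could be sketched roughly as follows. This is not the script from the repository: the function names are made up for illustration, the score is divided by the number of letters to get a mean (an assumption about how the sum of squared differences is normalized), and the two tiny "language" tables are toy stand-ins computed from short strings rather than the real Wikipedia figures.

```python
from collections import Counter


def letter_frequencies(text):
    """Return the relative frequency of each letter in text (values sum to 1)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {letter: count / len(letters) for letter, count in counts.items()}


def mean_squared_error(text_freq, language_freq):
    """Mean of the squared frequency differences over all letters in either table."""
    letters = set(text_freq) | set(language_freq)
    if not letters:
        return 0.0
    total = sum(
        (text_freq.get(letter, 0.0) - language_freq.get(letter, 0.0)) ** 2
        for letter in letters
    )
    return total / len(letters)


def rank_languages(text, tables):
    """Return (language, mse) pairs sorted from best match to worst."""
    text_freq = letter_frequencies(text)
    scores = {name: mean_squared_error(text_freq, table) for name, table in tables.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy tables for illustration only; a real run would load the Wikipedia table.
toy_tables = {
    "toy_a": letter_frequencies("aaab"),
    "toy_b": letter_frequencies("abbb"),
}

for name, score in rank_languages("aab", toy_tables):
    print(name, round(score, 4))
```

Letters that are missing from one of the two tables count as frequency 0, so accented characters that only occur in some languages still contribute to the score.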

Below are some example texts and the mean squared error for each language (lower is better):

In de prachtige zeestad Genua, de trotsche bijgenaamd, werd omstreeks het jaar 1435 een knaapje geboren, dat nu in alle landen als Christophorus Columbus bekend is. (…)

Solone, il cui petto un umano tempio di divina sapienzia fu reputato, e le cui sacratissime leggi sono ancora alli presenti uomini chiara testimonianza dell’antica giustizia, era, (…)

Jukolan talo, eteläisessä Hämeessä, seisoo erään mäen pohjaisella rinteellä, liki Toukolan kylää. Sen läheisin ympäristö on kivinen tanner, mutta alempana alkaa pellot, joissa, (…)

The three snippets are from Project Gutenberg; for each one, only the first 200 lines of the original book were used.

Interestingly enough, we also see which languages are similar to each other. For the Italian text, Spanish, French, Esperanto and Portuguese follow directly behind Italian:

- Italian: MSE 0.0007 | 1503.6 points
- Spanish: MSE 0.0042 | 237.0 points
- French: MSE 0.0057 | 175.9 points
- Esperanto: MSE 0.0068 | 147.4 points
- Portuguese: MSE 0.0072 | 139.4 points

… while for the Dutch text it’s German and Danish:

- Dutch: MSE 0.0016 | 639.8 points
- German: MSE 0.006 | 166.3 points
- Danish: MSE 0.0076 | 131.6 points

We can also see that Finnish is quite an outlier, with only Esperanto somewhere near it; all other languages are quite far away.

With a simple Python script and the table from Wikipedia we get some quite good results on our example texts! It is, however, quite a small subset of languages we’re checking here, so maybe a bigger set would reveal some more results.

Furthermore, we also get a “relative” picture of how the languages relate to each other and how closely each one matches the letter frequencies of the example text.

Thanks for reading!

**TL;DR:** Ever wondered how much a ship costs in No Man’s Sky? Here’s how much:

Inventory slots:

Average price:

Estimated gold mining time:

After playing No Man’s Sky for some time, I noticed two things about ships and slots. First, the cost of the available ships always seems to vary randomly around a fixed value. For a 31-slot ship, sometimes the cost is around 11.6 million, sometimes 12.3 or even 12.5.

Second, the price increase per additional slot is not linear; instead, it seems to grow in some kind of exponential fashion.

I wondered what the real relation between price and slots is, so I started to take some notes. You can have a look at the 72 data points here:

(The cost is in millions and rounded to a tenth of a million.)

At first, I thought the relation between slots and price would be exponential, something of the form

(y = cost of ship, x = number of slots.) But that curve is too steep for the higher slot counts. After playing around a bit, I realized that it is more like the following:

Here are both of the lines for comparison:

As you can see, the two rightmost points don’t really fit the first equation, whereas the second equation is a much better fit. If we look closely at the middle part between 30 and 35 slots, the second equation is a slightly better fit there as well.

To work out the exact parameters, I wrote a simple Python script to optimize the exponent:

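```
def mse(factor):
    """Mean squared error of the model price = 0.000001 * slots**factor."""
    return sum((0.000001 * slots**factor - price) ** 2 for slots, price in data) / len(data)


# Each line of data.csv holds "slots,price" (price in millions).
data = []
with open('data.csv', 'r') as csv_file:
    for line in csv_file:
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(',')
        data.append([float(fields[0]), float(fields[1])])

iteration = 0  # called "round" originally, which shadows the built-in round()
factor = 1.0
diff = 1.0
best_mse = 1000
best_factor = 0

while iteration < 100:
    factor += diff
    error = mse(factor)
    print("MSE with ", factor, ":", error)
    # if new best, remember current factor
    if error < best_mse:
        best_mse = error
        best_factor = factor
    else:
        # overshot: go back to the best factor and halve the step size
        factor = best_factor
        diff = diff / 2
    iteration += 1
```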

This gives the following output (the exponent is on the left, the mean squared error on the right):

Bingo! So the formula (price in millions) is:

Or in other words, for the full price, the formula is really simple:

As you can see above, I also included an approximate mining time. It is based on an optimistic mining rate, as on one of those planets that are full of gold piles.

For a full slot of gold (250 units of gold), I assumed 2 minutes and 30 seconds of mining. A full slot of gold sells for 55’000 units of currency, so that’s 22’000 units per minute.
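The arithmetic behind the mining time can be written down in a few lines. This is only a sketch of the estimate described above; the function name `mining_minutes` is made up, and the 12.3 million example price is one of the 31-slot prices mentioned earlier.

```python
GOLD_SLOT_VALUE = 55_000   # units earned per full slot of gold (from the text)
MINUTES_PER_SLOT = 2.5     # 2 minutes 30 seconds to mine one full slot (from the text)


def mining_minutes(ship_price):
    """Estimated minutes of gold mining needed to afford a ship at ship_price units."""
    units_per_minute = GOLD_SLOT_VALUE / MINUTES_PER_SLOT  # 22'000 units per minute
    return ship_price / units_per_minute


minutes = mining_minutes(12_300_000)
print(f"{minutes:.0f} minutes, about {minutes / 60:.1f} hours")
# prints "559 minutes, about 9.3 hours"
```

Selling trips and walking between gold piles are not included, so a real session would take longer.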

Happy mining! :-)

Often the games were released with an installer like one of these:

Alongside the .NFO file and the game, split into 50 or more RAR archives, there was also music included (chiptunes), and sometimes I let the installer run just to listen to the music.

Check out this player with my favorite chiptunes (just click the song titles to play them):

It’s powered by Chiptune2.js (which in turn uses libopenmpt and Emscripten) to play MOD files directly in your browser. You can download all of the songs above and some extra ones here.

Module files originate from the old Amiga MOD format and nowadays come in formats like .mod or .xm. The cool thing is that, due to their structure, they are **only about 15 KB** each! Most songs here are by Maktone/Martin Nordell, a well-known name in the cracktro scene who has contributed to many of those releases. (Here’s an interview from 2002 in which he shares some details about himself.)

To play them, you’ll need a module file player like Open Cubic Player; VLC can play them too. There are also tools to compose and edit them, like MilkyTracker. Standard .mod files typically consist of 4 channels, for example Class01.mod:

You can even see the note pitches (like C-4 and G-3), and that the player repeats a certain set of sequences. If you’d like to learn more about composing chiptunes, have a look here.
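If you are curious where the channel count comes from, here is a rough sketch that peeks into a module file. It assumes the classic 31-sample MOD layout, with a 20-byte song title at the start and a 4-byte signature at offset 1080 ("M.K." for the standard 4-channel format). The function name `read_mod_header` and the small signature table are made up for illustration and only cover a few common signatures; .xm files use a different layout and are not handled.

```python
# A few common MOD signatures and their channel counts (not exhaustive).
CHANNELS_BY_SIGNATURE = {
    b"M.K.": 4,
    b"M!K!": 4,
    b"FLT4": 4,
    b"4CHN": 4,
    b"6CHN": 6,
    b"8CHN": 8,
}

SIGNATURE_OFFSET = 1080  # 20 (title) + 31 * 30 (samples) + 1 + 1 + 128 (pattern table)


def read_mod_header(data):
    """Return title, signature and channel count from the raw bytes of a .mod file."""
    if len(data) < SIGNATURE_OFFSET + 4:
        raise ValueError("too short to be a 31-sample MOD file")
    # The title is padded with zero bytes up to 20 bytes.
    title = data[:20].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    signature = data[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 4]
    return {
        "title": title,
        "signature": signature.decode("ascii", errors="replace"),
        # None means the signature is not in the small table above.
        "channels": CHANNELS_BY_SIGNATURE.get(signature),
    }


# Synthetic header for demonstration; with a real file you would pass
# open("Class01.mod", "rb").read() instead.
fake_mod = b"class01".ljust(20, b"\x00") + bytes(1060) + b"M.K."
print(read_mod_header(fake_mod))
```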

By the way, if you like that kind of music, be sure to check out the game VVVVVV (Steam link), which has an awesome chiptune soundtrack (named PPPPPP):

(It’s fun, but be prepared: the game’s so difficult there’s a Steam achievement for finishing it with fewer than 500 deaths :-)

Enjoy, and thanks for reading!

I made a short puzzle game using ProcessingJS in which you push blocks around. It only has about 15 levels, but I felt I wasn’t going to work on it any further, so I finished up what was there. Code is on GitHub.

Enjoy!

Often, employment references (Arbeitszeugnisse) are not written from scratch but “generated” (clicked together, assembled by selection, copied together) from fixed text modules. On the one hand this reduces the effort of writing them; on the other it ensures that one stays within the legal framework, since not everything is legally permissible in an employment reference.

Phrasings that were assembled from text modules are often fairly easy to spot:

- *“…hervorragendes und umfassendes Fachwissen…”* (“…excellent and comprehensive expertise…”)
- *“…sehr motiviert und zeigte ein hohes Mass an Initiative und Leistungsbereitschaft…”* (“…very motivated and showed a high degree of initiative and commitment…”)
- *“…erwies sich Herr X als belastbarer Mitarbeiter und ging überlegt, ruhig und zielorientiert vor…”* (“…Mr X proved to be a resilient employee and worked in a considered, calm and goal-oriented manner…”)

Inspired by Arbeitszeugnis.ch and other sources (Arbeitszeugnisgenerator, LSO, mediaintown.de), this yields a small collection of such text modules, which can in turn be “translated back”. For example:

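- “stets zu unserer vollsten Zufriedenheit” (“always to our fullest satisfaction”): very good
- “stets zu unserer vollen Zufriedenheit” (“always to our full satisfaction”): good
- “im Großen und Ganzen zu unserer Zufriedenheit erledigt” (“completed to our satisfaction on the whole”): poor
- “bemühte sich, die ihm übertragenen Aufgaben zu unserer Zufriedenheit zu erledigen” (“made an effort to complete the tasks assigned to him to our satisfaction”): unsatisfactory performance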
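The back-translation can be sketched as a simple phrase lookup. This is only an illustration using the four examples from the list, not the code behind the actual analysis tool; the names `PHRASE_GRADES` and `analyse` are made up, and the English grade labels are my rendering of the German grades.

```python
# German text modules mapped to the grade they are commonly read as.
PHRASE_GRADES = {
    "stets zu unserer vollsten Zufriedenheit": "very good",
    "stets zu unserer vollen Zufriedenheit": "good",
    "im Großen und Ganzen zu unserer Zufriedenheit erledigt": "poor",
    "bemühte sich, die ihm übertragenen Aufgaben zu unserer Zufriedenheit zu erledigen":
        "unsatisfactory",
}


def analyse(reference_text):
    """Return (phrase, grade) pairs for every known text module found in the text."""
    lowered = reference_text.lower()
    return [
        (phrase, grade)
        for phrase, grade in PHRASE_GRADES.items()
        if phrase.lower() in lowered
    ]


sample = "Er erledigte seine Aufgaben stets zu unserer vollen Zufriedenheit."
for phrase, grade in analyse(sample):
    print(f"{grade}: {phrase}")
```

A plain substring match like this misses reworded or re-ordered variants, so a real tool would need a larger collection of modules and some tolerance for small differences.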

The selected text modules are by no means complete, nor necessarily graded correctly (grading can vary considerably), but the analysis is meant to encourage a critical look at certain passages.

Try it yourself here: Arbeitszeugnis-Analyse
