Converting Roman numerals

October 8, 2012

Earlier this week I stumbled upon a problem involving the conversion of roman numerals (Project Euler). A short and elegant way to do it is as described by Tim Valenta on ActiveState Code. He begins by defining a mapping for numbers to roman signs:

numeral_map = zip(
    (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1),
    ('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I')
)

You can see at a glance that not the characters on their own will be recognized (I, V, X, L, C, D, M), but also the pairs which consist of a smaller number before a bigger one.

In fact it's a little trick to treat smaller letters before bigger ones as separate signs, for example 'IX' (= 10-1 = 9), so we don't have to deal with the whole subtraction thing. It is also possible to build a Parser-like converter which only depends on single letters, but they tend to be much more complicated. But if you do come up with or find a short, elegant program that does exactly that, be sure to let me know! :)

Back to the code. For converting numbers to Roman numerals, we just loop (one time!) through the map and insert the according letters.

def int_to_roman(i):
    result = []
    for integer, numeral in numeral_map:
        count = int(i / integer)
        result.append(numeral * count)
        i -= integer * count
    return ''.join(result)

For example, if we want to convert the number 92, the variable count will be 0 for the first few values, because 92 divided by 100, 900, ... will be zero. Then, at 90, count will be one, thus one 'XL' will be appended to the list result. The loop continues until 'I', when count will be two, and 2 * 'I' = 'II' is added to the list. At last the list is concatenated into a string.

The other way round it does pretty much the same, except there is a nested while loop which just keeps adding to the final result as long as it finds the same Numeral over again.

So in fact you might be able to parse something like 'XIIII' which isn't a proper expression, because it has more than 3 consecutive 'I's (it might have been in ancient Rome, where the rules weren't as strict as they are today). 'XIIII' will be evaluated ("correctly") to 14.

def roman_to_int(n):
    n = unicode(n).upper()
    i = result = 0
    for integer, numeral in numeral_map:
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

What fascinated me most about this approach is that with use of a little trick, the conversion is really elegant (and short!), no special cases to handle, just parse linearly through the whole given word (when converting to "normal" numbers).

Code

Here's the code again. Be aware: To keep the code simple for this post there's no exception handling or input verification. (A string like 'XVIIF' will result in a exception and program termination.)

numeral_map = zip(
    (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1),
    ('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I')
)

def int_to_roman(i):
    result = []
    for integer, numeral in numeral_map:
        count = int(i / integer)
        result.append(numeral * count)
        i -= integer * count
    return ''.join(result)

def roman_to_int(n):
    n = unicode(n).upper()

    i = result = 0
    for integer, numeral in numeral_map:
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

Thanks for reading!