GeoIp – Binary DB vs MySQL Lookups

Recently I had to implement GeoIP lookups for a client project to determine which country user’s were coming from in order to supply them locale specific content. That, in itself, is not very interesting. What really prompted sufficient interest to write this post was the difference in look up performance between the two available formats for this database.

MaxMind make their GeoIP DB available in two formats; a CSV for use in other databases systems, and a proprietary binary database file for use with provided API scripts.

I had dutifully written a quick import routine to update a MySQL table with the latest CSV data and was quite happily running queries against the table without any major issue.

But the binary format kept calling to me.

Firstly it was a single file to download in place of the current, download, update db fiasco. Secondly MaxMind were claiming crazy throughput for queries against the binary DB and whilst I was quite happy with the performance of the MySQL version I’ve never been one to overlook a potential optimisation.

After grabbing the Java API (MaxMind suggest the COM API for ColdFusion but I’d rather stick with pure Java if possible to maintain cross platform capabilities) and compiling the relevant bits it took me about 10 minutes to work out the object calls and casts needed to get everything up and running.

For those who are interested in the specifics, the code is below:

[cf]
<cfset variables.myBinFile = javacast("string", expandPath("pathtodataGeoIP-106_20101026.dat")) />
<cfset variables.geoIpObj = createObject("Java","com.maxmind.geoip.LookupService").init(variables.myBinFile) />
<cfset variables.myLookup = variables.geoIpObj.getCountry(javacast("string","8.8.8.8")) />
[/cf]

I then created a quick test script that would run 1,000 IP address queries (gained from grepping against our mail relay’s log files) through a MySQL lookup and then the same 1,000 though the Binary DB lookup.

Just for giggles (and because CF’s object creation is allegedly a performance nightmare)  I also though I’d try each test with the object instantiated once at the beginning of the run and then again with one instantiation per lookup.

The results … were astounding!

1,000 Queries with single object creation
MySQL – 121934ms (121.93s)
Binary -125ms (0.13s)
Ratio of 0.00102514475044

1,000 Queries with looped object creation
MySQL – 122324ms (122.32s)
Binary – 375ms (0.38s)
Ratio of 0.00306562898532

As you can see the Binary file performs 2 orders of magnitude more efficiently than the MySQL lookup (and I was querying against my local MySQL server so no network latency issues to speak of) and even throwing object creation into the loop only added 250ms (which isn’t too shabby considering that meant creating the same object 1,000 times!)

Suffice to say I have now rewritten the lookups to use the Binary DB format for super fast queries, less DB load and much simpler updates!

(Note: reading through the notes for the Java API it is also possible to employ a caching mechanism within the Java Lookup method which, combined with the singleton pattern, would result in event faster lookups for repeated queries! But I couldn’t work out how to pass in the required caching object via the CF / Java bridge so I’ll leave that to people more clever than I)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s