Calculating MD5 Checksum of Ubuntu 7.10 ISO Image in Python

2007-10-19 at 02:39 | Posted in devel, fun, lang:en | 4 Comments

I like the Python programming language and use it on a daily basis. As an example, I’m going to calculate the MD5 checksum of a fresh Ubuntu 7.10 ISO image I’ve downloaded from a mirror site. The checksum is needed to make sure that the image has not been modified, accidentally or maliciously, by the mirror maintainers or by crackers. The standard way of doing this is to install and use an MD5 tool on your computer. But that’s no fun, so let’s program this task in Python :)

The easiest way of calculating the checksum in Python is:

from md5 import md5
fname = "ubuntu-7.10-desktop-i386.iso"
s = md5(open(fname, "rb").read()).hexdigest()
print "md5 checksum: %s" % s

It’s quite easy, isn’t it? :) Unfortunately, it’s far too inefficient, because this code reads the entire ISO image (700+ MB) into memory before hashing it.

Let’s read the ISO image file in relatively small blocks, updating the MD5 checksum after every read:

from md5 import md5
fname = "ubuntu-7.10-desktop-i386.iso"
block_size = 0x10000
def upd(m, data):
  m.update(data)
  return m
fd = open(fname, "rb")
try:
  contents = iter(lambda: fd.read(block_size), "")
  m = reduce(upd, contents, md5())
  print "md5 checksum: %s" % m.hexdigest()
finally:
  fd.close()
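
The md5 module used above has been deprecated since Python 2.5 in favour of hashlib, and in Python 3 print is a function and binary reads return bytes. Here is a minimal sketch of the same block-wise idea under those assumptions. The function name md5_of_file is my own choice for illustration, not something a library provides:

```python
import hashlib


def md5_of_file(path, block_size=0x10000):
    """Return the hex MD5 digest of the file at `path`, read block by block."""
    digest = hashlib.md5()
    # The with-statement closes the file even if reading fails,
    # which replaces the explicit try/finally.
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(block_size) until it
        # returns b"" (end of file); only one block is in memory at a time.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Example call (assumes the ISO is in the current directory):
# print("md5 checksum: %s" % md5_of_file("ubuntu-7.10-desktop-i386.iso"))
```

A plain for loop is used here instead of reduce, since update() changes the hash object in place anyway.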

By the way, if you’re wondering why we can “update” the checksum, see the MD5 hash algorithm. Briefly: MD5 processes the message sequentially in 512-bit blocks, folding each block into a running 128-bit state with additions modulo 2^32, so the data can be fed in piece by piece. One more comment: the code looks a bit functional, but in fact there are lots of destructive updates here.
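
To see this in action, here is a small sketch (Python 3 with hashlib, which is my assumption and not what the code above uses) that hashes the same bytes once in a single call and once in two pieces. The test string and the split point are arbitrary:

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

# One-shot: hash the whole message at once.
one_shot = hashlib.md5(data).hexdigest()

# Incremental: feed the same bytes in two arbitrary pieces.
m = hashlib.md5()
m.update(data[:10])
# update() mutates m in place and returns None; this is the
# "destructive update", and it is why upd() has to return m itself.
returned = m.update(data[10:])
incremental = m.hexdigest()

# Both routes give the same digest.
assert one_shot == incremental
assert returned is None
```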

You can compare your result with the corresponding MD5 checksum published on the Ubuntu homepage.

Upd: For those people who are just looking for the MD5 checksum value of ubuntu-7.10-desktop-i386.iso, the value is: d2334dbba7313e9abc8c7c072d2af09c.
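
If you would rather let Python do the comparison too, here is a rough sketch in Python 3 with hashlib (again my assumption). The helper name matches_md5 and the constant EXPECTED_MD5 are made up for illustration. The constant simply holds the value quoted above:

```python
import hashlib

# Value quoted in this post for ubuntu-7.10-desktop-i386.iso.
EXPECTED_MD5 = "d2334dbba7313e9abc8c7c072d2af09c"


def matches_md5(path, expected_hex, block_size=0x10000):
    """Return True if the MD5 digest of the file equals expected_hex."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read block by block so the whole file never sits in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    # hexdigest() is lowercase; normalize the expected value so that
    # pasted checksums with upper-case letters or spaces still compare.
    return digest.hexdigest() == expected_hex.strip().lower()


# Example call (assumes the ISO is in the current directory):
# print(matches_md5("ubuntu-7.10-desktop-i386.iso", EXPECTED_MD5))
```

Keep in mind that a match only tells you the file agrees with the checksum you typed in, so take the expected value from a source you trust.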

4 Comments »


  1. I have to say, your update cracked me up, glad to know I’m not the only one;)

  2. @philiosophe: What do you mean by this? Do you mean that it’s very insecure to trust someone’s blog entry with an unsigned checksum in order to verify the integrity of a file? ;)

  3. Thank you for your interesting demonstration.

  4. The second method was very clever, I have already used it in a few projects.

