Wednesday, March 14, 2007

How hard can it be to generate KMZ files?

One of the items I have had on my to-do list for the past few days is to change the output of my program from KMLs to KMZs. At high levels of geographic detail (such as Block Groups or County Subdivisions) the KML outputs can be very large (5 to 10 megabytes per state). KMZs are simply KMLs, but zipped. Everything I've tried to do in Ruby has turned out to be so easy that I figured this would be as well. I was wrong!

I started by trying to use the built-in zlib module that appears to be part of the standard Ruby install. I first tried the following bit of code:

require 'zlib'

outputFile = File.new(outputFilename, "wb")
zipWriter = Zlib::GzipWriter.new(outputFile)
xml.write( zipWriter )
zipWriter.close   # also closes outputFile

I was thinking to myself that this was just too easy, and it was. The output file came out as a KMZ file which Google Earth promptly refused to open. I kept getting some sort of bizarre error like "unexpected token at line 1, column 0". What was interesting is that if I just changed the file's extension to .zip and unzipped it using the built-in Archive Utility (on Mac OS X), the resulting KML file worked just fine. No problems!

Next, I tried writing the XML to a string which I then wrote to a zipped file, in case the streams weren't playing well together.

require 'stringio'
require 'zlib'

xmlString = StringIO.new
xml.write( xmlString, 0 )
zipWriter = Zlib::GzipWriter.new(outputFile)
zipWriter.write( xmlString.string )
zipWriter.close

This code produced the exact same output and the exact same error in Google Earth.

Getting frustrated, I did some online searching. I found out that the KMZ spec says that the main KML in the KMZ should be named doc.kml. So, I tried writing out the KML file and then zipping that into another file:

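# The block form closes doc.kml, so it is fully flushed before it is read back.
File.open("doc.kml", "w") { |f| xml.write(f, 0) }

Zlib::GzipWriter.open("doc.kmz") do |gz|
  gz.orig_name = "doc.kml"
  gz.write(File.read("doc.kml"))
end   # the block form closes gz automatically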

Same error! At each iteration I could change KMZ to ZIP, unarchive the file and open the resulting KML. I could even take a KML file, zip it, change the extension to .KMZ and open it up in Google Earth, regardless of the original filename (no "doc.kml" requirement seemed needed)! What was going on?

Next, I tried using the system gzip command, thinking maybe there was a problem with the Zlib module:

File.open("doc.kml", "w") { |f| xml.write(f, 0) }
system("gzip", "-S", ".kmz", "doc.kml")   # writes doc.kml.kmz

Again, same error!

At this point, I figured that not all ZIP algorithms were created equal, so I went searching for another Ruby ZIP module. I found one called rubyzip. I was encouraged by this new module because it had a lot more functionality than zlib and much better documentation. After giving up on using RubyGems to install the darned thing, I simply downloaded the code and installed it myself. A few sweet moments later, I held my breath and ran the following code:

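require 'zip/zip'   # rubyzip 0.9.x; releases from 1.0 on use require 'zip' and Zip::File

Zip::ZipFile.open(outputFilename + ".kmz", Zip::ZipFile::CREATE) do |zipfile|
  zipfile.get_output_stream("doc.kml") { |file| xml.write(file, 0) }
end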

It worked! I wish I had a better conclusion as to exactly what was going wrong with zlib, but the moral of the story is if you are trying to create KMZ files in Ruby, use rubyzip!

2 comments:

Imran Haque said...

The issue is that gzip and zip are not the same file format (although they do both use the 'deflate' algorithm). The zip format allows multiple files in one zip archive, and therefore includes file listing information (essentially, a directory tree). gzip, in the finest Unix tradition, simply works on a plain stream of data and assumes it's a single file - that's why it's often combined with tar, to let tar handle the file structure and gzip the compression. The Mac Archive utility probably ignored the file extension and noticed that your data was simply gzipped KML (as opposed to a KML file in a zip archive), which is why the extracted file worked fine.
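One quick way to see the difference is to look at the first few bytes of the output: a gzip stream begins with the bytes 0x1F 0x8B, while a zip archive containing at least one file begins with the local file header signature "PK" 0x03 0x04. Here is a minimal sketch in Ruby using only the standard library. The helper names `container_format` and `gzip_string` are made up for illustration and are not part of zlib or rubyzip.

```ruby
require 'zlib'
require 'stringio'

# Leading signatures of the two container formats.
GZIP_MAGIC = "\x1F\x8B".b     # gzip member header
ZIP_MAGIC  = "PK\x03\x04".b   # zip local file header (non-empty archive)

# Hypothetical helper: classify raw bytes by their leading signature.
def container_format(bytes)
  head = bytes.b
  return :gzip if head.start_with?(GZIP_MAGIC)
  return :zip  if head.start_with?(ZIP_MAGIC)
  :unknown
end

# Hypothetical helper: gzip a string in memory with Zlib::GzipWriter,
# the same class used in the failed attempts in the post.
def gzip_string(text)
  buffer = StringIO.new("".b)
  gz = Zlib::GzipWriter.new(buffer)
  gz.write(text)
  gz.close
  buffer.string
end

kml = "<kml><Document/></kml>"
puts container_format(gzip_string(kml))   # prints "gzip"
```

Running the same check on the first bytes of a file that Google Earth does open, for example `container_format(File.binread(path))`, should report `:zip` instead, which is the mismatch described above.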

The KMZ format depends on zip's ability to store multiple files because a single KMZ archive may need to hold multiple pieces of data. As a simple example, the KMZ files generated by gCensus have two files - the KML map and the PNG overlay for the legend.

Sidenote: I don't know if this applies to the Ruby implementation of zip, but I found that I got much faster performance out of Perl's Archive::Zip if I increased the chunk size used in compression to around 128K. If your zip code is running slowly, it may be worth looking for such a tweakable parameter in your module.

aidan said...

Thanks for the answer! Your explanation makes sense.