Saturday, March 28, 2009

Decomposing genWABlockGroups() – Part 1

In my last post I mentioned that a good place to start is the genWABlockGroups() function. Today I am going to walk through the first couple of lines to explain what they do and begin to unwind the mystery.

This function was written to create maps based on Census Block Groups. Specifically, I built it to map the median home values in Washington state from the 2000 Census. Today I’m going to review the first couple of lines and input files:

baseDir = "/Users/aidan/Desktop/"
projectDir = "Census 2000 WA BGs/"
polygonShapeFile = baseDir + projectDir + "bg53_d00.dat"
polygonMetaDataFile = baseDir + projectDir + "bg53_d00a.dat"
dataFile = baseDir + projectDir + "Median HH Value.txt"
outputFile = baseDir + projectDir + "output"

The first set of lines is all about setting up the input and output files. In general, three input files are required.

  1. You need a “shape” file. I talk quite a bit about the types of data sources I use in this post: http://censuskml.blogspot.com/2007/03/get-your-real-live-examples.html. The files I use come straight from the Census Cartographic Boundary files website: http://www.census.gov/geo/www/cob/bdy_files.html. Here is a sample of what a shape file looks like:

    1 -0.122233088267882E+03 0.489843885617284E+02
    -0.122285651792493E+03 0.490024370686774E+02
    -0.122251693540713E+03 0.490024929621633E+02
    -0.122251621999248E+03 0.490024930799168E+02
    ...
    -0.122285651792493E+03 0.490024370686774E+02
    END


    The structure is pretty simple. The shape's unique identifier within the file comes first, "1" in the above example. The coordinates that immediately follow it on the same line are the center point for the shape. The boundary points come next, one per line, ending with the word "END". You’ll note that the first of the boundary points matches the last one exactly, so every shape is a complete, closed outline. A single Census cartographic component can be made up of multiple shapes (for example, a chain of islands that are in one county will be made up of distinct shapes). The “metadata” file is crucial to piecing all of this together.
  2. You need a shape “metadata” file. Again, this comes straight from the same Census website. Here is what the first few lines of the file look like:

    0
    " "
    " "
    " "
    " "
    " "
    " "
    " "

    1
    "53"
    "073"
    "0102"
    "2"
    "2"
    "BG"
    " "
    ...

    It can get hard to decipher these files. At a basic level, the file maps a shape to a specific Census area, in this case a Block Group. Each record starts with the unique shape ID. Then there is a series of identifiers in quotation marks: (a) “53” = the State ID of Washington, (b) “073” = the county identifier within WA, (c) “0102” = the tract within the county, (d) “2” = the block group within the tract, (e) “2” = the LSAD identifier, which is a numeric code that marks this as a Block Group, (f) “BG” = the translation of the previous LSAD code, and finally (g) the last blank is an area to put miscellaneous data. As I mentioned in the description of the previous file, there can be a one-to-many relationship between an actual Census area and its shapes: multiple shapes can have the exact same metadata. Check out the BlockGroup.rb file to see how to programmatically read this data.
  3. You need a data file. The data file needs a way to map back to the “shapes” that are loaded. The “key” must match the shape's unique identifier. Again, I try to use files that are as straight from the Census website as possible. Typically I pull the data down from Tiger or FactFinder. Again, another sample:

    150,53,001,950100,1,530019501001,68800,88800,115900
    150,53,001,950100,2,530019501002,63300,78800,97800
    150,53,001,950100,3,530019501003,41300,58600,81600

    This is a CSV where the 6th column (if you are counting 1-based) contains the unique identifier for the Census geographic area and the three columns after it contain data about that specific area. You'll note how the identifier is a combination of the data from the metadata file. It took me a while to figure out how to construct the unique identifier, but I cracked the code. In the code, each shape object has a function to build up its own unique ID that will map to the way Census spits out data with respect to that area.
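To make the three formats concrete, here is a minimal Ruby sketch of how the files described above could be read and joined. It is not the code from the project: every method name and hash key below is hypothetical, and the key-building rule (right-padding the tract to six digits with zeros) is an assumption inferred only from comparing the four-digit tract in the metadata sample with the twelve-digit IDs in the data sample.

```ruby
require "csv"

# Hypothetical helpers for illustration; none of these names come from the rubykml source.

# Shape file -> { shape_id => { center: [lon, lat], points: [[lon, lat], ...] } }
def parse_shapes(text)
  shapes = {}
  current = nil
  text.each_line do |line|
    fields = line.split
    next if fields.empty?
    if fields.first == "END"
      current = nil # the shape is finished; the next data line starts a new one
    elsif current.nil?
      # Header line: shape ID, then the center point.
      current = { center: fields[1, 2].map(&:to_f), points: [] }
      shapes[fields[0].to_i] = current
    else
      # Boundary point: one coordinate pair per line.
      current[:points] << fields.map(&:to_f)
    end
  end
  shapes
end

META_FIELDS = %i[state county tract block_group lsad lsad_name misc].freeze

# Metadata file -> { shape_id => { state: "53", county: "073", ... } }
# Records are assumed to be separated by blank lines, as in the sample above.
def parse_metadata(text)
  text.split(/\n\s*\n/).each_with_object({}) do |chunk, records|
    lines = chunk.lines.map(&:strip).reject(&:empty?)
    next if lines.empty?
    values = lines.drop(1).map { |l| l.delete('"').strip }
    records[lines.first.to_i] = META_FIELDS.zip(values).to_h
  end
end

# ASSUMPTION: the tract is right-padded with zeros to six digits so the
# result has the same shape as the IDs in the data file (e.g. 530019501001).
def block_group_key(meta)
  meta[:state] + meta[:county] + meta[:tract].ljust(6, "0") + meta[:block_group]
end

# Data file -> { key => [value, value, value] }, keyed on the 6th column.
def parse_data(text)
  CSV.parse(text, skip_blanks: true).each_with_object({}) do |row, data|
    data[row[5]] = row[6, 3].map(&:to_i)
  end
end
```

With these in place, looking up the values for a shape would be a matter of reading its metadata record, passing it to block_group_key, and using the result as the key into the hash returned by parse_data. Because several shapes can share one metadata record, several shape IDs can resolve to the same key.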

Next time I will go through the actual function calls that load in these files.

Tuesday, March 24, 2009

Get some code

I spent two months really pouring lots of time and energy into the code that produces the visualizations I have shared in this blog. In the process I learned a bunch about Ruby, KML, and Census data - and had great fun.

That was almost two years ago and I've consistently made promises to myself to get back on the wagon and keep working on the project. However, these just have not materialized.

Because of this, I've decided to open up the code, in its raw state, for public consumption. It is full of bad programming practices, from too many files to hard-coded strings; it would get an F in almost any code review. This kept me from sharing it for a long time, but as the months kept ticking by and I didn't make any progress, I realized that either I was going to get it out there or it would simply die as a small blog.

Over the coming posts I'll share the structure in lots more detail, but to start, I wanted to share what the most important files are.

Let me give you a small tour around the code:

  • GenPlacesKml.rb : This is the main file. This is what you "run". I've created many different "main" routines to create the outputs. This file also contains all of the code to create the actual KML document. I never created an object model for the KML components, I simply created functions that generated the components as rexml objects. This then gets dumped into the KML output file.
  • Polygon.rb : This contains the definition of the Polygon object, which is the core object that holds the Census shape data. This also contains the routine to load Census shape data from a file.
  • BlockGroups.rb, County.rb, etc. : These contain the objects that hold the Census metadata for each of the different types of geographies the Census creates. They also contain the methods to load them.
  • Data.rb : This is a generic data loading set of functions.
  • Colors.rb : This generates the color palettes used in the files.
  • FIPS.rb : This contains some data definitions to do lookups on Census abbreviations.
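
For readers who have not used rexml, here is a rough sketch of what "functions that generate KML components as rexml objects" can look like. This is not taken from GenPlacesKml.rb; the function names placemark_for and kml_document are hypothetical, and the sketch assumes the KML 2.2 namespace and a single outer ring per shape.

```ruby
require "rexml/document"

# Hypothetical helper: build a KML Placemark holding one closed ring
# of [lon, lat] points. Not the project's actual function.
def placemark_for(name, points)
  placemark = REXML::Element.new("Placemark")
  placemark.add_element("name").text = name
  ring = placemark.add_element("Polygon")
                  .add_element("outerBoundaryIs")
                  .add_element("LinearRing")
  # KML coordinates are "lon,lat,altitude" tuples separated by whitespace.
  ring.add_element("coordinates").text =
    points.map { |lon, lat| "#{lon},#{lat},0" }.join(" ")
  placemark
end

# Hypothetical helper: wrap a list of Placemark elements in a KML document.
def kml_document(placemarks)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new("1.0", "UTF-8")
  kml = doc.add_element("kml", "xmlns" => "http://www.opengis.net/kml/2.2")
  container = kml.add_element("Document")
  placemarks.each { |placemark| container.add_element(placemark) }
  doc
end

# Usage: one triangle-shaped placemark written out as a KML string.
ring = [[-122.0, 48.0], [-122.1, 48.1], [-122.0, 48.0]]
puts kml_document([placemark_for("Example shape", ring)]).to_s
```

Returning plain rexml elements from small functions keeps things simple at the cost of having no object model for the KML itself, which matches the trade-off described in the tour above.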


If you want to start playing around with the code, the easiest "main" routine to look at is "genWABlockGroups()".

Now, where is the code? I've decided to use Google Code to share it & I've put it out under the GPL. This uses Subversion, which I am completely unfamiliar with, to store and share the code. It does seem like a powerful system, but I'm just learning how to use it appropriately. I am very open to any and all feedback on its proper setup and use.

I'm hoping the community is interested in breathing life back into the code. In the coming posts I'll walk through the structure (to the extent that it exists).

You can find the source at http://code.google.com/p/rubykml/

Sunday, March 15, 2009

Hmm... it has been a while

It has been far too long since I've actively driven this project. As such, I'm seeking ways to share the source code I've generated. I'm exploring Google Code and will report back once that is set up.

Thanks for staying tuned and I hope opening up the code will spark new life into the work.

Wednesday, July 25, 2007

Real Progress - 5 digit ZCTAs

It has been 4 months since real progress was made - but tonight represented a real breakthrough. I got the code up and running on my new laptop and transitioned over to using Eclipse, a great IDE that has integrated debugging of Ruby. But talking about how I am developing is not the purpose of this post. The purpose is to breathe life into the project once more.

I've gotten many e-mails over the past few months asking for help in creating maps. One person in particular had a very specific ask that I thought would both expand the functionality of the code & test a new data source. For this set of maps, I needed to start mapping 5-digit zip codes (Zip Code Tabulation Areas to the Census), my first foray into zip codes in this work.

As proof of life, enjoy the screenshot below which shows all of the zip codes for the state of Maine, colored with random colors.



Now that the code is running again and I've got some specific projects to get going again, I look forward to getting back to a much more regular schedule. Please keep up the contact.

Wednesday, May 23, 2007

Congressional District maps for every state!

It has been almost six weeks since I've had something substantive to add. I wanted to share the complete outputs from the campaign contribution work I spoke about earlier. Please read that post for a full explanation of what the data is and how I put the files together.

The following two ZIP files contain a KMZ for every single state. Because I used the TimeSpan element, all of the states can be loaded at once.

Without further ado:


If you have any problems with these files, please let me know.

I hope to be back up and running soon, so please stay tuned.

Thursday, May 10, 2007

New blog on the block

There is a new blog at Google published by the Google Earth and Maps team: Lat Long Blog. I've begun spinning up my efforts to produce more maps and code and hope to share some new outputs soon. Thanks for staying connected during this intermission.

Sunday, April 22, 2007

A bit of a break...

I apologize for not updating the blog for the last week - I've been silent because I have just started a new job and haven't had the time to come back to the work here yet. I do plan to come back to it, though at a slower pace. In the meantime, do not hesitate to reach out for help with similar efforts by either using the comment system or dropping me an e-mail at censuskml at gmail.