Saturday, March 28, 2009

Decomposing genWABlockGroups() – Part 1

In my last post I mentioned that a good place to start is the genWABlockGroups() function. Today I am going to walk through its first couple of lines and explain what they do, to begin unwinding the mystery.

This function was written to create maps based on Census Block Groups. Specifically, I built it to map median home values in Washington state from the 2000 Census. Here are the first few lines:

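# Root folder for the project files (hard-coded; change it for your machine)
baseDir = "/Users/aidan/Desktop/"
projectDir = "Census 2000 WA BGs/"
# Inputs: Census boundary shapes, shape metadata, and the data values to map
polygonShapeFile = baseDir + projectDir + "bg53_d00.dat"
polygonMetaDataFile = baseDir + projectDir + "bg53_d00a.dat"
dataFile = baseDir + projectDir + "Median HH Value.txt"
# Where the generated output is written
outputFile = baseDir + projectDir + "output"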

This first set of lines is all about setting up the input and output files. In general, three input files are required.

  1. You need a “shape” file. I talk quite a bit about the types of data sources I use in this post: http://censuskml.blogspot.com/2007/03/get-your-real-live-examples.html. The files I use come straight from the Census Cartographic Boundary files website: http://www.census.gov/geo/www/cob/bdy_files.html. Here is a sample of what a shape file looks like:

    1 -0.122233088267882E+03 0.489843885617284E+02
    -0.122285651792493E+03 0.490024370686774E+02
    -0.122251693540713E+03 0.490024929621633E+02
    -0.122251621999248E+03 0.490024930799168E+02
    ...
    -0.122285651792493E+03 0.490024370686774E+02
    END


    The structure is pretty simple. The shape's unique identifier within the file comes first, "1" in the above example. The coordinates that immediately follow it are the center point of the shape (longitude first, then latitude, in scientific notation, so -0.122233088267882E+03 is -122.233). The boundary points follow, one per line, and the list ends with the word "END". You’ll note that the first boundary point matches the last one exactly, so every shape is a closed ring. A single Census cartographic area can be made up of multiple shapes (for example, a chain of islands in one county will be made up of distinct shapes). The “metadata” file is crucial to piecing all of this together.
  2. You need a shape “metadata” file. Again, this comes straight from the same Census website. Here is what the first few lines of the file look like:

    0
    " "
    " "
    " "
    " "
    " "
    " "
    " "

    1
    "53"
    "073"
    "0102"
    "2"
    "2"
    "BG"
    " "
    ...

    These files can be hard to decipher. At a basic level, this one maps each shape to a specific Census area, in this case a Block Group. Each record starts with the unique shape ID, followed by a series of identifiers in quotation marks: (a) “53” = the state ID for Washington, (b) “073” = the county identifier within WA, (c) “0102” = the tract within the county, (d) “2” = the block group within the tract, (e) “2” = the LSAD (Legal/Statistical Area Description) code, a numeric code that marks this as a Block Group, (f) “BG” = the translation of the previous LSAD code, and finally (g) the last blank field, which holds miscellaneous data. As I mentioned in the description of the previous file, there can be a one-to-many relationship between a Census area and its shapes: multiple shapes can have exactly the same metadata. Check out the BlockGroup.rb file to see how to read this data programmatically.
  3. You need a data file. The data file needs a way to map back to the “shapes” that are loaded: its “key” must match the shape's unique identifier. Again, I try to use files that come as straight from the Census website as possible. Typically I pull the data down from TIGER or FactFinder. Here is another sample:

    150,53,001,950100,1,530019501001,68800,88800,115900
    150,53,001,950100,2,530019501002,63300,78800,97800
    150,53,001,950100,3,530019501003,41300,58600,81600

    This is a CSV where the 6th column (counting from 1) contains the unique identifier for the Census geographic area, and the three columns after it contain data about that specific area. Note that the identifier is built from the fields in the metadata file. It took me a while to figure out how to construct the unique identifier, but I cracked the code. In the code, each shape object has a function that builds its own unique ID, matching the way the Census publishes data for that area.
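The shape file layout in item 1 is simple enough to parse line by line. Here is a minimal sketch, not the Polygon.rb loader from the repository: parse_shapes, closed? and the Shape struct are hypothetical names. It assumes each shape starts with an "<id> <center lon> <center lat>" line, lists one boundary point per line, and closes with END; an END with no open shape is treated as an end-of-file marker. The sample is a shortened version of the one above.

```ruby
# Hypothetical record: numeric shape ID, center point, and boundary ring.
Shape = Struct.new(:id, :center, :points)

# Sketch of a parser for the shape text format described above.
# Assumes a header line "<id> <center lon> <center lat>", one
# "<lon> <lat>" boundary point per line, and "END" closing each shape.
def parse_shapes(text)
  shapes = []
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    if line == 'END'
      # END closes the open shape; an END with no open shape is ignored.
      shapes << current if current
      current = nil
    elsif current.nil?
      # Header line: shape ID, then the center point.
      id, lon, lat = line.split
      current = Shape.new(Integer(id, 10), [Float(lon), Float(lat)], [])
    else
      # Boundary point. Float() accepts the E+03 exponent notation.
      current.points << line.split.map { |f| Float(f) }
    end
  end
  shapes
end

# A shape is closed when its first and last boundary points match.
def closed?(shape)
  !shape.points.empty? && shape.points.first == shape.points.last
end

sample = <<~DAT
  1 -0.122233088267882E+03 0.489843885617284E+02
  -0.122285651792493E+03 0.490024370686774E+02
  -0.122251693540713E+03 0.490024929621633E+02
  -0.122285651792493E+03 0.490024370686774E+02
  END
DAT

shape = parse_shapes(sample).first
puts "shape #{shape.id}: #{shape.points.size} points, closed=#{closed?(shape)}"
```

Grouping these shapes into Census areas still needs the metadata file from item 2.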
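The metadata format in item 2 can be read the same way: an unquoted line starts a record, and the quoted lines that follow are its fields. This is a hedged sketch rather than the code in the repository. BlockGroupMeta, parse_metadata and shape_ids_by_block_group are made-up names, and the code assumes every record has exactly the seven quoted fields shown in the sample. The grouping step reflects the one-to-many point: several shape IDs can map to the same Block Group.

```ruby
# Hypothetical record holding one shape's metadata fields, in file order.
BlockGroupMeta = Struct.new(:shape_id, :state, :county, :tract,
                            :block_group, :lsad, :lsad_trans, :misc)

# Sketch of a metadata reader. Assumes an unquoted line holds the shape ID
# and starts a new record, and each following quoted line is one field.
def parse_metadata(text)
  records = []
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    if line.start_with?('"')
      # Drop the surrounding quotes but keep the inner text (even " ").
      records.last << line[1...-1]
    else
      records << [Integer(line, 10)]
    end
  end
  records.map { |fields| BlockGroupMeta.new(*fields) }
end

# Several shapes can share one Block Group, so group shape IDs by the
# (state, county, tract, block group) identifiers. Records with a blank
# state, like the placeholder shape 0, are skipped.
def shape_ids_by_block_group(metas)
  metas.reject { |m| m.state.strip.empty? }
       .group_by { |m| [m.state, m.county, m.tract, m.block_group] }
       .transform_values { |group| group.map(&:shape_id) }
end

sample = <<~META
  0
  " "
  " "
  " "
  " "
  " "
  " "
  " "

  1
  "53"
  "073"
  "0102"
  "2"
  "2"
  "BG"
  " "
META

metas = parse_metadata(sample)
p shape_ids_by_block_group(metas)
```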
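Item 3 ties the data back to the shapes through a 12-digit Block Group key. Judging only from the samples above, that key looks like state (2 digits) + county (3) + tract (6) + block group (1), so 530019501001 is state 53, county 001, tract 950100, block group 1. The sketch below builds such a key from metadata fields and loads the CSV into a hash. It is an assumption, not the rule the author cracked. In particular, padding a 4-digit tract like "0102" to "010200" is a guess inferred from the 6-digit tracts in the data file, and block_group_key and load_data are hypothetical names.

```ruby
require 'csv'

# Hypothetical key builder: state (2) + county (3) + tract (6) + block group (1).
# ASSUMPTION: a 4-digit tract such as "0102" is right-padded to "010200",
# guessed from the 6-digit tracts in the data sample.
def block_group_key(state, county, tract, block_group)
  tract6 = tract.length == 4 ? "#{tract}00" : tract
  "#{state}#{county}#{tract6}#{block_group}"
end

# Loads the data CSV into { key => [values...] }.
# Assumes the key is the 6th column and every later column is an integer.
def load_data(text)
  CSV.parse(text).each_with_object({}) do |row, data|
    next if row.size < 7 # skip blank or short lines

    data[row[5]] = row[6..-1].map { |v| Integer(v, 10) }
  end
end

sample = <<~ROWS
  150,53,001,950100,1,530019501001,68800,88800,115900
  150,53,001,950100,2,530019501002,63300,78800,97800
  150,53,001,950100,3,530019501003,41300,58600,81600
ROWS

data = load_data(sample)
key = block_group_key('53', '001', '950100', '1')
p data[key] # => [68800, 88800, 115900]
```

To find the values for a metadata record, pass its state, county, tract and block group fields to block_group_key and look the result up in the hash.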

Next time I will go through the actual function calls that load in these files.

Tuesday, March 24, 2009

Get some code

I spent two months pouring lots of time and energy into the code that produces the visualizations I have shared on this blog. In the process I learned a bunch about Ruby, KML, and Census data, and had great fun.

That was almost two years ago, and I've repeatedly promised myself to get back on the wagon and keep working on the project. However, those promises just have not materialized.

Because of this, I've decided to open up the code, in its raw state, for public consumption. It is full of bad programming practices, from too many files to hard-coded strings; it would get an F in almost any code review. This kept me from sharing it for a long time, but as the months kept ticking by without any progress, I realized that either I was going to get it out there or it would simply die as a small blog.

Over the coming posts I'll share the structure in lots more detail, but to start, I wanted to share what the most important files are.

Let me give you a small tour around the code:

  • GenPlacesKml.rb : This is the main file, the one you "run". I've created many different "main" routines to create the outputs. This file also contains all of the code to create the actual KML document. I never created an object model for the KML components; I simply wrote functions that generate the components as REXML objects, which then get dumped into the KML output file.
  • Polygon.rb : This contains the definition of the Polygon object, the core object that holds the Census shape data. It also contains the routine to load Census shape data from a file.
  • BlockGroups.rb, County.rb, etc. : These contain the objects that hold the Census metadata for each of the different types of geographies the Census creates, along with the methods to load them.
  • Data.rb : This is a generic data loading set of functions.
  • Colors.rb : This generates the color palettes used in the files.
  • FIPS.rb : This contains some data definitions to do lookups on Census abbreviations.
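The tour above describes GenPlacesKml.rb as building KML pieces with plain functions that return REXML objects instead of a KML object model. To illustrate that style, here is a small sketch, not the repository's code. placemark_element and kml_document are hypothetical names, and the polygon ring is assumed to be a closed list of [longitude, latitude] pairs like those in the Census shape files. KML expects each coordinate as "lon,lat" with tuples separated by whitespace, inside Placemark > Polygon > outerBoundaryIs > LinearRing > coordinates.

```ruby
require 'rexml/document'

# Hypothetical helper: returns a REXML element for one KML Placemark
# wrapping a single polygon ring of [longitude, latitude] pairs.
def placemark_element(name, ring)
  placemark = REXML::Element.new('Placemark')
  placemark.add_element('name').text = name
  polygon = placemark.add_element('Polygon')
  linear_ring = polygon.add_element('outerBoundaryIs').add_element('LinearRing')
  # KML wants "lon,lat" tuples separated by whitespace.
  linear_ring.add_element('coordinates').text =
    ring.map { |lon, lat| "#{lon},#{lat}" }.join(' ')
  placemark
end

# Hypothetical helper: wraps placemarks in a KML 2.2 document and
# returns the serialized XML as a string.
def kml_document(placemarks)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  kml = doc.add_element('kml', 'xmlns' => 'http://www.opengis.net/kml/2.2')
  container = kml.add_element('Document')
  placemarks.each { |pm| container.add_element(pm) }
  out = +''
  doc.write(out)
  out
end

ring = [[-122.285651792493, 49.0024370686774],
        [-122.251693540713, 49.0024929621633],
        [-122.251621999248, 49.0024930799168],
        [-122.285651792493, 49.0024370686774]]
puts kml_document([placemark_element('Shape 1', ring)])
```

Because each function returns a plain element, the pieces compose easily without a KML class hierarchy. The trade-off is that KML structure rules end up spread across many small functions.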


If you want to start playing around with the code, the easiest "main" routine to look at is "genWABlockGroups()".

Now, where is the code? I've decided to use Google Code to share it, and I've put it out under the GPL. Google Code uses Subversion, which I am completely unfamiliar with, to store and share the code. It does seem like a powerful system, but I'm just learning how to use it appropriately. I am very open to any and all feedback on its proper setup and use.

I'm hoping the community is interested in breathing life back into the code. In the coming posts I'll walk through the structure (to the extent that it exists).

You can find the source at http://code.google.com/p/rubykml/

Sunday, March 15, 2009

Hmm... it has been a while

It has been far too long since I've actively driven this project. As such, I'm looking for ways to share the source code I've written. I'm exploring Google Code and will report back once I have it set up.

Thanks for staying tuned and I hope opening up the code will spark new life into the work.