Monday, April 19, 2010

SQL Server Geospatial Part 2 – The Geography Data Type

In the last post on SQL Server 2008’s spatial capabilities, I mentioned that there are two spatial data types available in SQL Server, geometry and geography. While the geometry type is easier to work with and has more methods available, we will need the geography type to perform “real” geospatial calculations. I mentioned a fantastic primer from Jason Follas on the two data types in my last post, I’ll link to it again here. Another good post on some of the issues converting between the two - and the idiosyncrasies of the geography type - is here.

For this example, we will be using the US Census data file for Illinois incorporated areas and an associated file with census population counts to determine the population density per square mile for cities in the Lincoln State. Download both files, and import the places file using Shape2SQL as we did with the zip codes file in the previous post. Make sure to import the file into the same database used in the last example (I’m using a database named GeoExamples); we will need both in our database for the walkthrough in the next post! Leave all options default again.

image

Now let’s import the census population count file we downloaded earlier (SUB-EST2008-IP.csv). Open SQL Server Management Studio, right-click your target database, and select “Tasks –> Import Data”. In the Import Data Wizard, click “Next” if this splash page is shown.

image

Now select Flat File as the data source, browse to the data file, and select the checkbox labeled “Column names in the first data row” as shown below.

image

Click “Advanced,” select the “Name” column, and change the length to 255.

image

Click “Next” and on the next pane select your target database.

image 

Click “Next” again and select a target table name. I will use ILPlacesCensusPops.

image

Click “Next”, leave all default options selected, and click “Finish”.

image

Click “Finish” again.

image

Now you’ve successfully imported the census population counts.

image

In SQL Server Management Studio, run the following query:

select places.NAME,places.geom,pops.POP_2008
from tl_2009_17_place as places
inner join ILPlacesCensusPops as pops
on pops.place = places.PLACEFP
where places.NAME like '%Chicago%'
and pops.STATENAME = 'Illinois'

You should see the following result set:

image

Now that we have all of our source data, it’s time to get down and dirty. If we want to determine the population density per square mile for our cities, we will first need to convert the geometry data type describing the shape of the city to the geography data type. The reason for this is that the geometry type describes shapes in a two-dimensional x-y coordinate system rather than in three-dimensional space. Distances and areas are described in “units” within this coordinate system rather than in kilometers or miles in “real” measurable space. If we want to perform any distance, area, grow, or shrink operations we’ll need the geography type so we can use real units of distance, as well as avoiding any distortion of our shapes.

However, as I mentioned earlier, the geography datatype has some obvious and not-so-obvious idiosyncrasies. It is much more particular about the quality of source data. The orientation of the polygon ring (the direction and order in which the points are enumerated) must be correct for example. Another issue stems from trying to take a shape described in a planar universe and wrapping it around a sphere. If it’s a complex shape there’s a good chance that parts of the shape will come to overlap one another, and overlaps are not allowed in the geography datatype.

Let’s try converting all of our Illinois places to geography objects:

SELECT top 100 *, geography::STGeomFromText(geom.STAsText(), 4326) as geog from dbo.tl_2009_17_place

Which will throw the following error:

image

Jay over at beginningspatial.com has posted a fantastic set of tips and tricks (including Ed Katibah's fix for incorrect ring orientation) that can be used to make geometry data valid so that it can be converted successfully to the geography type, but rather than rehashing those here (do check out the article linked above, it contains invaluable advice for the novice geospatial developer) I want to talk about another method. Enter the SQL Server Spatial Tools, an open source Codeplex project that can be installed on your SQL Server and provides some very powerful functions to help you manipulate your geospatial data.

Download and install the tools to the geospatial database we are using according to the Readme.txt file included in the download. You’ll need to run against the target database as the tools are registered on a per-database level, and you’ll have to modify the install script to point to the DLL included in the download. Once the tools are installed, let’s see what they can do for us.

First let’s use one of our new functions to determine which of our shapes is a valid geography instance using IsValidGeographyFromText(). Note that there is no native IsValid() or MakeValid() function for the geography type like we have for the geometry type, making the Codeplex SQL Server Spatial Tools an invaluable addition to your toolkit as a geospatial developer. Run the following query:

select dbo.IsValidGeographyFromText(geom.STAsText(),4326) as IsValidGeog, *
from tl_2009_17_place as places
where places.NAME like '%Chicago%'

Which should return this result set showing that only half of our geometry instances can be successfully converted to a geography instance out of the box:

image

Now the tools from Codeplex come to the rescue. One of the new functions is called simply MakeValidGeographyFromText() and does it exactly what it says, applying a number of techniques to the geometry data you feed it to massage it into valid geography data. Let’s run the following query to get the geography objects for our Chicago-area towns:

select dbo.MakeValidGeographyFromText(geom.STAsText(),4326) as ValidGeog, *
from tl_2009_17_place as places
where places.NAME like '%Chicago%'

If you select the “Spatial Results” tab in SQL Server Management Studio you’ll see a representation of our fine city as well as some outlying communities with similar names:

image

Now let’s get down to business. If you’ve been paying attention you’ll notice that we’ve been using SRID 4326 for this example, which corresponds to WGS84. Each spatial reference system uses its own units of measurement, WGS84 uses meters. So, if we want to know the area of a shape in square miles we will have to convert square meters to square miles. There are 1609.344 meters in a mile, so we need to divide by 1609.344^2 = 2589988.11, and then divide the population by this number to get population density:

select places.NAME,places.geom,pops.POP_2008 ,pops.POP_2008 / (dbo.MakeValidGeographyFromText(geom.STAsText(),4326).STArea() /  2589988.11) as PopDensity
from tl_2009_17_place as places
inner join ILPlacesCensusPops as pops
on pops.place = places.PLACEFP
where places.NAME like '%Chicago%'
and pops.STATENAME = 'Illinois'

Which yields the following results:

image

As you can see, for the city of Chicago we’ve determined an approximate population density of 12182 / sq mi. A quick search on Wikipedia tells us that the population density for Chicago is 12649 / sq mi:

image

Close enough for jazz, or in this case, the blues.

In the next post I will talk about some more neat stuff we can do with our sample data…