A friend (sfgeekmom) at http://www.sfkfiles.com recently asked me to produce a racial dot map for San Francisco kids ages 5-17 in the style of the Racial Dot Map created by Dustin Cable of the Weldon Cooper Center for Public Service. The map also needed to include the San Francisco school district attendance areas.
Data came from three sources:
- www.data.sfgov.org provided shapefiles of the City of San Francisco, parks, and the bay.
- www.census.gov provided the census tract shapefile and data on kids ages 5-17 by race.
- http://www.arcgis.com/home/item.html?id=5a123a1cd7fc413fa7e103763c6f2ff9 provided the map package of school district attendance areas.
There were two challenges in creating the map. The first was the shape of the census tracts. They don’t fit neatly into San Francisco proper, but extend into the bay, the ocean, Alameda, and Marin. They also include parks and golf courses, which are not places where kids actually live. The tracts needed to be adjusted to cover only the areas where children were likely to live (not the middle of the bay), which also produces a denser dot distribution.
The second challenge was to randomly distribute and symbolize the children by race within each adjusted census tract shape.
Adjusting the Census Tracts
I used ArcMap to create the map and do the processing, but other mapping software or scripting tools would work. The clip and erase tools did the bulk of the work, and then I did some editing by hand to adjust certain areas.
The erase tool preserves features of one layer outside the polygon boundaries of another layer. In this instance, the park layer “erased” or readjusted the census tracts shapes so they excluded San Francisco parks.
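To make the erase idea concrete outside of ArcMap, here is a minimal sketch in plain Python. It does not edit geometry the way the erase tool does. Instead it expresses the same rule as a point test: a location counts as part of the adjusted tract only if it is inside the tract and outside every park. The square tract, the park, and the function names are all hypothetical and chosen only for illustration.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon (list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed
        # by a horizontal ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_erased_tract(point, tract, parks):
    """True if point is in the tract but outside every park (erase semantics)."""
    if not point_in_polygon(point, tract):
        return False
    return not any(point_in_polygon(point, park) for park in parks)


# Hypothetical tract (a 10 x 10 square) with one park in its lower-left corner.
tract = [(0, 0), (10, 0), (10, 10), (0, 10)]
parks = [[(0, 0), (4, 0), (4, 4), (0, 4)]]

print(in_erased_tract((2, 2), tract, parks))   # False: falls inside the park
print(in_erased_tract((7, 7), tract, parks))   # True: part of the tract that remains
print(in_erased_tract((12, 5), tract, parks))  # False: outside the tract entirely
```

A real workflow would use the erase tool or a geometry library to produce new tract shapes. The point test is just the simplest way to show which areas survive.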
Constraining the Tracts to San Francisco Land and Yerba Buena/Treasure Island and Other Cleanup
The clip command is the opposite of erase: it preserves the features of one layer that fall within the boundary of another layer. Initially I clipped the census tracts with the San Francisco Bay shapefile, but ended up with the result below. Alcatraz remains in the census tract shapes, as do segments of Marin, Alameda, and a northern segment of the bay that I did not spot at first (not pictured) because it was outside the map extent I was viewing while working.
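For readers without ArcMap, the clip idea can be sketched in plain Python with the Sutherland-Hodgman algorithm. This is a swapped-in technique, not what the ArcMap clip tool does internally, and it only works when the clipping boundary is convex and listed counter-clockwise. A real city outline is not convex, so treat this as an illustration of the concept. The square "city" and the rectangular tract that spills past its eastern edge are hypothetical.

```python
def clip_polygon(subject, clip):
    """Clip `subject` to the convex, counter-clockwise polygon `clip`."""

    def inside(p, a, b):
        # True if p is on the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Point where segment p-q meets the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_pts, output = output, []
        for j in range(len(input_pts)):
            p, q = input_pts[j - 1], input_pts[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersection(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersection(p, q, a, b))
    return output


def polygon_area(polygon):
    """Unsigned area from the shoelace formula."""
    total = 0.0
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Hypothetical city outline (square) and a tract extending east past it.
city = [(0, 0), (10, 0), (10, 10), (0, 10)]
tract = [(8, 2), (14, 2), (14, 6), (8, 6)]

clipped = clip_polygon(tract, city)
print(clipped)                # the part of the tract inside the city outline
print(polygon_area(clipped))  # area drops from 24 to 8 square units
```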
The outline of San Francisco plus the erase command removed Alcatraz, Marin, Alameda, and the bay from the census tract shape definition. The San Francisco Park layer had a boundary that extended into the bay. For visual clarity in the final map, the parks were erased by the San Francisco outline layer to keep the boundaries out of the water.
There were a couple of further edits by hand to the census tracts. A look at the southwest corner of the city shows that most of the census tract there consists of golf courses, Fort Funston, and Lake Merced. After zooming in and using Google Street View to inspect the area, I redrew the census tract to cover only a tiny residential area.
One census tract including Twin Peaks was also edited to exclude open space. The edited census tract layer is shown below.
Certainly additional revisions could be made to the census tracts to remove parts of the Presidio, major streets, highways, as well as commercial or industrial zones to provide a better distribution across residential areas, but this seemed like a reasonable stopping point due to available data and time.
Distributing Children Randomly by Census Tract and Race
The census data provided these racial classifications for children: Asian, Black, Hispanic, Native Hawaiian or Other Pacific Islander, Other, 2 or More Races, and White. For each census tract, the total number of children of each race needed to be represented as randomly distributed points within the census tract. Each dot needed to record which race it represented so that the dots could be symbolized based on race. So if census tract A had 549 Asian children, 549 points needed to be generated randomly across census tract A and recorded as Asian. Since all seven racial categories required the same processing steps with only slightly different inputs and outputs, I created a model in ModelBuilder to replicate the process across the categories and save some typing. This also made it easy to rerun the model when I needed to adjust the census tract boundaries (which did happen, to remove bits of tracts in the bay and to adjust the Lake Merced tract to exclude golf courses and Fort Funston).
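The same idea can be sketched without ArcMap. The snippet below is a rough stand-in for random point generation, not the tool itself: it uses rejection sampling (pick a random point in the tract's bounding box, keep it only if it falls inside the tract) and tags each kept point with its race. The L-shaped tract and the count of 120 White children are hypothetical, while 549 is the example number from the text.

```python
import random


def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon (list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, count, rng):
    """Rejection sampling: draw from the bounding box, keep points inside the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    points = []
    while len(points) < count:
        candidate = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(candidate, polygon):
            points.append(candidate)
    return points


def dots_for_tract(polygon, counts_by_race, rng):
    """One dot per child, each recording the race it represents."""
    dots = []
    for race, count in counts_by_race.items():
        for x, y in random_points_in_polygon(polygon, count, rng):
            dots.append({"x": x, "y": y, "race": race})
    return dots


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical L-shaped "census tract A" (already adjusted to exclude open space).
tract_a = [(0, 0), (6, 0), (6, 3), (3, 3), (3, 6), (0, 6)]
counts = {"Asian": 549, "White": 120}  # 120 is a made-up number

dots = dots_for_tract(tract_a, counts, rng)
print(len(dots))  # one dot per child: 669
```

Rejection sampling gets slow when a tract fills only a small fraction of its bounding box, but for a one-off map that trade-off seemed acceptable for a sketch.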
An excerpt of the model is shown below. For each race category, random points are generated equal to the number of kids of that race in each census tract and output to a new shapefile. A new field for racial type is created and populated using Calculate Field. The model creates seven new shapefiles, one per category, and the Append tool then combines them into one.
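The overall shape of the model (the same step run once per racial category, then an append) can be sketched in a few lines of Python. This is only an outline of the flow, not the ModelBuilder model: the tracts are simplified to rectangles so that random points can be drawn directly, and the tract IDs, bounds, and counts are all made up.

```python
import random

RACES = [
    "Asian",
    "Black",
    "Hispanic",
    "Native Hawaiian or Other Pacific Islander",
    "Other",
    "2 or More Races",
    "White",
]

# Hypothetical tracts, simplified to (min_x, min_y, max_x, max_y) rectangles.
TRACTS = {
    "A": {"bounds": (0.0, 0.0, 1.0, 1.0), "counts": {"Asian": 5, "White": 3}},
    "B": {"bounds": (1.0, 0.0, 2.0, 1.0),
          "counts": {"Hispanic": 4, "Black": 2, "2 or More Races": 1}},
}


def run_race_model(race, tracts, rng):
    """One pass of the per-race step: random points in every tract, tagged with `race`."""
    records = []
    for tract_id, tract in tracts.items():
        min_x, min_y, max_x, max_y = tract["bounds"]
        for _ in range(tract["counts"].get(race, 0)):
            records.append({
                "tract": tract_id,
                "race": race,  # stands in for the Calculate Field step
                "x": rng.uniform(min_x, max_x),
                "y": rng.uniform(min_y, max_y),
            })
    return records


rng = random.Random(42)

# Seven separate outputs, one per category, like the seven shapefiles.
per_race_outputs = {race: run_race_model(race, TRACTS, rng) for race in RACES}

# The append step: combine the seven outputs into a single layer.
all_dots = [record for race in RACES for record in per_race_outputs[race]]

print(len(per_race_outputs))  # 7 per-race outputs
print(len(all_dots))          # 15 dots in the combined layer
```

Because the per-race step is a single function, rerunning everything after a boundary change is just a matter of updating the tract definitions and running the script again, which is the same convenience the model provided.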
Visualizing the Children by Race
The Append tool combined the seven files into one new file used to symbolize each race following a color scheme similar to the Racial Dot Map. Displaying each child as a tiny dot made the layer's legend useless, since the legend symbols were just as tiny. To work around this, I created a duplicate layer with larger symbology for the points. That layer is turned off for display but used to generate the legend.
The Final Map of San Francisco Children Ages 5-17 by Race
The final map shows an interesting distribution of children across San Francisco. Asian children are heavily represented in Chinatown, the Richmond, and the Sunset. White children are prevalent in the central north/south section of the city. Several segments of the southeastern section of the city, such as Hunters Point, have almost no children as they are primarily industrial. Black children are concentrated in the Western Addition, Potrero Hill, and Bayview. Hispanic children are concentrated in the Mission. The southern section of the city is a mix of Asian, Black, Hispanic, and 2+ races. The green lines in the map represent the school district attendance areas. When SFGeekMom posts her blog entry, I’ll include the link.