NPR Will Shortz Puzzle Challenge For GIS

What is the longest word that can be produced using the first letters of adjacent states?

One of my GIS Master's classes was a programming course covering Model Builder, Python, and C# – only a little ambitious! One extra-credit assignment was to use GIS and Python to solve a puzzle from NPR's Will Shortz that asks, "What is the longest word that can be produced using the first letters of adjacent states?" I made the following assumptions, though someone else might choose differently.

  • The Four Corners states of CO, UT, AZ, and NM were considered adjacent.
  • HI and AK were excluded, but the District of Columbia was included.
  • A state could be "travelled" to more than once in a trip as long as it produced a string that started a valid word – for example, MI, OH, MI for "mom" (see the short sketch after this list).
  • Only the first letter of each state name was used in the word string; the original challenge was vague about how many letters could be taken from a state name.
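
As a toy illustration of these rules, the snippet below builds "mom" from the trip MI to OH and back to MI; the adjacency dictionary here is abbreviated and hand-typed, not the full table the model produces later.

# Toy illustration: Michigan and Ohio are adjacent, so the trip MI -> OH -> MI
# spells "mom" from the states' first letters.
adjacent = {"Michigan": ["Ohio", "Wisconsin", "Indiana"],
            "Ohio": ["Michigan", "Indiana", "Pennsylvania"]}
trip = ["Michigan", "Ohio", "Michigan"]
word = "".join(state[0].lower() for state in trip)
print(word)  # -> mom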

The following data was used to complete this puzzle: a US states shapefile and an Excel file of roughly 350,000 legitimate words.

Determining Adjacent States

The first part of my solution was a Model Builder model that produced a geodatabase table of states and their adjacent states. The model, GetAdjacentStates (a few sample output rows follow this list):

  • Created a buffer around each state
  • Created a union of the buffer to itself to get adjacent states
  • Selected rows in the union result where the state name != the adjacent state name
  • Added a field that combines the name of the state and the adjacent state
  • Summarized on the new field to dedupe the rows, which also left the table sorted by state
  • Removed extra fields in the table
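
The end result is a table with one row per state / adjacent-state pair. Colorado's rows, for example, would look roughly like the ones below (Arizona appears because of the Four Corners assumption; the exact field names come from the Summary Statistics step):

Colorado    Arizona
Colorado    Kansas
Colorado    Nebraska
Colorado    New Mexico
Colorado    Oklahoma
Colorado    Utah
Colorado    Wyoming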

Here is a screenshot of the Model Builder model:

Model Builder Find Adjacent States

Traveling State to State Looking for Words

The second part was a Python script, CalcWords.py, that:

  • Read the geodatabase table of states and their adjacent states and placed them into a dictionary of state objects.
  • Determined which letters of the alphabet do not start state names. Words with letters not found in the beginning of state names can be excluded.
  • Read a file consisting of ~350,000 legitimate words and assigned them to a list after excluding any words containing letters of the alphabet that don’t start a state name.
  • Passed the list of words to a DAWG (directed acyclic word graph) object, which allowed for fast lookup and verification of exact words.
  • Went through the list of state objects and recursively traveled down the word strings created by combining the first letter of a state, the first letter of an adjacent state, the first letter of that state's adjacent state, and so on. Each successive string was checked against the DAWG, and if it was a real word it was added to the state's real-word list. Each string was also checked to see if it was the beginning of any word; if so, recursion continued into the adjacent state's own adjacency list. (A minimal sketch of this recursion appears after this list.)
  • The resulting list of words created by moving from state to adjacent state was then deduped to contain only unique words.
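
To make that recursion concrete, here is a minimal, arcpy-free sketch of the same idea. The adjacency dictionary, word set, and prefix set are stand-ins for the geodatabase table and the WordUtils dawg module used in the real script, so treat all of the names as illustrative:

# Minimal sketch of the recursive first-letter walk (illustrative only).
# `adjacent` maps a state to its neighbors, `words` is the set of legitimate
# words, and `prefixes` holds every leading substring of those words.
def walk(state, snippet, adjacent, words, prefixes, found):
    for neighbor in adjacent[state]:
        snip = snippet + neighbor[0].lower()
        if snip in words:
            found.add(snip)
        if snip in prefixes:  # only recurse while some word starts this way
            walk(neighbor, snip, adjacent, words, prefixes, found)

adjacent = {"Michigan": ["Ohio"], "Ohio": ["Michigan"]}  # abbreviated adjacency
words = {"mom"}
prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
found = set()
for state in adjacent:
    walk(state, state[0].lower(), adjacent, words, prefixes, found)
print(sorted(found))  # -> ['mom']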

And the Winner is …

The deduped list was queried to find the longest word. The longest result was the nine-letter word nomocanon. The following eight-letter words were also found: atomisms, incanton, kiwikiwi, minimism, matamata, mocomoco, monoamin, nonunion, tokonoma, and wikiwiki. Apparently Will Shortz's solution used multiple letters from state names and was the eight-letter millions. Assumptions do affect the results.

The code could certainly be improved, as there was a lot of redundant checking. At the time this was written, a method to check for the existence of child nodes in the DAWG structure didn't work, which would have saved processing time. Some hashing of successful letter combinations, sketched below, could also have improved the brute-force method.
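
For example, the prefix test that currently loops over the whole word list for every snippet could be memoized with a simple cache keyed on the snippet. The helper below is a hypothetical sketch, not part of the original script:

# Hypothetical prefix cache: remember whether any word starts with a given
# snippet so repeated snippets are not rescanned against the full word list.
prefix_cache = {}

def starts_any_word(snippet, wordlist):
    if snippet not in prefix_cache:
        prefix_cache[snippet] = any(w.startswith(snippet) for w in wordlist)
    return prefix_cache[snippet]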

GetStates.py Script

# ---------------------------------------------------------------------------
# GetStates.py
# Created on: 2012-03-16 18:00:26.00000
#   (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy

# Local variables:
states = "\\\\psf\\Home\\Documents\\USC\\586_GIS_Programming\\Week6_7\\states.shp"
states_buffer = "\\\\psf\\Home\\Documents\\USC\\586_GIS_Programming\\Week6_7\\states_buffer.shp"
states_buffer_Union = "\\\\psf\\Home\\Documents\\ArcGIS\\Default.gdb\\states_buffer_Union1"
states_select_table = "\\\\psf\\Home\\Documents\\ArcGIS\\Default.gdb\\states_select_table"
add_field = "\\\\psf\\Home\\Documents\\ArcGIS\\Default.gdb\\states_select_table"
pop_field = "\\\\psf\\Home\\Documents\\ArcGIS\\Default.gdb\\states_select_table"
sum_table = "\\\\psf\\Home\\Documents\\ArcGIS\\Default.gdb\\adjacent_states"

# Process: Buffer
arcpy.Buffer_analysis(states, states_buffer, "1 Meters", "OUTSIDE_ONLY", "ROUND", "NONE", "")

# Process: Union
arcpy.Union_analysis("\\\\psf\\Home\\Documents\\USC\\586_GIS_Programming\\Week6_7\\states_buffer.shp #;\\\\psf\\Home\\Documents\\USC\\586_GIS_Programming\\Week6_7\\states_buffer.shp #", states_buffer_Union, "ALL", "", "GAPS")

# Process: Table Select
arcpy.TableSelect_analysis(states_buffer_Union, states_select_table, "\"STATE_NAME\" <> \"STATE_NAME_1\"")

# Process: Add Field
arcpy.AddField_management(states_select_table, "S1S2", "TEXT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")

# Process: Calculate Field
arcpy.CalculateField_management(add_field, "S1S2", "[STATE_NAME] + [STATE_NAME_1]", "VB", "")

# Process: Summary Statistics
arcpy.Statistics_analysis(pop_field, sum_table, "FID_states_buffer FIRST;STATE_NAME FIRST;FID_states_buffer_1 FIRST;STATE_NAME_1 FIRST", "S1S2")

# Process: Delete Field
arcpy.DeleteField_management(sum_table, "S1S2;FREQUENCY")

CalcWords.py Script


# ---------------------------------------------------------------------------
# CalcWords.py
#
# Description:
# By Nancy Milholland
# ---------------------------------------------------------------------------

# Import arcpy for arcgis, xlrd for excel, re for regular expressions,
# and WordUtils for the dawg module
import arcpy
import re
import xlrd
from WordUtils import dawg

# Create a State object class with three methods
# State class inherits the list class
# This makes it easier to deal with the state's adjacent state list
# Pass the state object's initializer the state name (helps with debugging),
# the first letter of the state, and a list of adjacent states
class State(list):
    def __init__(self, name, first, adjlist=[]):
        list.__init__(self)
        self.Name = name
        self.First = first
        self.extend(adjlist)

    # Print the state and adjacent states for debugging help
    def PrintState(self):
        print self.Name, "First Letter ", self.First, "- Adjacent States"
        for a in self:
            print "\t", a

    # The recursive method – takes everything! the big list of words, the dawglist,
    # the list of real words formed by traversing the states, the current snippet
    # string formed by travelling from state to state, and the statelist of state
    # objects
    def MakeWords(self, wordlist, dawglist, goodwords, snippet, statelist):
        # Go through each adjacent state, and add its first letter to the snippet
        for a in self:
            recurse = False
            aobj = statelist[a]
            snip = snippet + aobj.First
            # If the snip is a real word according to the dawglist, add it to the list of goodwords
            if dawglist.patternSearch(snip):
                goodwords.append(snip)
                print snip, "is a good word"
            # Now see if the snip is the start of any word; if it is, stop searching and
            # start recursing down the adjacent state's adjacent states. When nothing
            # starts with the string, the program naturally returns to the calling method
            for w in wordlist:
                if w.startswith(snip):
                    recurse = True
                    break
            if recurse:
                aobj.MakeWords(wordlist, dawglist, goodwords, snip, statelist)

# Local variables:
# Geodatabase table of states and their adjacent states. Doesn't include HI or AK
# but does include the District of Columbia. The Four Corners states – CO, UT, AZ, NM –
# are considered adjacent
sum_table = "\\\\psf\\Home\\Documents\\ArcGIS\\Default.gdb\\adjacent_states"
# Large word file of ~350,000 legit words
wordfile = "C:\\Python26\\word_list.xls"

# Create a search cursor
rows = arcpy.SearchCursor(sum_table)

# Create a list of string fields (fields[0] is already sorted)
fields = arcpy.ListFields(sum_table, "", "String")
state = fields[0]
adjstate = fields[1]
lastStateName = ""
StateList = {}
alphalist = {}
# Create an alphabet dictionary, setting all letters to 0 (False)
for i in xrange(ord('a'), ord('z') + 1):
    alphalist[chr(i)] = 0
# Read the geodatabase table and create an object for each state
# and a dictionary of the state objects
for row in rows:
    # print "%s: %s %s: %s" % (state.name, row.getValue(state.name), adjstate.name, row.getValue(adjstate.name))
    currstate = row.getValue(state.name)
    if currstate != lastStateName:
        adjlist = row.getValue(adjstate.name)
        fchar = currstate[0].lower()
        st = State(currstate, fchar)
        st.extend([adjlist])
        StateList[currstate] = st
        lastStateName = currstate
        # set the dictionary letter to true
        alphalist[fchar] = 1
    else:
        st.extend([row.getValue(adjstate.name)])

# Now go through the dictionary of letters and create a regular expression
# of all the letters that did not start a state name
ex_list = []
for i in xrange(ord('a'), ord('z') + 1):
    c = chr(i)
    if not alphalist[c]:
        ex_list.append(c)
exclude_string = '[' + ''.join(ex_list) + ']'
print exclude_string

# print "checking statelist"
# for s in sorted(StateList):
#     sobj = StateList[s]
#     sobj.PrintState()
#     print sobj.Name, sobj.First
#     for att in sobj:
#         aobj = StateList[att]
#         print "\t", att, " lookup next level:", aobj.First

# Getting the Excel list of words into a big list
big_list = []

book = xlrd.open_workbook(wordfile)
for si in range(book.nsheets):
    sheet = book.sheet_by_index(si)
    # print sheet.name, sheet.nrows, sheet.ncols
    for rindex in range(sheet.nrows):
        currword = sheet.cell(rindex, 0).value
        # exclude any words that match the regular expression – letters that don't
        # start any state names – so that the list is smaller
        if not re.search(exclude_string, currword):
            big_list.append(currword)

print len(big_list)
# Create a dawg object to look up real words and feed it the list
worddawg = dawg.Dawg(0)
worddawg.loadList(big_list)
# Now go through the list of states and start the recursion
realwords = []
for s in sorted(StateList):
    sobj = StateList[s]
    snippet = sobj.First
    sobj.MakeWords(big_list, worddawg, realwords, snippet, StateList)
# Eliminate dupes from the realword list
realwords = list(set(realwords))
print realwords
# Get the longest word
print max(realwords, key=len)
