June 6th, 2012

Over the weekend, we stumbled on a paper called "Towards Building Robust Natural Language Interfaces to Databases", which appears to have been published in 2007 or 2008.

In section 6, the authors say "The PRECISE group reported a side experiment in which a student took over 15 hours to build a Geoquery 250 NLI using Microsoft’s EnglishQuery tool. The resulting system achieved rather poor results for such an expensive effort – approximately 80% precision and 55% recall, yielding a correctness of approximately 45%. Our limited experience with the Microsoft English query tool was rather frustrating as well, but it would be interesting to repeat this experiment and also see how other commercial systems such as Progress Software’s EasyAsk and Elfsoft’s ELF fare."

Here's the web document that challenged us to create an interface for GeoQuery. And here's the listing of 250 questions used by the authors.

So on Monday we built a complete interface. Tuesday we checked it over and polished a few rough edges.

The interface would be 100% accurate, except that their data is inconsistent, so there is no definitive answer to #163, "what is the population of the major cities in wisconsin?" For example, if we asked this question about Hawaii, would the answer cover the 4 cities listed as City1, City2, City3 and City4? Or only the 3 cities listed in the City table for Hawaii?

Besides this (which could be fixed in minutes, if we got an answer one way or the other), constructing the interface was pretty straightforward. The first step was to import their tables into Microsoft Access, and because some of their relationships were expressed as lists, we wrote a few queries to render them as recordsets. For example, the Border table looks like this:
state     state_abbreviation   states_that_border_it
alabama   al                   'tennessee','georgia','florida','mississippi'

so we wrote a query called [Names of Bordering States], defined so:

SELECT Border.state, State.state_name AS BorderState FROM State INNER JOIN Border ON InStr(Border.states_that_border_it,"'" & State.state_name & "'")>0;
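
To see why the query wraps each state name in apostrophes before searching, here is a minimal Python sketch of the same matching idea. The function name and the Ohio row are our own illustration (written in the same format as the Border table), not part of the Access database, and unlike Access's InStr the Python test is case-sensitive, which is fine for this all-lowercase data.

```python
def names_of_bordering_states(border_rows, state_names):
    """Pair each state with every known state name found in its quoted border list.

    Mirrors the InStr test in the Access query: the name is searched for
    together with its surrounding apostrophes, so 'virginia' does not
    match inside 'west virginia'.
    """
    pairs = []
    for state, quoted_list in border_rows:
        for name in state_names:
            if ("'" + name + "'") in quoted_list:
                pairs.append((state, name))
    return pairs


# Illustrative rows only; the alabama row is the one shown in the post.
border_rows = [
    ("alabama", "'tennessee','georgia','florida','mississippi'"),
    ("ohio", "'indiana','kentucky','michigan','pennsylvania','west virginia'"),
]
state_names = ["georgia", "kentucky", "virginia", "west virginia"]

for pair in names_of_bordering_states(border_rows, state_names):
    print(pair)
```

Without the apostrophes, searching for virginia would wrongly pair ohio with virginia, because the text also appears inside west virginia.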

We did the same with rivers and also added a query, Capitals, to list information about just the capital cities:

SELECT City.state, City.state_abbreviation, City.city_name AS capital_name, City.[population/city] AS [population/capital], [State] & [city_name] AS [Key]
FROM State INNER JOIN City ON (State.capital = City.city_name) AND (State.state_name = City.state);

Then we created a relationship map for the database. All pretty straightforward Access-type setup tasks.





Then we ran the interface generator in automatic mode, and found that pretty much every style of question was handled.

What exactly does that mean? It doesn't mean that every question was answered. It means that, considering what information the question was expected to retrieve, we could quickly formulate a question to get just that information.

For example, take the question "what is the highest point in the US?"

Because the HighLow table has a field called highest_point, a naive answer to this question might list all the highest points in the US, just as if we'd asked "what are the highest points in the US?" This isn't what we want at all. So we need to add a rule in the "When I Say, What I Mean" section:

what is the highest point in the us ==> show highest_point that has largest highest_elevation

This is the step at which people usually ask, "Shouldn't the system figure that out by itself? Isn't that the whole point of a natural language system?"

Maybe in the future. Today, the best NL system in the world (which is this one) lets you create a robust, easily customizable NL interface, provided you're willing to address the unavoidable ambiguities in language and tell the system just what it is you want to see. Once you've done that, you can simply type "what is the highest point in the us". The system first applies these substitution rules to get a more targeted query, i.e. "show highest_point that has largest highest_elevation". Finally, it processes the query, navigating all the requirements of relational database systems, such as SELECTs and JOINs, to produce:

SELECT DISTINCT HighLow.highest_point , HighLow.highest_elevation , HighLow.HighLowState FROM HighLow ;

SELECT DISTINCT max ( [elfQ1].highest_elevation ) AS Lim FROM [elfQ1] ;

SELECT DISTINCT [elfQ1].* FROM [elfQ1] INNER JOIN elfQ2 ON [elfQ1].highest_elevation >= elfQ2.Lim ;

mount mckinley 6194 alaska
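
If you'd like to try the three-step pattern outside Access, here is a rough sketch using Python's built-in sqlite3 module. We are assuming that the first two statements are saved under the names elfQ1 and elfQ2, since that is how the later statements refer to them; here they are modeled as views. Only the Alaska row comes from the result above; the other two rows are illustrative filler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HighLow (highest_point TEXT, highest_elevation INTEGER, HighLowState TEXT);
INSERT INTO HighLow VALUES ('mount mckinley', 6194, 'alaska');
INSERT INTO HighLow VALUES ('mount whitney', 4418, 'california');  -- illustrative row
INSERT INTO HighLow VALUES ('mount elbert', 4399, 'colorado');     -- illustrative row

-- Step 1: the candidate rows (assumed to be saved as elfQ1).
CREATE VIEW elfQ1 AS
  SELECT DISTINCT highest_point, highest_elevation, HighLowState FROM HighLow;

-- Step 2: the limit to compare against (assumed to be saved as elfQ2).
CREATE VIEW elfQ2 AS
  SELECT max(highest_elevation) AS Lim FROM elfQ1;
""")

# Step 3: keep only the rows that reach the limit.
rows = conn.execute(
    "SELECT DISTINCT elfQ1.* FROM elfQ1 "
    "INNER JOIN elfQ2 ON elfQ1.highest_elevation >= elfQ2.Lim"
).fetchall()
print(rows)
```

Because the last step joins on >= the maximum, two points tied for the highest elevation would both be returned.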

OK, but how many of these substitutions did we have to write for these 250 questions? The answer: 130.

This really isn't as much work as it seems. First of all, several of the rules are only required because the questions have errors, probably because they're designed for a system running Prolog, which handles apostrophes oddly. Normally you wouldn't expect errors like "whats the largest city?" -- but OK, if it happens, we have a simple rule:
whats the ==> what's the       // fix error in input

Also, many of the rules are generalizations, added to answer questions that don't even appear in the 250 questions. For example, since we needed a rule for "most populous state" (question 161), we added these six just in case:

least populous capital ==> capital_name with smallest population/capital
least populous city ==> city with smallest population/city
least populous state ==> state with smallest population
most populous capital ==> capital_name with largest population/capital
most populous city ==> city with largest population/city
most populous state ==> state with largest population
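
For readers who think in code, the effect of rules like these can be approximated in a few lines of Python. This is only a sketch under our own assumptions: the post does not describe how ELF actually matches rules, so plain lowercase substring replacement stands in for it here, and the names RULES and apply_rules are hypothetical.

```python
# Each pair is (what I say, what I mean), copied from the rules above.
RULES = [
    ("whats the", "what's the"),
    ("least populous capital", "capital_name with smallest population/capital"),
    ("least populous city", "city with smallest population/city"),
    ("least populous state", "state with smallest population"),
    ("most populous capital", "capital_name with largest population/capital"),
    ("most populous city", "city with largest population/city"),
    ("most populous state", "state with largest population"),
]


def apply_rules(question, rules=RULES):
    """Rewrite a question by applying each substitution rule in order."""
    text = question.lower()
    for phrase, replacement in rules:
        text = text.replace(phrase, replacement)
    return text


print(apply_rules("what is the most populous state?"))
print(apply_rules("whats the largest city?"))
```

None of these replacements produces text that another rule in the list would match, which is one way to read the point that adding a rule rarely changes the behavior of the others.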

Here's the complete set of 130. As mentioned, this was created in a single day. That's because very rarely does adding a rule change the behavior of another rule. At worst, we might have to add a 1/2 or a/b to ensure that the rules are used in the right order. Seriously, do any of these rules look too fearsome? How about:

"population density" ==> (population / area)     FUNCTION

Do you think it could be any easier to define "population density" for your application?
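
As a rough illustration of what a FUNCTION rule like this amounts to, here is the same computed expression in a small sqlite3 sketch in Python. The column names state_name, population and area are inferred from the rules and queries in this post, and the two rows hold made-up numbers chosen only to make the arithmetic easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE State (state_name TEXT, population INTEGER, area REAL);
INSERT INTO State VALUES ('state_a', 1000, 10.0);  -- made-up values
INSERT INTO State VALUES ('state_b', 500, 2.0);    -- made-up values
""")

# "population density" expands to (population / area); asking for the
# smallest density then becomes an ordinary sort on that expression.
smallest = conn.execute(
    "SELECT state_name, (population / area) AS density "
    "FROM State ORDER BY density ASC LIMIT 1"
).fetchone()
print(smallest)
```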



OK, was that really all? Well, no. There was one little wrinkle. (There always is. The true test of a natural language system is whether it's got that last little tool you need to solve that last little problem. Our system has been in constant development since 1988, so yes, it probably does.)

The problem was to distinguish between questions like "which rivers run through new mexico?" and "which rivers run through states bordering new mexico?" There might be many ways to solve this issue, for example using substitutions like those above. But in the ELF system there's a very simple solution.

Each database can have as many different interfaces as you like; we call them Views. We can also write scripts to decide which View should be used, based on the user, the question, etc. So in this case, we can immediately see that if the word "border" doesn't appear in the question, we can't be interested in rivers flowing through border states. Ergo,

Step 1: remove the [Names of Bordering States] table from the Relationship map, and connect the State table directly to [States with Rivers] on the state_name field.

Step 2: rerun the analysis done for GeoQuery using this modified Relationship map (we called the View NoBorders).

Step 3: add a View selector script into the Settings window's script panel. Enter the function's name, e.g. ViewPicker, in the Name panel.

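function ViewPicker
   ' Use the full GeoQuery View only when the question mentions borders.
   ' The final argument (1) makes InStr compare without regard to case.
   if (InStr(1,Question,"border",1)>0) or (InStr(1,Question,"surround",1)>0) then
     ViewPicker="GeoQuery"
   else
     ViewPicker="NoBorders"
   end if
end function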

Step 4: enter ViewPicker in the slot for Question Script on the View tab of Settings, and check off the Question Script box to activate it.
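
To check which View a given question would select without opening ELF, the same decision can be sketched in Python. This is a port of the ViewPicker logic above for illustration only; the name view_picker is ours, and ELF itself runs the script shown in Step 3, not this code.

```python
def view_picker(question):
    """Return the View name the ViewPicker script would choose for a question."""
    q = question.lower()  # the script's InStr calls also ignore case
    if "border" in q or "surround" in q:
        return "GeoQuery"
    return "NoBorders"


for question in [
    "which rivers run through new mexico?",
    "which rivers run through states bordering new mexico?",
    "what states surround kentucky?",
]:
    print(question, "->", view_picker(question))
```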




250 sample questions for GeoQuery interface
1 which rivers run through states bordering new mexico?
2 what is the highest point in montana?
3 what is the most populated state bordering oklahoma?
4 through which states does the mississippi run?
5 what is the longest river?
6 how long is the mississippi?
7 which state has the smallest population density?
8 what is the area of wisconsin?
9 what is the lowest point of the state with the largest area?
10 what is the longest river in mississippi?
11 what states border montana?
12 what states border new jersey?
13 which state has the longest river?
14 name the rivers in arkansas
15 which states have points higher than the highest point in colorado?
16 how many people live in the capital of texas?
17 how long is the delaware river?
18 what is the smallest city in the usa?
19 what states border georgia?
20 what is the smallest state by area?
21 how long is the mississippi river?
22 what states border delaware?
23 what is the shortest river in the usa?
24 what states have cities named plano?
25 how many rivers does colorado have?
26 what is the biggest city in georgia?
27 what states border hawaii?
28 what is the capital of the state with the highest point?
29 what state has the highest population?
30 what is the capital of maine?
31 which state borders florida?
32 what state has highest elevation?
33 what rivers run through the states that border the state with the capital atlanta?
34 what is the biggest city in oregon?
35 what is the lowest point of the us?
36 which state borders hawaii?
37 what are the major cities in ohio?
38 what is the population of springfield missouri?
39 how many people live in california?
40 where is the highest point in montana?
41 what are the major cities in alaska?
42 what are the major cities in kansas?
43 which state has the highest point?
44 what states border florida?
45 what states does the ohio river go through?
46 what is the largest city in minnesota by population?
47 how many rivers are there in idaho?
48 how high is the highest point in montana?
49 what is the lowest point in california?
50 what is the capital of georgia?
51 how big is texas?
52 what is the highest point in nevada in meters?
53 how many people live in minneapolis minnesota?
54 what is the area of maine?
55 what is the lowest point in oregon?
56 what state has the city flint?
57 give me the largest state?
58 how many states does the colorado river run through?
59 what is the area of south carolina?
60 which state has the highest elevation?
61 how large is alaska?
62 how many citizens live in california?
63 what is the biggest city in wyoming?
64 which states border south dakota?
65 what state has the largest population density?
66 what is the population of utah?
67 how many people live in rhode island?
68 what is the population of new york city?
69 which states border texas?
70 what is the population of seattle washington?
71 what is the highest point in colorado?
72 how large is the largest city in alaska?
73 what is the longest river in the us?
74 how many states does the mississippi river run through?
75 what are the high points of states surrounding mississippi?
76 what is the highest point of the usa?
77 what is the largest river in washington state?
78 what is the population of illinois?
79 which state borders the most states?
80 which rivers flow through alaska?
81 what city has the most people?
82 which states does the mississippi run through?
83 what is the capital of washington?
84 what is the smallest city in the us?
85 what are the major cities in texas?
86 which state has the highest population density?
87 what state contains the highest point in the us?
88 what states does the delaware river run through?
89 which states capital city is the largest?
90 how many citizens in alabama?
91 what is the highest point in states bordering georgia?
92 what rivers are in utah?
93 what is the area of the largest state?
94 what are all the rivers in texas?
95 what is the population density of wyoming?
96 what is the capital of new jersey?
97 what is the lowest point in nebraska in meters?
98 what major rivers run through illinois?
99 what is the capital of new hampshire?
100 what is the lowest point in massachusetts?
101 what is the largest city in states that border california?
102 what states border indiana?
103 where is the lowest spot in iowa?
104 how many square kilometers in the us?
105 what is the highest point in rhode island?
106 what are the major cities in rhode island?
107 what states border arkansas?
108 where is the lowest point in the us?
109 rivers in new york?
110 what is the population density of maine?
111 what is the lowest point in the state of california?
112 what is the highest point in the us?
113 how long is the colorado river?
114 how long is the north platte river?
115 how large is texas?
116 which states border colorado?
117 what is the lowest point in louisiana?
118 what is the population of dallas?
119 what is the population of tempe arizona?
120 how many rivers in washington?
121 what is the shortest river in the us?
122 what are the major cities of texas?
123 how many people live in kalamazoo?
124 how many rivers does alaska have?
125 what rivers run through colorado?
126 what is the length of the colorado river?
127 what is the state with the lowest population?
128 what states border rhode island?
129 how many rivers are in colorado?
130 what is the total population of the states that border texas?
131 what is the length of the mississippi river?
132 what is the population of oregon?
133 how many cities are there in the us?
134 what is the area of alaska?
135 how many people live in spokane washington?
136 what is the combined population of all 50 states?
137 what state has the capital salem?
138 how high is the highest point in america?
139 what is the biggest city in the us?
140 what is the smallest city in alaska?
141 how long is the shortest river in the usa?
142 what states have cities named dallas?
143 what is the biggest river in illinois?
144 what is the capital of iowa?
145 what is the highest point in iowa?
146 what is the population density of texas?
147 what is the longest river in florida?
148 what is the population of hawaii?
149 what is the smallest city in washington?
150 what are the major cities in oklahoma?
151 what state is des moines located in?
152 what is the highest point in the country?
153 what state borders michigan?
154 what states border new hampshire?
155 what is the lowest point in the united states?
156 how long is the rio grande river?
157 what are the major rivers in ohio?
158 what is the capital of north dakota?
159 what is the largest city in rhode island?
160 what is the population of the capital of the smallest state?
161 what is the most populous state?
162 what is the largest city in wisconsin?
163 what is the population of the major cities in wisconsin?
164 give me the cities in virginia?
165 which states have cities named austin?
166 what state is columbus the capital of?
167 what is the city with the smallest population?
168 what states does the missouri run through?
169 what is the longest river in the united states?
170 how many cities are in montana?
171 what is the highest elevation in new mexico?
172 how long is the missouri river?
173 what capital is the largest in the us?
174 what is the population of south dakota?
175 how many people live in new york?
176 what is the population of san antonio?
177 what are the major cities in california?
178 what state has the greatest population density?
179 which river runs through the most states?
180 which states does the missouri river run through?
181 which state has the highest peak in the country?
182 what is the biggest city in arizona?
183 what is the lowest point in the state of texas?
184 which state is the city denver located in?
185 what is the lowest point in arkansas?
186 what is the biggest city in texas?
187 what is the biggest city in the usa?
188 which state has the largest city?
189 how many rivers are in new york?
190 what is the lowest point in texas?
191 which states border kentucky?
192 which state borders most states?
193 how many major cities are in florida?
194 what are the major cities in wyoming?
195 what is the highest point in the usa?
196 what is the population density of the smallest state?
197 name all the rivers in colorado?
198 what is the capital of vermont?
199 what is the population of tucson?
200 what is the highest mountain in the us?
201 what is the capital of utah?
202 how long is the ohio river?
203 what rivers do not run through tennessee?
204 what is the highest point in wyoming?
205 which states does the mississippi river run through?
206 what states capital is dover?
207 what is the population of arizona?
208 whats the largest city?
209 what is the biggest city in louisiana?
210 how many people live in austin?
211 what is the total area of the usa?
212 what is the highest point in kansas?
213 which states border new york?
214 what state has the highest elevation?
215 what is the highest point of the state with the largest area?
216 how many people live in washington?
217 how many people live in hawaii?
218 what rivers run through new york?
219 how many people live in riverside?
220 what is the population of texas?
221 which states border arizona?
222 what is the area of the smallest state?
223 which state border kentucky?
224 what states border kentucky?
225 what is the largest state capital in population?
226 what is the smallest state in the usa?
227 where is the highest point in hawaii?
228 what is the smallest city in hawaii?
229 what is the population of portland maine?
230 what are the populations of states through which the mississippi river runs?
231 what is the shortest river?
232 what is the population of idaho?
233 what is the population of erie pennsylvania?
234 how many major rivers cross ohio?
235 what is the population of montana?
236 which state is kalamazoo in?
237 what are the rivers in alaska?
238 which state is the smallest?
239 what states surround kentucky?
240 which state has the greatest population?
241 what is the area of idaho?
242 what rivers run through west virginia?
243 what is the highest point in the state with the capital des moines?
244 what length is the mississippi?
245 what is the shortest river in iowa?
246 what states border ohio?
247 what is the combined area of all 50 states?
248 what is the longest river in texas?
249 what is the population of boston massachusetts?
250 what is the capital of the state with the largest population?


