There are more questions than answers? Debatable!
Johnny Nash wrote a song ‘There are more questions than answers’ but, while I love Johnny, spending the last 10 years thinking about AI and Search makes me think it’s just not so.
After college I wanted to ‘do AI’ and went to Texas to work on the CYC project. During my time there the internet took off - nearly overnight. This explosion created more information than a handful of people could encode and translate into a rule-based predicate calculus. My belief in a hybrid approach of statistical and rule-based modeling wasn’t shared at the time, so I moved on and spent time writing Linux device drivers and middleware in the telecom sector.
But that didn’t stop me from continuing to think about AI and search. Eventually I settled into working on more semantic web applications (which is a whole ‘nother post) but was wooed away to try a different approach.
During the course of these events I became a proponent of Brute Force AI — doing simplistic things that leverage massive amounts of power (like modern processors, clusters and networks) or people (like Wikipedia and the web).
I saw how people actually ask questions and perform searches. This has made me believe that there simply aren’t that many questions to be answered. Old-school AI, linguistics and philosophy may be able to come up with numerous ways of phrasing questions and speak of the infinite combinations of language - but in my experience there aren’t that many ways that real people ask real questions.
A while back I had this idea of how to approach the questions and answers field - a 90% solution. I sat on the idea too long and now major parts of it have surfaced, scattered across the web.
My inspiration came from a page at Wikipedia - “Why is the sky blue?”. This is a redirect page, which takes a page title and sends it to another page - “Diffuse sky radiation”.
Why not have all questions redirect to an answer?
A few months ago, someone beat me to an implementation over at AnswerWiki. It’s not quite what I’d have done - but they have the right domain name and they’re first-to-market. Let me describe what I had planned out in hopes that one of these answer sites [1] will implement my ideas.
Many of the general, reference questions that people ask can be answered in a few sentences or less. This is the idea behind Wikipedia’s lead section.
The lead section is the section before the first headline. It is shown above the table of contents (for pages with more than three headlines). It should establish significance, large implications and why we should care. The lead section contains the short answer to questions that can be reduced to the form of “Tell me about X”.
More specific questions can be answered through basic templating. “When was X born” is a template already being used by Google (try “when was Aristotle born”). [2] Wikipedia has a user-generated category system which would allow you to write templates that could produce answers for some of the most popular categories:
- Towers by Country could be used for “How tall is X”.
- Category:People for person related questions like “[Where | When] was X born?”
- Category:Landforms for geographical questions like “How [high | long] is X?”
- even Category:Universities for questions like “What is the motto of X?”
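The category-driven templates above can be sketched in a few lines. This is only an illustration under stated assumptions: the `FACTS` table, the field names and the `answer` function are hypothetical stand-ins for data mined from category pages, and the regular expressions cover just the phrasings listed above.

```python
import re

# Hypothetical facts, standing in for data mined from category pages.
# The values are illustrative only.
FACTS = {
    "Aristotle": {"category": "People", "born": "384 BC", "birthplace": "Stagira"},
    "Mount Everest": {"category": "Landforms", "height": "8,849 m"},
}

# One entry per template: (category, question pattern, fact field, answer format).
TEMPLATES = [
    ("People", re.compile(r"^when was (?P<x>.+?) born\??$", re.I),
     "born", "{x} was born in {value}."),
    ("People", re.compile(r"^where was (?P<x>.+?) born\??$", re.I),
     "birthplace", "{x} was born in {value}."),
    ("Landforms", re.compile(r"^how (?:high|tall) is (?P<x>.+?)\??$", re.I),
     "height", "{x} is {value} high."),
]


def answer(question):
    """Return a templated answer, or None if no template and fact line up."""
    q = question.strip()
    for category, pattern, field, fmt in TEMPLATES:
        match = pattern.match(q)
        if not match:
            continue
        subject = match.group("x").strip().lower()
        for name, record in FACTS.items():
            # A template only fires for subjects in its own category.
            if (name.lower() == subject
                    and record["category"] == category
                    and field in record):
                return fmt.format(x=name, value=record[field])
    return None


print(answer("When was Aristotle born?"))
print(answer("How high is Mount Everest?"))
```

Tying each template to a category keeps a question like “How tall is Aristotle?” from producing a nonsense answer: the pattern matches, but no landform record does, so the function returns `None`.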
It is my belief that relatively few of these template solutions could pre-populate an answer site (wiki or otherwise) with the vast majority of questions that people actually ask. For example, the following questions would all redirect to “Tell me about James Dean”. [3]
Tell me about James Dean

James Byron Dean (February 8, 1931 – September 30, 1955) was an American film actor who epitomized youthful angst. Dean’s mainstream status as a cultural icon is best embodied in the title of his most cited role in Rebel Without a Cause. As with Buddy Holly, Bruce Lee, and Marilyn Monroe his death at a young age helped guarantee a legendary status.

- Who is James Dean?
- When did James Dean die?
- James Dean died when?
- When was James Dean born?
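Here is a rough sketch of how such redirects might be pre-populated, in the spirit of Wikipedia’s redirect pages. The `VARIANTS` list and the function names are my own illustration, not anything an existing answer site exposes; the idea is simply to generate every known phrasing for a subject and point each one at the canonical “Tell me about X” page.

```python
# Question phrasings that should all land on the same answer page.
# This list is illustrative; a real site would grow it from query logs.
VARIANTS = [
    "Who is {x}?",
    "When did {x} die?",
    "{x} died when?",
    "When was {x} born?",
]


def normalize(title):
    """Lower-case, drop trailing punctuation and collapse whitespace."""
    return " ".join(title.lower().rstrip("?!. ").split())


def build_redirects(subject):
    """Map every phrasing for a subject to its canonical page title."""
    target = f"Tell me about {subject}"
    return {normalize(v.format(x=subject)): target for v in VARIANTS}


def resolve(question, redirects):
    """Return the canonical page title for a question, or None."""
    return redirects.get(normalize(question))


redirects = build_redirects("James Dean")
print(resolve("when did james dean die", redirects))
```

Normalizing both the stored titles and the incoming question means small differences in case, spacing or a missing question mark still hit the same redirect.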
This would only answer the factual set of questions, but the same techniques could be used to mine the more subjective questions on answer sites, expert sites and even blogs.
While most questions that are actually asked can be answered this way, some reference questions don’t have good answers available via template search solutions. My favorite one to ask, being a history buff, is “How long did it take to sail from England to India?”. The answer is in the wiki page on India Pale Ale with a more detailed answer in Passage East. [4]
This alternative question and answer population could occur organically, like Wikipedia and the answer sites, or it could occur in a more structured way - by using Amazon’s Mechanical Turk, or something analogous [5], for as yet unanswered questions.
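The structured route could look something like the sketch below: answer from a cache when possible, and otherwise queue the question for a human. Everything here is hypothetical, including the `AnswerDesk` class and its methods; it uses an in-memory queue rather than any real crowdsourcing service.

```python
from collections import deque


class AnswerDesk:
    """Serve cached answers; queue unanswered questions for human workers."""

    def __init__(self):
        self.cache = {}         # normalized question -> answer
        self.pending = deque()  # normalized questions awaiting a human

    @staticmethod
    def _key(question):
        return question.strip().lower().rstrip("?")

    def ask(self, question):
        """Return a cached answer, or None after queueing the question."""
        key = self._key(question)
        if key in self.cache:
            return self.cache[key]
        if key not in self.pending:  # queue each question only once
            self.pending.append(key)
        return None

    def next_task(self):
        """Return the oldest unanswered question, or None if there is none."""
        return self.pending[0] if self.pending else None

    def submit(self, key, answer):
        """Record a human-written answer and clear the question from the queue."""
        self.cache[key] = answer
        if key in self.pending:
            self.pending.remove(key)


desk = AnswerDesk()
desk.ask("How long did it take to sail from England to India?")
desk.submit(desk.next_task(), "About six months before the Suez Canal opened.")
print(desk.ask("How long did it take to sail from England to India?"))
```

Once a person has answered a question, every later asker gets the cached answer immediately, so human effort is spent only on questions nobody has answered yet.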
Finally, certain questions that aren’t reference based also have great templates - see Google’s SMS Demo for examples of mapping, yellow pages and currency conversions. [6] [7]
Now there may be no answer to some of the philosophical questions that Johnny Nash raises:
- Why is there so little love among men? [8]
- What is life?
- How do we live?
- What should we take and how much should we give?
Yet a few clever templates, combined with leveraging the knowledge of the masses, could answer the majority of the questions people expect to be able to find answers for.
Notes
1. Answer sites like Google Answers, Yahoo! Answers, MSN QnA or About.
2. Google mines both Wikipedia and who2.
   - They also distinguish between when and where.
   - It appears they have some threshold before the template triggers, since “When was Jason Arnott born” doesn’t use the template but the same query with Wayne Gretzky does.
3. Notice that the summary on Wikipedia doesn’t have how James Dean died. That answer would be edited in my idealized version.
4. India Pale Ale gets its hoppy flavor because hops were a way of preserving the beer over the six month voyage from England to India, which became two months with the opening of the Suez Canal and eventually got down to three weeks.
5. Cellphedia, AskMeNow and Mozes likely use this human-swarm plus a cache of questions and answers.
6. Some of this reference templating is available via Google Define, which is visible through Google SMS. I haven’t found it as useful - there’s some disconnect between the Define group and the web search templating group. Example: “define: wayne gretzky” gives me a hold-em poker hand (with no option of more results on SMS), whereas “define: michael jordan” gives me an expected reference-based answer.
7. I think mobile devices are the preferred showcase for these short answers to questions. Standard web search is perfect for research, where a page of many results helps to alleviate questions of authority and it’s feasible to return 10 or more results. Mobile search has limited screen real-estate and, in my experience, is used for more recall-based questions - where you know something vaguely but require short specifics or there’s a strict algorithm for correctness (like currency exchange, weather and the yellow pages).
8. The lyric sites online list this as “Why is there so little of a moment”. That’s not what Johnny says - give it a listen.
