People are Smarter Than Computers (Part 3)
By Cameron Ferroni | December 17, 2007
And now, the conclusion (well, at least for 2007 - I’m pretty sure that at some point over the next three weeks something will make me utter the words “Stupid machine” at least once, and maybe then I’ll have more fodder for this column). But for now…
It’s actually too bad that I had this title theme going, since I had some really good options for this column, for example “The Fallacy of Normal” or “Everything I Needed to Know About Punctuation I Learned in Grade School (So Why Can’t My Computer?)”. In any case, in this installment we are going to look at punctuation marks - or, more specifically, “special characters” - and understand why it is so damned hard for computers to deal with them.
Over the past few weeks I have searched for the following things: “Christmas Trees At the Hop-In Market”, “How to cut a 6′ board on the table saw”, “T & J Kitchen Supply Seattle”. Now, you will notice that in each of these searches, the special characters are pretty important - in the first and third examples, they are a critical part of the name of the business, and help to eliminate a lot of unwanted results. In the second example, the whole point was I was looking for ways to safely cut a six-foot board, not a six-inch board. For those of you who understand woodworking, these are very, very different operations.
However, as you might expect, the engines pretty much mangle any query that contains these characters, often without any obvious pattern. Sometimes the characters are treated as search operators (the ‘&’, for example, gets treated as a boolean AND), sometimes they are ignored, and sometimes they are stripped to the point that T&J turns into TJ. Even more interesting, the search engines sometimes break words apart: when I tried “hopin market”, I got results for “hop market”, “hop in to the market”, and so on. In this case, the computer is pretending that it is smarter than me and knows what I actually meant, rather than respecting what I asked for in the first place.
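To make those behaviors concrete, here is a minimal Python sketch of how a query pipeline might rewrite the queries above. The function names (`as_operator`, `drop_special`, `squash_special`, `split_unknown`) and the tiny `VOCAB` dictionary are hypothetical illustrations, not code from any actual search engine.

```python
import re

# Hypothetical query rewrites illustrating the behaviors described above.
# None of these is taken from a real search engine.

def as_operator(query: str) -> str:
    """Treat '&' as a boolean AND between the surrounding terms."""
    return re.sub(r"\s*&\s*", " AND ", query)

def drop_special(query: str) -> str:
    """Ignore special characters: turn each into a space, then collapse runs of spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", query).split())

def squash_special(query: str) -> str:
    """Strip special characters outright, gluing neighbors together (T&J -> TJ)."""
    return re.sub(r"[^\w\s]", "", query)

# Hypothetical dictionary used by the word splitter.
VOCAB = {"hop", "in", "market"}

def split_unknown(word: str) -> list[str]:
    """Split an unrecognized word into two known words, if possible."""
    if word in VOCAB:
        return [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in VOCAB and tail in VOCAB:
            return [head, tail]
    return [word]

if __name__ == "__main__":
    query = "T&J Kitchen Supply Seattle"
    print(as_operator(query))     # T AND J Kitchen Supply Seattle
    print(drop_special(query))    # T J Kitchen Supply Seattle
    print(squash_special(query))  # TJ Kitchen Supply Seattle
    print([p for w in "hopin market".split() for p in split_unknown(w)])
    # ['hop', 'in', 'market']
```

The same input comes out as three different queries depending on which rewrite is applied, which is one plausible explanation for the lack of any obvious pattern.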
Even when I use quotation marks to force the engine to do my bidding, it doesn’t work. Nor does it work when I try to “escape out” the characters. At the end of the day, the engines frankly can’t seem to tell the difference between a 6″ piece of wood, a 6′ piece of wood, and six pieces of wood. And that, my friends, is why, once again, people are smarter than computers. We learned at a young age that punctuation matters - even when it’s hard to see the point of a particular mark, we can see the difference it makes. Computers can’t.
Why? Well, it all comes down (again) to normalization (who really wants to be normal anyway?). When search engines parse documents and put them into their massive indices, they have to normalize the data: they strip out all of the punctuation and remove the most basic words (words like “and”, “of”, “the”, “a”, “an”) - anything that doesn’t really contribute to the contextual relevance of the page. And unfortunately, punctuation marks fall into that category. As a result, although the page may very clearly say ‘T & J Kitchen Supply’, the index has no idea that that is what it originally said; it just knows that a T and a J appeared right next to each other.
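Here is a minimal sketch of that kind of index-time normalization, assuming a simple pipeline that lowercases text, replaces punctuation with spaces, and drops a small stop-word list. The `STOP_WORDS` set and the `normalize` function are hypothetical; real engines use their own, larger lists and more elaborate rules. It shows why ‘T & J’ and ‘T&J’ end up identical in the index, and why a 6′ board and a 6″ board become indistinguishable.

```python
import re

# Hypothetical stop-word list, taken from the examples in the text above.
STOP_WORDS = {"and", "of", "the", "a", "an"}

def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and drop stop words."""
    # [^\w\s] matches any character that is neither a word character nor
    # whitespace: '&', the prime (′), the double prime (″), '.', etc.
    no_punct = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]

if __name__ == "__main__":
    print(normalize("T & J Kitchen Supply"))   # ['t', 'j', 'kitchen', 'supply']
    print(normalize("T&J Kitchen Supply"))     # ['t', 'j', 'kitchen', 'supply']
    print(normalize("How to cut a 6′ board"))  # ['how', 'to', 'cut', '6', 'board']
    print(normalize("How to cut a 6″ board"))  # ['how', 'to', 'cut', '6', 'board']
```

Under this sketch, once the index holds only `['t', 'j', 'kitchen', 'supply']`, a quoted or escaped query has no stored ‘&’ left to match against.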
Now, I’ll be the first to admit that these aren’t easy problems to solve. Search is very, very hard; it must be, since the number of true innovations in search has been dwindling rapidly for the last few years. But that doesn’t mean we shouldn’t keep trying - we owe it to ourselves and to our users to make computers at least a little smarter…
Topics: Data, Local Search

