The Comprehensive Myth

By Cameron Ferroni | January 29, 2008

One of the truly amazing things about the Internet is the wealth of information available through one simple interface. Your Internet browser, coupled with a powerful search engine, can bring you content in many different forms, from millions of different sources. This amazing power has brought with it very challenging user expectations - namely, if I can clearly see that all of that information is out there, why isn’t there a better way to bring it all together in a rational way?

Well, because. Because it is hard. Really hard. At Marchex we’ve been working for a couple of years, and OpenList for years before that, to solve the problem just for local business information. While we’ve made great strides, we still have a long way to go to achieve the myth of a comprehensive and complete dataset. Why? Because it is hard.

Seriously though, to achieve completeness you have to execute flawlessly against two separate vectors. The first is to have all of the entities. In our case that means businesses, and that means we have to keep track of over 15 million local businesses in the United States alone. The second is to compile and aggregate a comprehensive set of information about those businesses - and that is much, much harder.

Some of it is pretty straightforward. There are sites out there that have pretty complete information about some categories of business. Finding every address of every restaurant in the United States, for example, is a reasonably solvable problem: some high-profile sites get this 95% right, so relying on them is not a bad way to go. Finding out which outdoor patios accept pets, however, is far less straightforward. In general this information is available for a decent percentage of businesses, but it doesn’t exist in a single format or in a single location. There’s no www.patiosforpets.com that lists them or anything (no, seriously, there isn’t - I just checked). But in some pet-friendly cities, it’s likely that someone has posted the information for that particular city.

OK, turns out I was actually a little misguided here. While there is no www.patiosforpets.com, there are any number of sites that do attempt to collect this information; for those of you who care, www.dogfriendly.com, www.petfriendlytravel.com, and www.tripswithpets.com, just to name a few. But my point is still valid: ideally you don’t want to have to go to a bunch of different sites to get all of that information. And getting it all aggregated in one place is hard, even for computers. In this case OpenList would have to figure out how to go and get information from all three of the above sites, and then figure out how to bring it all together, resolving cases where one site says that a place is dog friendly and another disagrees, and so on.

And that is just to address dog friendliness and restaurants. When you think about all of the interesting data, across all businesses, you can quickly see why the work multiplies so fast.
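
For the curious, here is a rough sketch of the "bring it all together" step, just to show where the trouble starts. It is not how our engine works; the function name, the attribute name, the business names, and the sample claims are all made up for illustration. It also assumes the hard part is already done, namely that all three sites refer to each business by exactly the same name. It simply takes a majority vote per business and flags ties as unresolved.

```python
from collections import Counter, defaultdict

# Hypothetical claims: (source, business, attribute, value).
# The businesses and values are invented for illustration only.
claims = [
    ("dogfriendly.com", "Cafe Example", "dog_friendly_patio", True),
    ("petfriendlytravel.com", "Cafe Example", "dog_friendly_patio", True),
    ("tripswithpets.com", "Cafe Example", "dog_friendly_patio", False),
    ("dogfriendly.com", "Bistro Sample", "dog_friendly_patio", True),
    ("tripswithpets.com", "Bistro Sample", "dog_friendly_patio", False),
]


def merge_claims(claims):
    """Group claims by (business, attribute) and keep the majority value.

    An even split between the top two values is recorded as None,
    meaning the sources disagree and nothing was decided.
    """
    votes = defaultdict(Counter)
    for _source, business, attribute, value in claims:
        votes[(business, attribute)][value] += 1

    merged = {}
    for key, counter in votes.items():
        ranked = counter.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            merged[key] = None  # tie: leave unresolved
        else:
            merged[key] = ranked[0][0]
    return merged


merged = merge_claims(claims)
for (business, attribute), value in merged.items():
    print(business, attribute, value)
```

Even in this toy version, one of the two businesses ends up unresolved, and a real system would also have to weigh how trustworthy and how fresh each source is.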

So what is my point? First off, it’s no wonder that so many of the local information sites that have been cropping up lately focus on particular niches - like covering only restaurants that are dog friendly, or doing a deep dive on florists in select cities. It’s just so overwhelming to try to do it all for everyone (but we’ll keep trying!). Second, it reinforces again the importance of having users contribute content to the sites. It’s far easier to let users add some of this data sporadically than to try to be perfect algorithmically. Going further, it would be even better if users could simply add pointers to sites with this kind of information, and maybe even direct the engine so that it could crawl those sites itself, based on the user’s assist!

But just because it’s hard doesn’t mean we are going to stop trying. We continue to improve our engine, and add more sources of information, and display more information about more businesses. So, until we build a really cool user tool to do it, if you have sites that you think are really valuable, send them our way and we’ll do our best to get the data incorporated, one source at a time.
