New Search, Semantic Web Coming?
Posted on March 12, 2007
Filed Under Web News
At first blush the concept of the Wikipedia model adapting itself to search technology and posing a threat to Google seems improbable. But Jimmy Wales says that search is broken, and he can fix it:
“Search is part of the fundamental infrastructure of the Internet. And, it is currently broken.
“Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency. Here, we will change all that.”
Wales, founder of Wikipedia, is the leading light of a new project to build the search engine that “changes everything”. The Wikipedia concept is essentially one of using humans in collaboration to annotate and classify knowledge. Similarly, the Wikia project will use humans to “tag” knowledge according to collaboratively agreed descriptors (meta tags, in fact), essentially laying a metalanguage over the whole Web.
Wales said on Thursday, March 8th, in Tokyo that Wikia, the for-profit arm of his Wikipedia enterprise, would build a search engine to rival those of Google and Yahoo.
“‘The idea that Google has some edge because they’ve got super-duper rocket scientists may be a little antiquated now,’ he said.
“Describing the two Internet firms as ‘black boxes’ that won’t disclose how they rank search results, Wales said collaborative search technology could transform the power structure of the Internet.
“While Wales declined to give any earnings targets, he said the company had received a $4 million investment from ‘angel investors’ as well as a ‘very large investment’ from Amazon Inc.” From Reuters
Before anyone goes to this much trouble, is search really broken? Tom Foremski, a technology and business journalist at Silicon Valley Watcher, argues that it really is, in terms of who’s doing the grunt work.
In his 2-part article Is Search Broken?, he describes how humans, not robots, have been doing much of the heavy lifting for some time now to improve the quality of search. And this annoys him.
“Why should I have to tag my content, and tag it according to the specific formats that Technorati, and other search engines recommend? Aren’t they supposed to do that?”
Foremski adds that bloggers, with their specific talent for clustering relevant links around a subject, are a clear example of humans doing the work the engines should be doing. He points to all the content that is increasingly optimized for search engines rather than written to please human readers. Webmasters are encouraged to submit their sites to Google, create XML sitemaps, and qualify content with nofollow and noindex tags.
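To make that chore concrete, here is a minimal sketch of the kind of metadata a webmaster ends up producing by hand: a sitemap file listing pages, and a robots meta tag telling crawlers what to index and follow. The function names and the example.com URLs are made up for illustration; only the sitemaps.org namespace and the robots meta directives are standard.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML string listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def robots_meta(index=True, follow=True):
    """Return a robots meta tag saying whether to index a page and follow its links."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


# Hypothetical URLs, for illustration only.
sitemap_xml = build_sitemap(["https://example.com/", "https://example.com/about"])
private_page_tag = robots_meta(index=False, follow=False)
print(sitemap_xml)
print(private_page_tag)
```

None of this is content a human reader ever sees, which is exactly Foremski's complaint.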
“I resent the fact that I have to create all this content describing my content–the search engines should be creating this ‘metadata.’
“I just want to write stuff, and leave it up to the search engines to find it, classify it, index it.”
In Part 2, Foremski sums it up nicely:
“there are ‘people-powered’ search efforts all across the globe, involving tens of millions of people, laboring every day to help improve the search experience. This is done by adding tags, site maps, headlines, etc, — they are creating ever larger amounts of valuable search metadata about content.”
Meanwhile in the world of Customer Relationship Management (CRM) software, Shai Agassi, president of the Product and Technology Group at industry giant SAP, describes in an interview how he thinks a real search engine should operate.
“If you went out and typed ‘employees of Shai Agassi’ into Google, whatever they searched has to have the word ‘employee’ and the word ‘Shai’ and the word ‘Agassi’ in that line or in that document. They don’t have any understanding of the semantics of the business architecture.
“What you really wanted to get are all the employees reporting to me in the human resources system and, for each one of them, you want to get all the documents related to employment. It should know that Shai Agassi must be a manager. And I need to know how to codify between employees now working for Shai versus those that used to work for Shai. It’s very complicated, OK? It’s not like a search you do in Google.” From SAP President Agassi on Search, Strategizing and Web 2.0
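The gap Agassi describes can be shown with a toy sketch. It contrasts a plain keyword match with a query over structured reporting records. Everything here is hypothetical (the record layout, the function names, the people other than Agassi, the documents); it is not how SAP or Google work, only an illustration of the difference between matching words and understanding a relationship.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employment:
    employee: str
    manager: str
    ended: Optional[str] = None  # None means still reporting to this manager


def keyword_search(documents, query):
    """Return documents containing every query word, as a plain text engine would."""
    words = query.lower().split()
    return [d for d in documents if all(w in d.lower().split() for w in words)]


def reports_of(records, manager, current=True):
    """Return people who report (current=True) or used to report (current=False) to manager."""
    return [
        r.employee
        for r in records
        if r.manager == manager and (r.ended is None) == current
    ]


# Hypothetical documents and HR records, for illustration only.
documents = [
    "Employees of SAP heard Shai Agassi speak",
    "Employment contract for Dana Levi",
]
records = [
    Employment("Dana Levi", "Shai Agassi"),
    Employment("Omar Haddad", "Shai Agassi", ended="2005-06-30"),
    Employment("Lee Chen", "Another Manager"),
]

keyword_hits = keyword_search(documents, "employees of Shai Agassi")
current_reports = reports_of(records, "Shai Agassi")
former_reports = reports_of(records, "Shai Agassi", current=False)
print(keyword_hits, current_reports, former_reports)
```

The keyword search finds the page that happens to contain the words, and misses the employment contract that the person asking actually wanted. The structured query can tell current reports from former ones only because someone has already encoded that relationship as data.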
Agassi’s advantage, and his environment, is the world of business process, so he has a finite number of meta considerations to program into his search filters. Could you apply such intuitive leaps to the whole Web? Jimmy Wales says he intends to. And there are other developments arising, notably the Freebase project, which we’ll describe in a following post.
The so-called “Semantic Web”, wherein all data is labeled with metadata for automatic processing, has been growing since the beginning of the Web, and before. The hallmark of the current “Web 2.0”, in which we operate today, is the social networking and collaboration technologies and communities that have stamped such a large new footprint on the old Web.
These innovations demonstrate the massive, tidal power of human endeavor, when we’re given useful tools to act in the aggregate. And the tools are going to get even better.
We are in a time on the Web now when the realm of practical possibility is larger than it has ever seemed before. Jimmy Wales’s PR speeches are geared to catch attention and raise interest and venture capital, but the discussion is best served by taking it out of the context of competing against Google. There’s probably room for all of Wales’s innovation without Google’s losing any share, on today’s technical state alone, given that the Web population is still growing.
Search engines have done a remarkable job of finding our data thus far, but now that we have the inspiration of Web 2.0, it’s useful to see that job in its correct perspective, as a small part of a much greater evolution.
When we complain that search engines are inadequate, and that we’ve had to learn to talk their pidgin-speak just to get them to find us, the complaints are true. They serve as a perfect challenge to think outside the current box, and to enlarge the project scope of the Web and its infrastructure. And Google will be deeply involved in this Web project for some time to come.
Maybe humans will do much of the heavy lifting in tagging the Semantic Web, and if so we’ll do it because it’ll serve us all to make sure our little patch of interest is tagged accurately. But software tools to clean data will help with this at every turn, and we shouldn’t rule out the possibility that the real heavy lifting can be done algorithmically, in combination with human-community peer review for quality assurance.
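One way to picture that division of labor is the small sketch below: software proposes candidate tags, and a tag is kept only if enough human reviewers approve it. The word-frequency heuristic, the stopword list, the approval threshold and the reviewer names are all assumptions made up for illustration, not a description of any real tagging system.

```python
from collections import Counter

# A tiny, hypothetical stopword list; a real system would need far more.
STOPWORDS = {"the", "a", "of", "and", "to", "is", "in", "for"}


def suggest_tags(text, limit=3):
    """Propose the most frequent non-stopword terms as candidate tags."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [term for term, _ in counts.most_common(limit)]


def review_tags(candidates, votes, min_approvals=2):
    """Keep only the candidates that at least min_approvals reviewers approved."""
    approvals = Counter(
        tag for reviewer_tags in votes.values() for tag in set(reviewer_tags)
    )
    return [tag for tag in candidates if approvals[tag] >= min_approvals]


# Hypothetical page text and reviewer votes, for illustration only.
text = "Search engines index the web. Search engines rank pages. Search is hard."
candidates = suggest_tags(text, limit=2)
votes = {
    "alice": ["search"],
    "bob": ["search", "engines"],
    "carol": ["search"],
}
approved = review_tags(candidates, votes)
print(candidates, approved)
```

The algorithm does the bulk of the work on every page, while people only confirm or reject what it proposes.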
Tom Foremski’s insistence that the engines should be doing all this anyway is a worthy gripe, so let’s keep those engines on their toes. After all, it will be the software engines that will want to deliver all the bountiful products of our interactions, in ever new recombinations, every morning with the sun.