June 6, 2008 at 18:43

Semantic Search: Myths and Reality

Translation

Semantic search has been talked about for several years now. Any technology that could knock Google off the top spot is of universal interest, especially when it is the long-awaited and much-discussed prospect of semantic search. Yet we are less excited by progress in this area than disappointed by the lack of real results: what semantic search engines return differs little from what Google returns. What is going on?

For example, when you enter "capital of France" in the search bar, both approaches give the same correct answer: "Paris." Likewise, most queries typed as abbreviations return the same results as the term written out in full. Obviously, something is wrong here. Everyone has heard that semantic technology can do a lot, but what exactly, and how does it work? After reading this article, you will see that we are simply asking the wrong questions.

The mistake is that semantic search engines present an input line just like Google's, which lets us enter free-form queries. So we enter queries the way we are used to, in the simplest form. We would never type "Which actor starred in both Pulp Fiction and Saturday Night Fever?" or "Which two US senators took bribes from foreign companies?" into the search bar. We always type simple phrases, and that is not where the power of semantic search lies. To understand how everything works, we will look at several search technologies: Google, SearchMonkey, Powerset and Freebase.

What problem are we trying to solve?

The first difficulty arises when semantic search is treated as the solution to every kind of problem, from mainstream web search, where Google dominates, to tasks that cannot be solved computationally. The picture is further complicated by the fact that there are currently only a few areas where semantic search really does better: complex queries that require inference and reasoning over complex data sets.

As the examples above show, Google easily copes with the main types of queries. Unfortunately, natural language processing provides only a slight advantage here. Google will give the correct answer to a question about Leonardo's birth year, leaving no room to improve the search by understanding the nouns and verbs the user types into the search bar.

Before looking at the tasks that semantic search handles easily, consider the hardest ones. There are computationally hard problems that have nothing to do with understanding the semantics of words. In the early days of the Semantic Web, it was believed that it would help us solve even extremely complex problems, but unfortunately this is not so. There are limits to what we can compute, and there is a class of problems with an enormous number of possible solutions; we cannot magically solve them just because the information is represented in RDF.

But there is also a class of tasks that the Semantic Web handles superbly: the ones we have so far been solving with thematic (domain-specific) databases. Keep in mind that semantic technologies help us find thematic information scattered across the web, so it is no surprise that semantic search engines excel at thematic queries.

Overview of semantic search engines

The essence of semantic search lies not only in the questions we ask. Because the web is a collection of unstructured HTML pages, semantic search also depends on the underlying data. The clearest example we found is Freebase, a semantic database. Freebase works not only through text search but, more importantly, through MQL (Metaweb Query Language). MQL is based on JSON (a text-based data exchange format), with additional query features. With it, you can compose any query against Freebase, and the answer comes back as the same query with the search results filled in.
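The "answer is the query with the results filled in" idea can be illustrated with a small sketch. This is not Freebase or its API: it is a toy in-memory version of query-by-example, written in Python. The film records, the property names (type, name, year, actor) and the run_query helper are all assumptions made for illustration and do not follow the real Freebase schema. A concrete value in the query acts as a constraint, and a blank (None here) marks a value to be filled in.

```python
import json

# Toy in-memory "database". Records and property names are illustrative only
# and are NOT the real Freebase schema.
FILMS = [
    {"type": "film", "name": "Pulp Fiction", "year": 1994, "actor": "John Travolta"},
    {"type": "film", "name": "Saturday Night Fever", "year": 1977, "actor": "John Travolta"},
    {"type": "film", "name": "Taxi Driver", "year": 1976, "actor": "Robert De Niro"},
]


def run_query(template, records):
    """Hypothetical helper: answer a query-by-example template.

    Keys with a concrete value are constraints; keys set to None are blanks
    that get filled in from every record satisfying the constraints.
    """
    results = []
    for record in records:
        constraints_ok = all(
            record.get(key) == value
            for key, value in template.items()
            if value is not None
        )
        if not constraints_ok:
            continue
        # The answer has the same shape as the query, with blanks filled in.
        results.append({
            key: record.get(key) if value is None else value
            for key, value in template.items()
        })
    return results


# "Which films did John Travolta star in, and when were they released?"
query = {"type": "film", "actor": "John Travolta", "name": None, "year": None}
answer = run_query(query, FILMS)
print(json.dumps(answer, indent=2))
```

Because the answer has the same shape as the question, a client does not need a separate result format: it reads the values back from the same keys it asked about.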

Powerset is, in essence, a thematic database that works with a certain body of structured information. At the other extreme is Google, which relies primarily on statistical frequencies and takes almost no account of semantics. Also interesting is the new SearchMonkey from Yahoo! This system adds nothing to the results themselves, but uses semantic annotations to provide a richer, more interactive and more useful user interface.

Hakia and Powerset are clearly pushing the hardest. They try to build structures similar to Freebase and then run natural language search on top of them. The difference is that Hakia (like others) applies its technology to the entire web, while Powerset has limited its search to Wikipedia.

What is common and where are the differences?

This raises the question: which of these technologies are similar, and which are radically different? Let's start with the simple case. SearchMonkey is essentially no different from Google or any other search engine: the underlying search is the same, and the difference lies only in presentation. SearchMonkey's strength is that it lets publishers present their search results in the best possible way.

With Hakia, Powerset and Freebase, the situation is different. At first glance they have nothing in common: Hakia searches the entire web, Powerset uses only Wikipedia and Freebase, and Freebase has two search interfaces, the search bar and the query language. But there is a catch: natural language processing has nothing to do with how the underlying data is represented.

The point is that all semantic search technologies let users type in arbitrarily complex questions, then interpret those questions and run them against an existing database. Hakia, Powerset and Freebase are such databases, and each has a natural language processing layer that "translates" the question into a standard query the database understands.
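The "translation" step can be sketched in a few lines of Python. This is a deliberately naive illustration, not how Hakia, Powerset or Freebase actually work: real systems use full natural language processing rather than a handful of regular expressions. The patterns, the film_query and translate helpers, and the query shape are all hypothetical. The sketch shows only the principle that several phrasings of a question collapse into one unambiguous structured query.

```python
import re

# Hypothetical question patterns. Named groups capture the known values.
PATTERNS = [
    re.compile(r"^(?:what|which) films? did (?P<actor>.+?) star in\??$", re.I),
    re.compile(r"^films? (?:starring|with) (?P<actor>.+?)\??$", re.I),
    re.compile(r"^who starred in (?P<name>.+?)\??$", re.I),
]


def film_query(**known):
    """Build a structured query; None marks a value the database should fill in."""
    query = {"type": "film", "actor": None, "name": None}
    query.update(known)
    return query


def translate(question):
    """Map a free-form question to a structured query, or None if not understood."""
    text = question.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return film_query(**match.groupdict())
    return None


# Two different phrasings produce the same structured query.
q1 = translate("Which films did John Travolta star in?")
q2 = translate("films starring John Travolta")
print(q1 == q2)
print(translate("Who starred in Pulp Fiction?"))
print(translate("capital of France"))  # not covered by any pattern
```

The last call returns None: a question outside the patterns cannot be turned into a database query at all, which is the point at which a system would have to fall back to ordinary keyword search.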

To see how this works, consider Freebase and its MQL query language. Unlike natural language, which lets you ask the same question in many different ways, MQL leaves no room for ambiguity. This JSON-like language lets users formulate precise queries against the Freebase database. The fact that Powerset lets you phrase questions in natural language does not mean that Powerset is not a database. Powerset is a database because it is essentially equivalent to Freebase's search box. The difference between Freebase and Powerset lies in their approach to search and in how they present the results.

Back to the future: it's all about the user interface

Perhaps the most important aspect of semantic search is the user interface. Powerset realized that semantics should be reflected in it. After a Powerset search, a contextual gadget that is aware of the semantics of the results helps the user complete the task.

Powerset's weak point is the way queries are entered. The search box, familiar to everyone who has ever searched the web, is out of date. The overly simple input interface of Powerset and Hakia works against them; it matters less for Freebase, which does not position itself as a search engine.

Recall the recent Powerset launch. The company offered the best way to search one of the richest sources of information on the web, Wikipedia. But what do the critics say? Can this system be called a major competitor to Google? The answer is unequivocal: no.

What if Powerset had restricted the kinds of searches it accepts? What if it had used a different interface instead of the search bar, or told users not to search for what they can easily find on Google? Should new companies really be trying to improve a search algorithm that has existed for more than 10 years? In any case, new ideas should be aimed at the problems that Google cannot solve today.

Conclusion

Semantic search is a technology of the future that has been set goals that are too high. We all thought it would topple Google and deliver the highest-quality search results. Both expectations turned out to be false. The truth is that semantic search is a multifaceted phenomenon, and it will help us solve problems we cannot solve now: complex, inference-based queries of the kind that often come up on the web.

For semantic search technologies to find their niche in the market, companies need to revise their goals and improve the user interface. The search bar is a poor fit and a losing proposition, because it invites simple questions that Google already handles easily. Developers need to offer a completely new interface so that users can experience the full power of semantic search.
