The future of search: interviews with participants in the European Conference on Information Retrieval

    Last week, with the support of Yandex, Moscow hosted ECIR 2013 (European Conference on Information Retrieval), one of the two most respected international conferences on information retrieval.

    Especially for Habrahabr, Ilya Segalovich ( iseg ), the technical director of Yandex, briefly explained how important the conference is, why the fact that it was held here is of great significance, and what efforts it took us and our co-organizers from the Higher School of Economics to hold ECIR in Moscow.



    We also recorded several interviews with the authors of the most interesting papers and talks, and asked the chairman of the Best Paper Award committee which papers were judged the best and why their subjects matter most right now for science and industry. Below the cut: brain tomograms and other interesting things.

    Yashar Moshfeghi, University of Glasgow


    Let's start with one of the authors of a somewhat unusual paper for ECIR - Understanding Relevance: An fMRI Study . Scientists from the University of Glasgow used magnetic resonance imaging to study which parts of the brain are activated when a person decides whether a piece of information is relevant. We asked Yashar Moshfeghi to tell us what they managed to find out and how, in his opinion, this could affect the future of measurement in information retrieval. By the way, Russian subtitles can be enabled for each interview.

    Understanding Relevance: An fMRI Study





    Transcript for those who prefer to read
    Tell us a little about what your research is about.

    The objective of our study was to find the areas of the brain that respond to clearly relevant information - that is, to relevance in its actual sense. We tried to see which parts of the brain respond to information rated as relevant versus irrelevant, and how those reactions differ. Over the past forty years, a great deal of research in information retrieval and information science has tried to understand what information people consider relevant.

    The reason is that relevance is a human judgment. And, like any human judgment, it is difficult to capture in any single definition. But since it is a key concept in information retrieval, it is extremely important to understand it better. One way to do that is to look into the human brain and see what happens there. So we got the opportunity to use an MRI scanner in our study and observe what happens in the brain during relevance assessment and which parts are involved in the process.

    How can the results of the study be applied?

    There are two possibilities. The first is theoretical. Since the research helps us see which parts of the brain are activated, it can help figure out what functionality each of them is associated with. We can better understand what goes on in a person's head when they decide whether a document is relevant to them. But there is also a practical application: it could give rise to new ways of assessing relevance.

    Speaking of Moscow. Is that how you imagined it? Snow in March?

    Well, I heard a lot about snow, but I didn’t think there would be so much! So, yes, it is very similar to what I saw in the movie.



    Marc Najork, Microsoft Research


    Marc has worked in information retrieval for several decades. He was one of those who developed the first popular Internet search engine - AltaVista . Marc is now a Principal Researcher at Microsoft Research .

    At ECIR 2013, he took part in Industry Day and shared his view of when social data can help search results and when it cannot. We, in turn, talked with Marc about the past and future of search, the main trends he sees, and which areas of information retrieval will be the most important and interesting:


    Transcript for those who prefer to read
    As far as I know, you have been working in search for a very long time. Could you tell us where you started?

    I started in search in the late '90s, at Compaq Computer Corporation, where AltaVista was being developed at the time. I worked on the web crawlers that later came to be used in it.

    It is amazing to see how quickly the web has grown, what scale it has reached, and how search engines have coped with it. I remember that when AltaVista launched, it had, if I recall correctly, about 20 million pages in its index. Today, large search engines such as Google, Bing or Yandex index tens of billions of pages. That is a thousandfold growth. And I think this growth will not stop for a long time.

    I think the main task of the last ten years has been to integrate into search the information that users themselves increasingly create. If you look at how web search began, at the first search engines like Excite and AltaVista, you will see that they used traditional information retrieval techniques. That is, they tried to measure how well the indexed web pages answered search queries.

    Google's innovation was to take into account which other pages linked to a given web page. The next technique the largest search engines adopted, including Yandex, Google and Bing, was the analysis of user behavior: queries, clicks, data about how exactly a person views pages. So users themselves have become an important link in information retrieval, in search on the Internet.
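    As a sketch of the simplest such behavioral signal - not any particular engine's implementation, and with a log format invented for the example - a per-(query, document) click-through rate can be aggregated from a record of impressions and clicks:

```python
from collections import defaultdict

def clickthrough_rates(log):
    """Aggregate a behavior log into per-(query, doc) click-through rates.

    `log` is a list of (query, doc, clicked) tuples, where `clicked`
    is True if the user clicked the result after being shown it.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc, was_clicked in log:
        shown[(query, doc)] += 1
        if was_clicked:
            clicked[(query, doc)] += 1
    return {pair: clicked[pair] / shown[pair] for pair in shown}

log = [
    ("ecir 2013", "ecir2013.org", True),
    ("ecir 2013", "ecir2013.org", True),
    ("ecir 2013", "example.com", False),
    ("ecir 2013", "ecir2013.org", False),
]
rates = clickthrough_rates(log)
# "ecir2013.org" was shown 3 times for this query and clicked twice.
```

Real systems go far beyond raw click-through rate (position bias, dwell time, query chains), but this is the kind of user-generated signal being described.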

    Vertical searches are increasingly being integrated into it. When you look for a restaurant, for example, the search engine already shows you its menu, opening hours, reviews and location. The same goes for flight searches: if you are looking for a flight, the search engine will show you, among other things, that the desired flight is delayed by half an hour. Handling these different vertical search scenarios is one part of this step.

    There is also a more general direction. Note that all the scenarios mentioned provide an answer without any need to follow a link: you enter a query and immediately get a response. There is a movement toward generalizing this practice to other areas, so that the search engine does not just point you to relevant documents but reads them for you and synthesizes an answer. This is possible for any query about... well, this is really a conversation about factoids. If you ask about Yandex's profit, the search engine could give you a ready-made answer based on the five articles that mention the size of that profit.

    What do you think will be the most interesting area of information retrieval in the next five years?

    Oh, a tough question. I think a better understanding of semantics and meaning in documents. Perhaps we will stop treating them like bags of words. We will extract the structure and meaning from the pages.
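    To make the "bags of words" remark concrete: in that classical model a document is reduced to unordered term counts, discarding all structure and word order - which is exactly what he expects search to move beyond. A minimal sketch:

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a document to an unordered multiset of lowercase terms."""
    return Counter(text.lower().split())

# Word order is discarded, so two sentences with opposite meanings
# receive identical bag-of-words representations:
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
assert a == b  # the model cannot tell them apart
```

Extracting structure and meaning, as he describes, means going beyond exactly this limitation.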




    Mor Naaman, Rutgers University SMIL, Mahaya, Inc.

    Mor's talk opened the conference. He is currently developing the startup Mahaya.co . The service aggregates social data and tries to help people look at events in which many participants were involved from different angles - sometimes in the literal sense:


    Transcript for those who prefer to read
    I really love the IR community. Although I have not published at information retrieval conferences, what I do overlaps with it a great deal. I know many people whose research topics echo my work. Interest in social media, which obviously excites me, is growing. And I think it will be very important to understand which information retrieval tools will be useful for working with social data.

    My talk was about how social media are changing the way we see and understand the world. Especially when it comes to events: everything that happens is now documented through social media. You constantly see people taking pictures and tweeting. Thanks to this, we have a record of the life of society and culture that was not available before. I talked about the different tools needed to make sense of all this information: how we can collect, find, organize, present and preserve it in a more accessible form, so that we record the world in a way that lets us interact with what happened.

    In general, my talk is about social media and how they document the world, how we ourselves do it, and how we can help people make sense of it.


    Mor's presentation can be viewed on SlideShare , and the video is also available.



    Paul Ogilvie, LinkedIn


    Information retrieval is not just web search. Paul Ogilvie of LinkedIn understands this better than most. As part of Industry Day, he talked about how to evaluate search quality when the usual approaches, such as Cranfield-style evaluations or A/B testing, are not quite applicable:


    Transcript for those who prefer to read
    Tell us a little about your presentation, please.

    I talk about how many details of the problem can be lost in information retrieval tasks with the evaluation methods that are commonly used today - for example, measurements based on static collections. As a result, we sometimes solve the wrong problem, because we do not have the right kinds of data to capture all the details. I give some examples of things we miss when we work with traditional collections, and some examples of what data can be collected and which metrics can be used to avoid some common distortions.
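    For context, a typical static-collection measure of the kind he is referring to is precision@k: the quality of a ranking is scored against a fixed set of relevance judgments, with no live user data involved. A minimal sketch (the document IDs and judgments are made up):

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top-k results judged relevant in a static collection."""
    top_k = ranked_results[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

# A hypothetical ranking scored against fixed relevance judgments:
ranking = ["d3", "d1", "d7", "d2", "d9"]
judged_relevant = {"d1", "d2", "d5"}
p5 = precision_at_k(ranking, judged_relevant, 5)  # 2 of the top 5 are relevant
```

His point is that metrics like this, computed over a frozen collection, can miss dynamics that only data from a live system reveals.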

    We work on very applied tasks; we have no pure research groups. Everyone involved in research also works on production systems, which ensures that everything we come up with and study is grounded in problems we actually encounter. And one of the biggest problems we have run into at LinkedIn is that when we try to evaluate quality, we may not have enough measurements to predict what will happen on real data. So we put a lot of emphasis on understanding this, because the ability to make good predictions from correctly collected data lets us develop much faster.




    Arjen de Vries, Chairman of the Best Paper Award Committee


    Summing up the conference, Arjen de Vries, a member of the ECIR 2013 organizing committee and chairman of the Best Paper Award jury, explained which papers were judged the best, why they matter for the industry, and shared his impressions of the conference:


    Transcript for those who prefer to read
    Hello. What can you say about ECIR in Moscow, what are your impressions?

    Well, in my opinion, the conference was very good. It covered a very wide range of topics, and very good papers were presented. As you know, I headed the committee that selected the best papers. We could not settle on just one - we had to award three prizes, on topics ranging from controversial and interdisciplinary to extremely clear and applied. I really liked the student paper from a researcher at Yandex. It is worth paying attention to - I think it will prove useful in its field. So, speaking of quality, the conference was very good.

    What else can you say about the best papers? For example, there was Yashar's fMRI study. Is this type of research something new for ECIR? It concerns not only computer science but also the structure of the human brain.

    It seems to me that this is the first study in information retrieval where fMRI scanners were used to understand what happens in people's brains when they look at images and decide whether they are suitable as an answer to a question. It is hard to say where this will lead. So far, we only know that we can measure something related to relevance, but we do not know whether any generalization can be drawn from it. And it will be quite difficult to create a method that could be used without making people lie in a huge tomograph. Nevertheless, as far as I know, this is essentially the first work in this direction with clear results, so I am glad to see it getting more attention.

    And about the second best paper, if you like. It is exceptional because it addresses a big problem: companies collect data they absolutely need in order to build a good search engine, and scientists would like to work with roughly the same data to test their hypotheses. But every attempt to openly publish such a data archive runs up against privacy concerns. This work dramatically increases the percentage of search logs that can be published without violating anyone's privacy. Moreover, it is done beautifully, using very sophisticated mathematics that is applied perfectly, with very clear goals and results.
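    The paper's actual technique is not described in the interview. Purely as an illustration of the underlying problem, here is the kind of naive threshold that simple log-anonymization approaches have relied on - keep a query only if at least k distinct users issued it, since rare queries are the ones most likely to identify a person. The log format and function name are invented, and this crude filter publishes far fewer queries than the mathematics he praises:

```python
from collections import defaultdict

def publishable_queries(log, k):
    """Keep only queries issued by at least k distinct users.

    A deliberately naive frequency threshold: rare, near-unique queries
    are the most likely to identify someone, so they are dropped entirely.
    """
    users_per_query = defaultdict(set)
    for user_id, query in log:
        users_per_query[query].add(user_id)
    return {q for q, users in users_per_query.items() if len(users) >= k}

log = [
    (1, "weather moscow"),
    (2, "weather moscow"),
    (3, "weather moscow"),
    (1, "john smith 12 main st"),  # unique query, potentially identifying
]
safe = publishable_queries(log, k=3)
```

The trade-off Arjen highlights is exactly this: the safer the filter, the smaller the published fraction - which is why increasing that fraction while preserving privacy is a strong result.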



    Links to all the studies discussed at the conference are already available.

    A seemingly well-explored area like information retrieval keeps finding new incarnations and dimensions. That is because our life on the Internet is constantly changing and becoming more saturated: we accumulate communications, data, devices and social networks. Search, and help in organizing all this information, takes on an entirely new meaning.
