How Google Maps searches, and how it does not

    I sometimes have to answer this question (see the title) because I work on an alternative local search service. Google is vague about where its data comes from. The main sources for this article were our own observations and patent application US 2006/0200478 A1 (see Sources).



    The main misconception is that "Google Maps finds information about companies on the Internet." This is not entirely true. Information about your company may appear on hundreds of indexed web pages and still not show up in Google Maps search results.


    Unlike web search, which queries an index of cached web pages, Google Maps maintains a structured directory of businesses. Each business record consists of machine-readable key-value fields. In principle this makes it possible to find a "restaurant with a vegetarian menu and pre-ordering within 10 km of Kievsky station", but in practice the directory usually holds exact values only for the address and phone number.
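
    To make the idea of a key-value business record concrete, here is a minimal sketch. The field names, the sample data and the `search` helper are all hypothetical; this is not Google's schema, only an illustration of why a sparse record cannot satisfy a detailed structured query.

```python
from dataclasses import dataclass, field
from math import radians, sin, cos, asin, sqrt


@dataclass
class BusinessRecord:
    """One directory entry: core fields plus free-form key-value attributes."""
    name: str
    address: str
    phone: str
    lat: float
    lon: float
    attributes: dict = field(default_factory=dict)  # e.g. {"category": "restaurant"}


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def search(records, center, radius_km, **required):
    """Return records inside the radius whose attributes match every required key."""
    lat0, lon0 = center
    return [
        r for r in records
        if distance_km(lat0, lon0, r.lat, r.lon) <= radius_km
        and all(r.attributes.get(k) == v for k, v in required.items())
    ]


# Hypothetical sample data; the coordinates are approximate and illustrative only.
station = (55.743, 37.566)
records = [
    BusinessRecord("Cafe A", "1 Example St", "+7 000 000-00-01", 55.745, 37.570,
                   {"category": "restaurant", "vegetarian_menu": True, "pre_order": True}),
    # A sparse record: only the core fields are known, so attribute filters cannot match it.
    BusinessRecord("Cafe B", "2 Example St", "+7 000 000-00-02", 55.744, 37.568),
]

matches = search(records, station, 10, category="restaurant",
                 vegetarian_menu=True, pre_order=True)
print([r.name for r in matches])
```

    The second cafe may well serve vegetarian food, but with only a name, address and phone number on file, a structured query has nothing to match against.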

    So what matters is not how Google searches its own directory, but where the information in it comes from.

    Where does the data in the Google Maps directory come from?


    According to Google, the directory "combines information from different sources to produce the best result." The sources fall into two groups:

    Structured and semi-structured: data sources that are easy to convert into machine-readable key-value form. Typically these are:
    • commercial business databases that Google purchases
    • websites containing large company directories; a dedicated crawler collects data from these sites, using regular expressions to extract information from the directory pages
    • Google Local Business Center, where business owners enter the information themselves
    • KML (and similar) files that are used to display points through the Google Maps API
    • user-created maps
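
    As an illustration of the crawler item in the list above, the sketch below pulls key-value records out of a directory page with a regular expression. The markup and the pattern are invented for this example; a real crawler would presumably need a pattern tuned to each directory site.

```python
import re

# Hypothetical markup for one directory page (not taken from any real site).
PAGE = """
<div class="entry"><b>Green Pharmacy</b>, 12 Main St, tel. 555-0101</div>
<div class="entry"><b>Corner Bakery</b>, 3 Mill Rd, tel. 555-0102</div>
"""

# Named groups become the keys of the extracted record.
ENTRY = re.compile(
    r'<div class="entry"><b>(?P<name>[^<]+)</b>,\s*'
    r'(?P<address>[^,]+),\s*tel\.\s*(?P<phone>[\d-]+)</div>'
)


def extract_records(html):
    """Turn each matching directory entry into a key-value record."""
    return [m.groupdict() for m in ENTRY.finditer(html)]


for record in extract_records(PAGE):
    print(record)
```

    This only works because every entry on the page follows the same layout, which is what makes such a site "semi-structured".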

    Unstructured: indexed websites that may contain information about the company, but whose data cannot be converted into structured form.

    How information is structured


    This process can be described in three main steps:
    1. Key-value data comes from several structured sources.
    2. Records about the same business are clustered: values from different sources are compared, and an accuracy and a weight are determined for each.
    3. The structured data is supplemented with unstructured data. *
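
    Steps 1 and 2 can be sketched roughly as follows. The matching key (a normalized phone number), the source names and the weights are all assumptions made for this example; the actual clustering and weighting rules are not public.

```python
from collections import defaultdict

# Hypothetical trust weights per source; the real values are not published.
SOURCE_WEIGHT = {"purchased_db": 0.6, "directory_site": 0.3, "owner_submitted": 0.9}


def normalize_phone(phone):
    """Keep digits only, so differently formatted numbers compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


def cluster(records):
    """Group records that share a normalized phone number (an assumed match key)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[normalize_phone(rec["phone"])].append(rec)
    return list(clusters.values())


def merge(cluster_records):
    """For every field, keep the value with the highest summed source weight."""
    votes = defaultdict(lambda: defaultdict(float))
    for rec in cluster_records:
        weight = SOURCE_WEIGHT.get(rec["source"], 0.1)
        for key, value in rec.items():
            if key != "source":
                votes[key][value] += weight
    return {key: max(vals, key=vals.get) for key, vals in votes.items()}


records = [
    {"source": "purchased_db", "name": "Green Pharmacy",
     "address": "12 Main St", "phone": "555-0101"},
    {"source": "directory_site", "name": "Green Pharmacy",
     "address": "12 Main Street", "phone": "5550101"},
    {"source": "owner_submitted", "name": "Green Pharmacy",
     "address": "12 Main Street", "phone": "555 0101"},
]

merged = [merge(c) for c in cluster(records)]
print(merged)
```

    Here the three source records collapse into one entry, and the address spelling backed by the most source weight wins.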

    * Structured data about a business is usually accurate but sparse. This makes two things difficult:
    • search: how do you find a "private kindergarten" if the business directory has no field for the form of ownership?
    • ranking: how do you decide which "pharmacy" to list first if all the data comes from the same directory?
    Therefore, once the main fields (name, address, phone number) have been established for a business, a web search is run with the query:
    business_name+business_address
    and the pages found (and, most importantly, the keywords on those pages) are associated with the business record.
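
    A minimal sketch of how such a query could be assembled from the established fields. The endpoint URL and function names are placeholders, not a real search API.

```python
from urllib.parse import quote_plus


def build_query(name, address):
    """Join the established fields into one query string (name + address)."""
    return f"{name} {address}"


def build_query_url(name, address, endpoint="https://search.example/search"):
    """URL-encode the query for a hypothetical search endpoint."""
    return f"{endpoint}?q={quote_plus(build_query(name, address))}"


print(build_query_url("Green Pharmacy", "12 Main St"))
```

    URL encoding turns the spaces into `+`, which gives the same name+address shape as the query above.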

    How it fails


    Here are some examples where the algorithm produces wrong results.

    We search for "hostel" and find the US consular section


    Reason: hostel association sites routinely publish lists of embassies and consulates. The consular section entered the directory from one of the structured sources, but was then associated with the site hihostels.com.ua

    We search for "rental apartments" and find the housing office


    Reason: real estate rental sites publish lists of housing and utility offices. The ZhEK (municipal housing maintenance office) entered the Google directory from one of the business databases, but was then associated with the site toprealty.org.ua
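
    Both failures follow the same pattern, which the toy model below reproduces. It assumes that every word on a page mentioning a business gets attached to that business as a keyword; this is a deliberate simplification for illustration, not the actual association logic, and the page text is invented.

```python
import re


def page_keywords(text):
    """All lowercase words of four or more letters (a crude stand-in for keywords)."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def associate(record, pages):
    """Attach keywords from every page that mentions the record's name."""
    found = set()
    for page in pages:
        if record["name"].lower() in page.lower():
            found |= page_keywords(page)
    return {**record, "keywords": found}


def search(directory, term):
    """Return the names of records whose associated keywords contain the term."""
    return [r["name"] for r in directory if term in r["keywords"]]


# Invented page text: a hostel site that also lists a consulate's address.
pages = [
    "Hostel association: hostel booking, hostel prices. "
    "Useful addresses: US Consular Section, 1 Example St.",
]
consulate = {"name": "US Consular Section", "address": "1 Example St"}
directory = [associate(consulate, pages)]

print(search(directory, "hostel"))
```

    The consulate never describes itself as a hostel; it inherits the keyword purely because a hostel site mentions it.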

    How to get a company into Google Maps results


    Clearly, no matter how much information about a company exists on the web, what matters most is that the information reaches at least one (and preferably several) structured sources. The problem is that Google does not disclose which databases and directories it draws on. The only publicly known one is Google Local Business Center (LBC).

    Summary


    Google Maps is not as transparent as Google Web Search:
    • Most users do not know how Google Maps searches
    • It is often impossible to determine where a piece of information came from
    • Sometimes the results violate the principle of least surprise

    I think Google could have done better.

    I would be grateful for corrections, additions and comments.

    Sources


    Generating structured information (patent application US 2006/0200478 A1)
    Google's Local Search Patent Application (at SEO by the Sea)
    Local listings: Where do they come from?
