HP Vertica, the first project launched in Russia: a year and a half of real-world operation

    As an introduction


    HP Vertica has already been described on Habr and in other sources, but most of that material boiled down to theory. Until recently Vertica saw real production use mostly in the States and a little in Europe, back when the guys from LifeStreet Media wrote about it on Habr (we call her Vertika around the office and treat the name as feminine). We have now been running Vertica for a year and a half: our data warehouse holds dozens of terabytes, the server processes thousands of queries per minute, many of them over tables with tens of billions of records, and data is loaded continuously in real time at roughly 150 GB per day... In short, I thought it was worth filling the gap and sharing the thrill of riding genuinely modern technology in the BigData world.

    Who will find this useful


    I think it will be useful for developers, architects and integrators who face the task of storing and analytically processing data that is big in volume, content and complexity of analysis. What's more, Vertica now finally has a sensible, genuinely free Community Edition: it lets you deploy a cluster of up to 3 servers and load up to 1 TB of raw data into the warehouse. Given Vertica's performance and how easy it is to get a solution running, I consider this offer well worth evaluating when choosing a data warehouse for a company with up to 1 TB of data.

    How we chose, in one paragraph


    Briefly, and without trying to start a holy war:
    When choosing a data warehouse server we cared about the pricing principles, high performance and scalability on large data volumes, the ability to load data in real time from many different sources, the ease of launching the project on our own, and minimal maintenance costs. On these criteria Vertica came out best, beating IBM Netezza and EMC Greenplum: the latter two could not fully satisfy all our requirements, which would have meant extra development and maintenance costs for a project that does not have a very large budget.

    What Vertica looks like from an architect's point of view


    The architect is the most important person for a data warehouse on Vertica: the success and performance of the warehouse depend on them first of all. The architect has two non-trivial tasks: to choose the right hardware configuration for the Vertica cluster and to design the physical database model correctly.

    What the technical architecture affects

    My personal impressions, in decreasing order of criticality:
    1. As an MPP server, Vertica first of all places stringent demands on the network architecture. If you have 3 servers in a cluster and a 3 TB database, the data exchange between them and with users is barely noticeable in network costs. If you have 6 servers and 30 TB of data, then at peak load the network may become the weakest point of the whole cluster. Now imagine 100 servers and a database of more than a petabyte. Scary? :)
    2. Correctly assembled arrays on the local disks of the cluster servers guarantee a successful balance between performance, storage volume and hardware cost. RAID on cheap slow disks with lots of capacity? Cheap and cheerful, but slow. Everything on fast disks with less capacity? Expensive and fast, but there is not enough room for the database and you have to add another server to the cluster. Read the documentation on Vertica's FlexStore technology, hang both fast and slow disks on the cluster servers, and distribute the storage of table columns (projections) according to whether they participate in queries as filters or as returned data? Well done, you have an optimal cluster. And if you add fast local disks to each server and move the Vertica temp space there, everything will look great (a small sketch of this follows after this list).
    3. An open secret: there is never too much memory, and it is cheaper to buy servers with plenty of RAM up front than to add it to existing servers later. Remember that analytical queries over billions of records want a lot of memory; nobody has repealed physics. A single session of a single analyst can easily eat up to 20 GB. Count your analysts, count the heavy queries your ELT will run, and do not panic or start pricing a terabyte of RAM: Vertica can balance resource consumption across workloads with resource pools. You can size memory from the estimated data volume divided by the number of servers in the cluster, adjusted for the expected load.
    4. What always surprises me about Vertica after Sybase IQ is its low CPU load. Where IQ keeps all processors running flat out during heavy queries, Vertica under the same conditions looks like it is relaxing on holiday by comparison. Both servers are column-oriented, both compress all data on storage. Miracles, and yet the fact remains that the CPU is not Vertica's most critical resource, although I hope you do not take this as advice that 2 cores are enough. There are no miracles: as everywhere else, running queries occupy cores, and a heavy query can parallelize its execution across multiple threads and take up many of them. So choose the number of cores per cluster server based on the expected load.
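    To illustrate point 2, here is a minimal sketch of moving temp space out to fast local disks. The path and node name are made up, and the exact mechanism depends on the Vertica version: older releases use the ADD_LOCATION() meta-function, newer ones a CREATE LOCATION statement.

        -- hypothetical path and node name; repeat for each node in the cluster
        SELECT ADD_LOCATION('/ssd/vertica/temp', 'v_dwh_node0001', 'TEMP');

        -- newer Vertica versions express the same thing as DDL:
        -- CREATE LOCATION '/ssd/vertica/temp' NODE 'v_dwh_node0001' USAGE 'TEMP';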


    What the physical database model affects

    It is hard to rank these by criticality, everything matters, so the list is not numbered:
    • Compared with Sybase IQ, Vertica handles JOINs surprisingly well even on large tables, but column-oriented DBMSs were originally designed to get rid of unnecessary joins and simplify data schemas. Use denormalization wherever possible. It barely increases the cost of storing the data, but it speeds up query execution and also simplifies query writing by removing extra tables from the query. The downside is that denormalized data counts toward the licensed data volume. But think about it: writing the product name and price out to the fact table adds on average about 30 bytes per record, so a billion fact records cost you roughly 30 GB of license. Hardly a reason to pinch bytes.
    • I used to think that table partitioning was not the most important part of designing a data schema, and on older versions of Vertica that was true. Looking at the newer versions, I have stopped thinking that way. Partitioning splits table data into logical containers that are easier for Vertica to manipulate when storing and accessing data. The verdict of the Vertica developers is harsh: there must be neither too many partition keys nor too few. In the first case Vertica chokes on queries while sifting through clouds of partition containers; in the second it spends a lot of time merging and reading containers hundreds of gigabytes in size on disk. When planning partitions, also remember that Vertica can quickly and cheaply delete containers, or move them to other tables, by partition key, and you should take advantage of that. The conclusion is simple: do not build partitions around logical attributes such as cities or customers. Ask yourself: will you ever have a reason to delete from the fact table, or move to an archive table, all the information about one city or one client? Deleting or moving data for a past period, a day, a week or a month, on the other hand, is a good reason to think about a date-based key for partitioning the fact table. And for those who believe in the future of their company, in the long happy life of the data warehouse, and do not accept that data need not be stored forever, Vertica has a wonderful MERGE_PARTITIONS function: it merges the specified interval of partition keys into a single container and keeps the number of partition keys within reasonable limits. As for dimension tables, I hope it is already clear: they definitely do not need partitions. (A partitioning sketch follows after this list.)
    • Any MPP server, be it Vertica, Hadoop or Cassandra, likes data evenly distributed across the cluster. Any car enthusiast will confirm that poor wheel balance is felt immediately while driving, and MPP is no exception: if you segment data storage by city, one cluster server gets Moscow with its 15 million people and another gets Uryupinsk with its 40 thousand. It is obvious which server will do all the work on queries while the rest of the cluster sits idle waiting for the "Moscow" server to finish. If segmentation is not specified explicitly, Vertica by default segments by a hash of all table fields, which gives an even distribution across the servers, but there are cases where it is worth setting it manually. It can pay off when queries frequently group by certain fields and the records are always roughly evenly distributed over those fields. For example, suppose we have about the same number of sales facts for each city for each day; of course Moscow produces tens of thousands of records per day and Uryupinsk barely a dozen, but since there are many cities in Russia and many days in a year, if we segment the fact table by the hash of the sale day and the city, Moscow's records spread evenly across the cluster servers by day, and Uryupinsk is just a statistical blip in the balance. What do we gain? A query grouping by day and city runs more optimally than with segmentation over all fields: each server only has to aggregate its own data by city and day and send the final results to the initiator node, which returns the result to the session. Otherwise each server would read its data, partially aggregate what it has and pass it to the initiator for further aggregation. When is this game worth the candle? In my opinion, only if there will be queries over thousands of cities and long sales periods; in this particular example it is clearly not worth it, and ordinary uniform segmentation of the sales table is enough. The example is there to illustrate, not to copy. With dimensions everything is also simple: for dimensions with a reasonable number of records it is easier to mirror a full copy on every server, and for the rest to distribute the data evenly between nodes. Remember that with a mirrored dimension, the cost for each server when joining the fact table to the dimension is just pulling its local copy of the dimension records into memory and starting the join; otherwise every server has to fetch pieces of the dimension from every other server and only then join them. Network costs are lower for the mirror on the principle that "everyone receives from one" is cheaper than "everyone receives from everyone". (The projection sketch after this list shows segmentation in practice.)
    • One of marketing's favorite claims about Vertica is that it has no indexes. That is true, there are no indexes, but there is a sort order and an encoding format for data storage. How efficiently your table or its projection can serve different queries, and which projection the query optimizer will prefer when building an execution plan, depends directly on these two things. Honestly, describing every nuance of how best to encode and sort the fields in projections would fill a sizable dissertation on the subject of voodoo, so I will limit myself to simple design rules:
      1. When choosing how to store a column's values, Vertica automatically goes by its data type. Since no algorithm can guess that a BIGINT is in one case a fact measure and in another a dimension identifier, do not ignore the ENCODING option when describing tables: specify it for the fields that will be used for filtering and aggregation in queries, that is, dimension values rather than fact measures. With this column option described correctly, Vertica stores and searches the data more easily, and the optimizer takes it into account to build more efficient plans. Do not forget about column grouping (GROUPED) either: if some fields are always returned and processed together in queries, that is a good reason to store them together and cut the cost of reading and assembling records.
      2. When assigning a sort order to a table or projection, remember that no single sort order covers every case and it will not save you from creating projections. So pick the table's sort order for the most frequently executed queries, and cover the remaining cases with additional projections.
      3. When choosing the fields and their sort order, follow these rules: put at the start of the sort the columns that are most often used in equality filters and that best identify unique values; in the middle it is good to put the fields used in joins or in IN list searches; and last the fields used in range comparisons. For our long-suffering sales table the most advantageous order would be Region, Client, Sales_Date: that sort order covers every query that filters by region and/or client within a given date range, and if those queries also use the sort fields in ORDER BY, the query is even easier for Vertica to execute (the projection sketch after this list illustrates this).
      4. Since you can never be completely sure that the encoding, sorting and segmentation you chose are optimal, you can always make a knight's move: deploy a prototype table on Vertica without explicitly specifying segmentation or sorting, fill it with a decent amount of data, write the typical queries against it and run them through Database Designer in the Vertica adminTools utility. The designer analyzes the queries and creates the projections it considers optimal, with its own choice of encoding and sort order. Then you can run the queries against the table, look at their plans, judge how effective the proposed projections are, and build the production table based on the sorting, segmentation and encoding proposed in those projections.
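    The promised partitioning sketch, a minimal illustration of a date-keyed fact table. The table and the partition values are invented, and the exact names and signatures of the partition meta-functions vary a little between Vertica versions, so treat this as a sketch rather than a copy-paste recipe:

        -- fact table partitioned by month (YYYYMM) of the sale date
        CREATE TABLE fact_sales (
            sale_date  DATE          NOT NULL,
            city_id    INT           NOT NULL,
            client_id  INT           NOT NULL,
            product    VARCHAR(100),
            price      NUMERIC(12,2),
            qty        INT
        )
        PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);

        -- drop an obsolete month as a single cheap container operation
        SELECT DROP_PARTITION('public.fact_sales', '201201');

        -- or move a range of old months into an archive table with the same structure
        SELECT MOVE_PARTITIONS_TO_TABLE('public.fact_sales', '201201', '201212',
                                        'public.fact_sales_archive');

        -- or collapse a year of old partition keys into one container
        -- to keep the number of keys within reason
        SELECT MERGE_PARTITIONS('public.fact_sales', '201201', '201212');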
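    And the projection sketch for the same hypothetical tables, showing segmentation, sort order and encoding together; the column choices are purely illustrative:

        -- main projection for the fact table: sorted for the typical filters,
        -- RLE encoding on the low-cardinality filter columns,
        -- segmented by a hash of sale day and city for even distribution
        CREATE PROJECTION fact_sales_p1 (
            city_id   ENCODING RLE,
            client_id ENCODING RLE,
            sale_date ENCODING RLE,
            product,
            price,
            qty
        )
        AS SELECT city_id, client_id, sale_date, product, price, qty
           FROM fact_sales
           ORDER BY city_id, client_id, sale_date
           SEGMENTED BY HASH(sale_date, city_id) ALL NODES;

        -- a small dimension is cheaper to mirror on every node than to segment
        -- (a real cluster would also want K-safe buddy projections, omitted here)
        CREATE PROJECTION dim_city_p1
        AS SELECT city_id, city_name, region
           FROM dim_city
           ORDER BY city_id
           UNSEGMENTED ALL NODES;

        -- populate the new projections so the optimizer can start using them
        SELECT START_REFRESH();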


    What Vertica looks like from an ETL / ELT developer perspective


    The ETL developer's life is easy and relaxed: Vertica ships all the standard data access drivers (ODBC, JDBC, .NET, Python). There is also an impressive set of native facilities for batch loading flat files via the COPY command, which can be extended with your own parsers, filters and validators. Files can be loaded both from the cluster servers themselves and from workstations via COPY LOCAL. The Vertica JDBC driver supports batch inserts through prepared statements, automatically converting packets of insert values into a COPY bulk load, which gives a high insert rate. The only fly in the ointment is that extensions for bulk loading can only be written in C, which complicates development; judging by recent rumors Vertica is moving steadily toward the Java world (apparently to get closer to Hadoop), so it may soon be possible to write and plug in such things in Java. As for the performance and efficiency of parallel loading of large data volumes, Vertica's architecture takes care of that entirely: the ETL developer needs no special knowledge of the nuances of real-time loading or load balancing.
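    As an illustration, a hedged sketch of what a typical batch load looks like. The paths, schema and table names are invented; the options shown are standard COPY options, but check them against your Vertica version:

        -- load a gzipped, pipe-delimited file that already sits on a cluster node,
        -- writing rejected rows aside instead of failing the whole load
        COPY stage.sales_raw
        FROM '/data/incoming/sales_20130601.csv.gz' GZIP
        DELIMITER '|'
        NULL ''
        REJECTED DATA '/data/incoming/sales_20130601.rejected'
        EXCEPTIONS    '/data/incoming/sales_20130601.exceptions'
        DIRECT;

        -- the same thing from a client workstation goes through COPY ... FROM LOCAL
        COPY stage.sales_raw FROM LOCAL 'sales_20130601.csv' DELIMITER '|';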

    From the point of view of the ELT developer there are only two complaints about Vertica: there is no stored procedure language, and there is no way to write your own functions other than as simple SQL expressions or in C. The latter the Vertica team promises to fix soon with support for functions in Java; nobody has promised anything about stored procedures yet, so ELT logic has to be stored and executed by your ETL tools. In every other respect Vertica satisfies even the most demanding SQL developer: full support for the ANSI SQL standard, many functions familiar from other database servers, straightforward work with data types and casting, local and global session temporary tables for intermediate results, a MERGE statement, fast UPDATE and DELETE over large data sets, and extended OLAP functionality including time series (TIMESERIES), event series joins and much more. All of this makes it easy to write the complex queries that transform and move data from the staging area into the data marts, compute aggregates and KPIs, and do the rest of the calculation work in the warehouse. In practice, developers who know Oracle, MSSQL or Sybase IQ quickly find a common language with Vertica at the level of complex queries. For Oracle developers the absence of stored procedures and cursors is, if anything, an extra incentive to change their paradigm for developing data calculation logic.
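    A small sketch of what ELT code on Vertica tends to look like, since all the logic lives in plain SQL. The schemas, tables and columns here are invented for illustration:

        -- intermediate results go into session temp tables instead of procedure variables
        CREATE LOCAL TEMPORARY TABLE tmp_daily_sales
        ON COMMIT PRESERVE ROWS AS
        SELECT city_id, client_id, sale_date, SUM(price * qty) AS amount
        FROM stage.sales_raw
        GROUP BY city_id, client_id, sale_date;

        -- upsert from staging into a dimension with MERGE
        MERGE INTO dwh.dim_client tgt
        USING stage.clients src
           ON tgt.client_id = src.client_id
        WHEN MATCHED THEN
           UPDATE SET client_name = src.client_name,
                      city_id     = src.city_id
        WHEN NOT MATCHED THEN
           INSERT (client_id, client_name, city_id)
           VALUES (src.client_id, src.client_name, src.city_id);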

    What Vertica looks like from a BI developer perspective


    From the point of view of a BI developer, Vertica clearly lacks some analogue of parameterized views, such as the table functions or stored procedures of Sybase IQ or MSSQL. Complex BI queries often filter data by given parameters somewhere deep inside subqueries, with aggregation layered on top of their results. The inability to save such queries in Vertica as parameterized objects forces BI developers to copy complex queries between universes, extracts and whatever else their BI tool uses. Beyond that, as with ELT development, Vertica covers the whole range of analytical queries of any complexity: there are no restrictions in SQL, and there is support for OLAP functions and extensions, WITH, TIMESERIES, time joins, event series, work with geo data, IP addresses, URLs and so on. Vertica's warehouse metadata is perfectly visible to BI tools, and all the major BI products work with Vertica without difficulty. Those interested in how Vertica plays with BI should look at the wonderful combination of Vertica + Tableau: together these two products deliver powerful analytics practically in real time, and Tableau has been much appreciated in our company. In my view, the main worries of BI developers, namely ad-hoc query performance and functional limitations, simply do not exist with Vertica, which has a positive effect on the speed and quality of BI development.
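    For a taste of the OLAP extensions mentioned above, here is a hedged example of the TIMESERIES clause turning irregular events into a regular grid; the table and its columns are invented, and event_ts is assumed to be a TIMESTAMP column:

        -- slice irregular sales events into a 1-hour grid per city,
        -- carrying the last known value into empty slots
        SELECT slice_time,
               city_id,
               TS_FIRST_VALUE(amount, 'CONST') AS amount_at_slice
        FROM dwh.sales_events
        TIMESERIES slice_time AS '1 hour'
        OVER (PARTITION BY city_id ORDER BY event_ts);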

    What Vertica looks like from an administrator’s point of view


    Administration of Vertica can, with some reservations, be considered close to zero. That does not mean Vertica needs no administration at all; it means administration is required only occasionally, as the need arises. Instead of a dedicated full-time administrator position, remote administration, or administration by the architect or an ETL or BI developer, is entirely feasible. The administration itself falls roughly into the following categories:
    • Managing roles and users. The standard process of describing users in the database, assigning them to roles and granting the roles access to database objects. Infrequent work, done as users and roles are added.
    • Managing the cluster load. A more complex process that requires an understanding of the Vertica server architecture. It involves analyzing the current load that different processes and user groups place on the cluster in order to distribute Vertica's resources optimally across resource pools. With resource pools you can categorize query execution and give each category its own resource profile: the reserved (hot) memory size, the maximum memory it may consume, the number of concurrent connections, the priority for obtaining resources, the maximum allowed query run time, and limits on CPU utilization. Well-designed resource pools and a sensible distribution of users among them keep the cluster balanced even under peak load, and let you cleanly separate real-time tasks, operational reporting and long analytical queries. This work is normally done on a production server, with established workloads and a clear picture of when and how peaks occur; as long as the load profile does not change, there is no reason to touch the pools. (A resource pool sketch follows after this list.)
    • Managing cluster servers. Adding new servers to the cluster, or replacing or removing servers with subsequent data rebalancing, is not complicated and is done through the administration utility. It does, however, require a clear understanding of the server architecture and some planning, so that the partial drop in cluster performance during such work does not coincide with other expensive work in the warehouse, such as reloading a large amount of data from one table to another or loading from an external source. In our practice, growing a cluster from 3 servers to 6 with data rebalancing took about a day, with a slight loss of performance that in the end none of the users noticed.
    • Cluster recovery. If one of the cluster servers crashes, the administrator can restart it, provided the machine is physically operational and can see the other servers on the network, or replace it with another one if the hardware has failed. While a server is out of the cluster there is a partial drop in warehouse performance, because another server has to take over the work of the stopped one. The administrator has utilities to start or replace the failed server; Vertica takes care of the rest, automatically recovering the data on that server and bringing it back into service. With reliable hardware this work is rare: in our practice Vertica has stopped a cluster server only twice, once because of a RAM failure and once when a RAID controller died. In both cases the cluster kept working while the servers were being repaired, and users and data loading processes carried on in their normal mode.
    • Upgrading the server version. This is done by copying the distribution to one of the Vertica servers, stopping Vertica for about 10 minutes, running the upgrade installation and starting Vertica back up. Any administrator who can upload files to Linux and run programs can handle it.
    • Query optimization. If some queries start running too slowly, it may be time to give the table another projection. The administrator can call Database Designer from the Vertica adminTools utility and feed it the problem queries; it analyzes what exactly those queries are missing and produces ready-made recipes in the form of projections on the tables. Whether to follow the recipes is up to you. Vertica's developers say their American and European customers simply create the recommended projections without a second thought and are perfectly happy. I personally double-check what the designer produces: mostly I have no complaints, but sometimes I still find that some of the proposed projections are unnecessary and cost more in server disk space than they gain in query speed, mainly in cases where the query does not have to finish in a couple of seconds and the session can wait a few tens of seconds without any complaints from the user or the process.
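    As promised above, a sketch of a resource pool for ad-hoc analysts. The numbers are purely illustrative and the exact set of pool parameters depends on the Vertica version:

        -- a pool for ad-hoc analytical sessions: some guaranteed memory, a hard cap,
        -- limited concurrency and a runtime ceiling so a runaway query
        -- cannot starve the real-time loading pools
        CREATE RESOURCE POOL adhoc_pool
            MEMORYSIZE '8G'
            MAXMEMORYSIZE '64G'
            PLANNEDCONCURRENCY 6
            MAXCONCURRENCY 10
            PRIORITY 20
            RUNTIMECAP '15 minutes';

        -- attach an analyst to the pool
        CREATE USER analyst1 IDENTIFIED BY '********';
        GRANT USAGE ON RESOURCE POOL adhoc_pool TO analyst1;
        ALTER USER analyst1 RESOURCE POOL adhoc_pool;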


    What Vertica looks like from a leadership perspective


    To begin, I will list the things our management never had a headache over, and why:
    1. We need to quickly add new data sources for analysis: new schemas and tables are added, loaders are written for them, the work is transparent and understandable.
    2. We need more performance to cope with growing load from warehouse users: user queries are analyzed and, where necessary, new projections are added to speed them up; if the load grows several times over, new servers are bought and added to the cluster.
    3. We need to bring integrators or other teams into the work, handing them the development of data loading or BI: the standard access protocols to Vertica and ANSI SQL let new developers start working with the warehouse as quickly as possible.
    4. We need to store and analyze an order of magnitude more data from the sources: new servers are bought and added to the cluster.
    5. We need budget for license upgrades: buying new servers and adding them to the cluster to increase performance or storage space does not affect the license, which is based on the amount of raw data allowed for loading; neither does creating additional structures (projections) to speed up queries. So a license upgrade is needed only when the volume of source data reaches what was planned when the license was bought and there is no way to delete obsolete archive data to free up space. We have not hit that point yet, but the question of storing old archived data is more pressing than ever: there turned out to be much more data than we planned, thanks to the company's growing appetite.


    What our management always liked:
    1. Set tasks quickly and get results just as fast.
    2. At any time, quickly receive analytical reports for any period under any conditions.
    3. See the status of the ongoing processes of our company in real time.
    4. Do not hear any complaints about Vertica or indications that the tasks were not completed for any reason due to Vertica.

    What our management always did not like:
    1. The price. For quality and speed you, alas, have to pay. Vertica is not a free product and not the cheapest one, so management needs to be morally prepared to spend the money while understanding what they will get for it.


    What Vertica looks like from an integrator's point of view


    In my personal opinion, Vertica is good for integrators who live off startups and so-so for those who live off support contracts. Most of the big integrators today earn their money supporting projects on classic DBMSs, and would like to launch quick, effective data warehouse startups for their clients: there, it seems to me, they could like Vertica a lot. Their main problem will be that Vertica runs only on Linux. Not every client sitting firmly on Microsoft wants to deploy servers on Linux, understanding that they cannot do without proper administration of those servers. Then again, that is exactly where the integrators get their chance at the support contracts they want.

    Summarizing


    The overall impression of Vertica for me, my colleagues and my company after a year and a half of practical operation has been very good. Nobody has had any reason for disappointment (knock on wood). In most cases the integrators who ran into this server happily took on projects with it, and some switched their focus to Vertica entirely, putting their work with other similar products on hold (shh... it's a secret). Managers and architects from many companies came to visit, to see and evaluate what we have. Many specialists consulted us remotely when choosing a data warehouse; some of them also ended up choosing Vertica and are already building their own projects on it. There were also companies that did not like Vertica at all, for their own moral and political reasons, and chose other data warehouse servers. In any case, everyone acknowledged that Vertica is a damn promising product, and that HP did not go wrong in buying Vertica and leaving the whole structure of the company as it was, without swallowing it internally.

    That concludes this brief tour of the "big" server; thank you for your attention.
    Sincerely, DWH architect Konstantinov Aleksey

    P.S. When using the materials from this article, please link to this article on Habr or to my blog, to help protect my authorship of Russian-language information on Vertica ;)
