Top 10 Results in Algorithms for 2012
Every year on December 31, David Eppstein publishes a review of the past year's preprints on data structures and algorithms posted to arxiv.org. Following the links, you can read the reviews for 2010 and 2011 (my translation).
The cs.DS section is growing at a healthy pace: 935 preprints on algorithms and data structures appeared this year, up from 798 in 2011. The section has yet to reach a hundred preprints in a single month, although July (98 preprints) came very close.
This is my personal list of ten preprints that I find particularly interesting. As usual, it excludes my own papers and some others I have written about before. It also omits results that never appeared on arxiv.org (for example, a faster algorithm for finding the maximum flow).
Here they are, in chronological order:
Computing PageRank in sublinear time. Christian Borgs, Michael Brautbar, Jennifer Chayes, and Shang-Hua Teng. arXiv:1202.2771 and WAW 2012 (9th Workshop on Algorithms and Models for the Web Graph). The algorithm is only modestly faster than a linear one (it finds all pages whose PageRank exceeds Δ in time O(n/Δ)), but what seems interesting to me is that such bounds can be proved at all in a natural model of computation on graphs of hyperlinked pages.
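To make the sampling flavour of such results concrete, here is a minimal Monte-Carlo PageRank sketch in Python. This is not the authors' algorithm, which works in a restricted link-crawling access model; the graph representation and all parameter names are illustrative.

```python
import random

def estimate_pagerank(graph, num_walks=10000, damping=0.85):
    """Monte-Carlo PageRank: a random walk that continues along a random
    out-link with probability `damping` and stops otherwise ends at page v
    with probability equal to v's PageRank. `graph` maps each node to a
    list of its out-neighbours."""
    nodes = list(graph)
    counts = {v: 0 for v in nodes}
    for _ in range(num_walks):
        v = random.choice(nodes)              # uniformly random start page
        while random.random() < damping:
            nbrs = graph[v]
            v = random.choice(nbrs) if nbrs else random.choice(nodes)
        counts[v] += 1
    return {v: c / num_walks for v, c in counts.items()}

def pages_above(graph, delta, num_walks=10000):
    """Report pages whose estimated PageRank exceeds `delta`."""
    pr = estimate_pagerank(graph, num_walks)
    return [v for v, p in pr.items() if p > delta]
```

Intuitively, after on the order of 1/Δ walks a page of PageRank Δ starts showing up among the endpoints; the preprint's contribution is proving such bounds when the algorithm may only explore the graph by following links.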
Finding the most biased coin with the fewest flips. Karthekeyan Chandrasekaran and Richard Karp, arXiv:1202.3639. The problem solved in this paper is very fundamental, and the result is optimal up to a constant factor, improving on the approximations known before.
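For intuition about the problem statement, here is a naive successive-elimination baseline in Python. This is a toy simulation, not the paper's optimal adaptive procedure; the function name and batch size are illustrative.

```python
import random

def most_biased_coin(coins, batch=50):
    """Flip every surviving coin `batch` more times per round and discard
    the half with the lowest empirical heads frequency. `coins` is a list
    of (unknown) heads probabilities; flips are simulated."""
    alive = list(range(len(coins)))
    heads = [0] * len(coins)
    flips = [0] * len(coins)
    while len(alive) > 1:
        for i in alive:
            for _ in range(batch):
                heads[i] += random.random() < coins[i]
                flips[i] += 1
        alive.sort(key=lambda i: heads[i] / flips[i], reverse=True)
        alive = alive[:max(1, len(alive) // 2)]  # keep the better half
    return alive[0]
```

The paper's procedure is instead sequential and adaptive, deciding on the fly which coin deserves further flips, and its total number of flips is optimal up to a constant factor.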
A fast universal algorithm for string hashing. Owen Kaser and Daniel Lemire, arXiv:1202.4961. A string-hashing algorithm that spends about 0.2 CPU clock cycles per byte of input while still having theoretically guaranteed properties.
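The family in question is a form of multilinear hashing over a prime field; a short Python sketch of the idea follows. The speed claim comes from the paper's carefully vectorised fixed-precision implementation, not from anything this sketch does.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime makes the modular reduction cheap

def make_hasher(length):
    """Multilinear hashing: h(s) = m0 + sum(m[i+1] * s[i]) mod P with
    random coefficients m is strongly universal for strings of a fixed
    length (a sketch; variable-length inputs need extra care)."""
    coeffs = [random.randrange(P) for _ in range(length + 1)]
    def h(data: bytes):
        acc = coeffs[0]
        for m, byte in zip(coeffs[1:], data):
            acc = (acc + m * byte) % P
        return acc
    return h
```

Usage: `h = make_hasher(16)` and then `h(b"0123456789abcdef")` gives a strongly universal hash of the 16-byte string.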
Known algorithms for EDGE CLIQUE COVER are probably optimal. Marek Cygan, Marcin Pilipczuk and Michał Pilipczuk, arXiv:1203.1754 and SODA 2013. There is a simple FPT algorithm for covering the edges of a graph with the minimum number of cliques, but its running time is proportional to 2^(2^O(k)) · poly(n), where k is the number of cliques and n is the number of vertices. The question of improving this bound has been open for quite a long time. Last year it was shown that no better kernelization of the problem is achievable, and this preprint shows that the bound is optimal whether or not we use kernelization at all, at least as long as the exponential time hypothesis holds.
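The simple FPT algorithm works by shrinking the instance first. Below is a Python sketch of the classic twin-reduction rule from the known kernelization; representing the graph as a dict of neighbour sets is my own illustrative choice.

```python
from itertools import combinations

def reduce_twins(adj):
    """Kernelization rule for EDGE CLIQUE COVER: two adjacent vertices
    with the same closed neighbourhood are covered by exactly the same
    cliques, so one of them can be deleted. Together with removing
    isolated vertices, this leaves a kernel with at most 2^k vertices.
    `adj` maps each vertex to the set of its neighbours."""
    changed = True
    while changed:
        changed = False
        for u, v in combinations(list(adj), 2):
            if v in adj[u] and adj[u] | {u} == adj[v] | {v}:
                for w in adj[v]:          # delete the twin v, keep u
                    adj[w].discard(v)
                del adj[v]
                changed = True
                break
    return adj
```

Brute force on the kernel then gives the doubly exponential running time; the preprint shows this cannot be avoided under the exponential time hypothesis.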
Approximability of the vertex cover problem in power-law graphs. Mikael Gast and Mathias Hauptmann, arXiv:1204.0982. It seems to me that there should be more work on algorithms exploiting the structural properties of graphs such as the Web graph or social networks. This algorithm is a good example: it improves the achievable approximation ratio for vertex cover on any graph whose degree distribution obeys a power law.
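For reference, the baseline that such results improve on is the textbook 2-approximation via a maximal matching, shown here in Python:

```python
def vertex_cover_2approx(edges):
    """Greedy maximal-matching 2-approximation for VERTEX COVER: take any
    uncovered edge and add both of its endpoints to the cover. Any
    optimal cover must contain at least one endpoint of each matched
    edge, so the result is at most twice the optimum."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```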
Clustering is difficult only when it is not needed. Amit Daniely, Nati Linial and Michael Saks, arXiv:1205.4891. Like the previous paper, this one moves from worst-case complexity to the complexity of more typical instances of the problem. The main idea is that the only data that is hard for clustering algorithms is data that is hard to cluster meaningfully in the first place. The definition of a "good clustering" given here seems quite sensitive to data with many outliers or much noise, but that may well be a subject for further study.
A randomized parallel algorithm for solving an n×n system of linear equations in O(n²) time. Joerg Fliege, arXiv:1209.3995. The real breakthrough here is the algorithm for solving systems of equations over finite fields, invented by Prasad Raghavendra and described last year on Richard Lipton's blog. This paper shows how to apply it to the real numbers.
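Here is a toy Python sketch over GF(2) of the population idea, as I read it from the blog description; the preprint's real-number version and the parallel speedup are beyond this sketch, and all names are illustrative.

```python
import random

def solve_gf2(equations, n, pool_size=200):
    """Each equation is (mask, bit): a solution x (an n-bit integer) must
    satisfy parity(x & mask) == bit. Keep a pool of candidate vectors;
    after each equation, discard the violators and refill with XORs of
    three survivors, which still satisfy every equation seen so far
    (3 = 1 in GF(2), so this is an affine combination of solutions)."""
    pool = [random.getrandbits(n) for _ in range(pool_size)]
    for mask, bit in equations:
        survivors = [x for x in pool
                     if bin(x & mask).count("1") % 2 == bit]
        if not survivors:
            return None  # population died out; retry with a larger pool
        pool = [a ^ b ^ c for a, b, c in
                (random.choices(survivors, k=3) for _ in range(pool_size))]
    return pool[0]  # by induction, satisfies every equation processed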
An improved deterministic bound for dynamic graph connectivity. Christian Wulff-Nilsen, arXiv:1209.5608 and SODA 2013. The paper presents a data structure supporting updates (edge insertions and deletions) in amortized time O(log² n / log log n) and queries of the form "is there a path between vertices u and v?" in worst-case time O(log n / log log n), where n is the number of vertices in the graph. (Eppstein made a nice pun here: the previous best bound for an update was O(log² n), so this result is just log shaving, or rather, log log shaving.)
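For contrast, the easy half of the problem fits in a few lines: with insertions only, a union-find structure answers connectivity queries in near-constant time, as in the Python sketch below. Supporting deletions as well is what all the O(log² n / log log n) machinery is for.

```python
class IncrementalConnectivity:
    """Union-find with path halving: supports edge insertions and
    connectivity queries, but not edge deletions."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_edge(self, u, v):
        self.parent[self.find(u)] = self.find(v)

    def connected(self, u, v):
        return self.find(u) == self.find(v)
```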
An 8/5-approximation for the path traveling salesman problem. András Sebö, arXiv:1209.3523 and IPCO 2013. This paper concerns the variant of the traveling salesman problem in which the start and end points of the tour need not coincide. I have told students about the Christofides algorithm often enough to know the difference between the path and cycle versions of the problem, but the fact that the best known approximation ratio for the path version is noticeably larger than for the cycle version had somehow passed me by.
Approximating k-median via pseudo-approximation. Shi Li and Ola Svensson, arXiv:1211.0243. The k-median problem asks for exactly k cluster centers minimizing the sum of distances from each point to the center of its cluster; an approximation finds k centers minimizing the objective up to some factor, while a pseudo-approximation finds approximately k centers. As the authors note, adding even a single extra center can change the objective function dramatically, so it is quite surprising that pseudo-approximation is applicable to this problem.
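A brute-force Python illustration of the objective, and of why one extra center matters, on a toy 1-D instance (real k-median instances live in general metrics and are far too big for brute force):

```python
from itertools import combinations

def kmedian_cost(points, centers):
    """k-median objective: total distance from each point to its
    nearest chosen center (1-D points for simplicity)."""
    return sum(min(abs(p - c) for c in centers) for p in points)

def best_centers(points, k):
    """Exact optimum by brute force over all k-subsets of the points."""
    return min(combinations(points, k),
               key=lambda cs: kmedian_cost(points, cs))

# Two tight clusters plus one far outlier: with k = 2 the optimum must
# absorb the outlier, while k + 1 = 3 centers drop the cost dramatically.
pts = [0, 1, 2, 10, 11, 12, 100]
for k in (2, 3):
    cs = best_centers(pts, k)
    print(k, cs, kmedian_cost(pts, cs))
```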