Data Science and a Conference in the Tropics

    Computer vision, interpretability, NLP: we attended the AISTATS conference in Japan and want to share an overview of its papers. AISTATS is a major conference on statistics and machine learning, and this year it was held in Okinawa, an island near Taiwan. In this post Yulia Antokhina (Yulia_chan) describes the most striking papers from the main track; in the next one she and Anna Papeta will cover the invited talks and theoretical work. We will also say a little about how the conference itself went and about this "non-Japanese" part of Japan.

    image

    Defending against Whitebox Adversarial Attacks via Randomized Discretization
    Yuchen Zhang (Microsoft); Percy Liang (Stanford University)
    Article
    Code

    Let's start with a paper on defending against adversarial attacks in computer vision. These are targeted attacks whose goal is to force the model into an error, sometimes into a specific predetermined output. Computer vision algorithms can fail even on changes to an image that are barely noticeable to a human. The problem matters, for example, for machine vision in cars, which in good conditions recognizes road signs faster than a person but performs much worse under attack.

    An adversarial attack, illustrated:

    image

    Attacks come in two kinds: blackbox, when the attacker knows nothing about the algorithm, and whitebox, the opposite situation. There are two main approaches to defending models. The first is to train the model on both regular and "attacked" images; this is called adversarial training. It works well on small images like MNIST, but papers have shown that it performs poorly on large images like ImageNet. The second type of defense does not require retraining the model: it is enough to preprocess the image before feeding it to the model, for example with JPEG compression or resizing. These methods need less computation, but so far they only work against blackbox attacks, since once the transformation is known, the inverse can be applied.
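    As a toy illustration of a whitebox attack, here is a minimal FGSM-style sketch in NumPy: knowing the model, the attacker nudges the input along the sign of the loss gradient. The linear classifier, labels, and step size `eps` are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)          # weights of a toy linear classifier
x = rng.normal(size=16)          # clean input
y = 1.0                          # true label (binary: 0 or 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(x):
    # gradient of the log-loss with respect to the input for a linear model
    return (sigmoid(w @ x) - y) * w

eps = 0.1
# one FGSM step: move each pixel by eps in the direction that raises the loss
x_adv = x + eps * np.sign(loss_grad(x))

score_clean = sigmoid(w @ x)     # model confidence on the clean input
score_adv = sigmoid(w @ x_adv)   # confidence drops on the perturbed input
```

With full knowledge of the weights, the attacker reliably lowers the score for the true class while changing each coordinate by at most `eps`.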

    Method

    The authors propose a method that requires no retraining of the model and works against whitebox attacks. The goal is to reduce the Kullback–Leibler divergence between ordinary and "corrupted" examples using a random transformation. It turns out that simply adding random noise and then randomly discretizing the colors is enough. That is, the algorithm receives an image of "degraded" quality, but one still sufficient for it to work, and the randomness gives it the potential to withstand whitebox attacks.
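    The defense itself can be sketched in a few lines. This is a heavy simplification under stated assumptions: the paper clusters colors in Lab space, while here we just add Gaussian noise and snap every pixel to a small random palette in RGB; the noise scale and palette size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_discretization(img, sigma=0.05, n_colors=8):
    # (1) add random Gaussian noise, keeping values in [0, 1]
    noisy = np.clip(img + rng.normal(scale=sigma, size=img.shape), 0.0, 1.0)
    pixels = noisy.reshape(-1, 3)
    # (2) pick a small random palette from the (noisy) image itself
    palette = pixels[rng.choice(len(pixels), size=n_colors, replace=False)]
    # (3) snap each pixel to its nearest palette color
    dists = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(-1)
    quantized = palette[dists.argmin(axis=1)]
    return quantized.reshape(img.shape)

img = rng.random((32, 32, 3))            # stand-in for an input picture
defended = randomized_discretization(img)
n_unique = len(np.unique(defended.reshape(-1, 3), axis=0))  # at most 8 colors
```

The defended image keeps the original shape but contains only a handful of colors, and the randomness is re-drawn on every call.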

    On the left, the original picture; in the middle, an example of clustering pixel colors in Lab space; on the right, the picture reduced to a few colors (for example, one shade of blue instead of forty):

    image

    Results

    The method was compared against the strongest attacks from the NIPS 2017 Adversarial Attacks & Defenses Competition; on average it gives the best quality, and it does not overfit to any particular "attacker".

    Comparison of the strongest defense methods against the strongest attacks from the NIPS competition:

    image

    Comparison of the methods' accuracy on MNIST under different image transformations:


    image

    Attenuating Bias in Word Vectors
    Sunipa Dev (University of Utah); Jeff Phillips (University of Utah)
    Article

    The "trendy" talk was about unbiased word vectors. Here bias means a skew by gender or nationality in word representations. Regulators may object to such "discrimination", so researchers from the University of Utah decided to explore "equal rights" for NLP. Indeed, why can't a man be "glamorous" and a woman a "Data Scientist"?

    Original is the result obtained now; the rest are results of the debiased algorithm:

    image

    The paper discusses how to find such a bias. The authors decided that gender and nationality are well characterized by first names: if you find the bias direction from names and subtract it, you can probably rid the algorithm of the bias.
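    The idea of subtracting a name-based direction can be sketched like this. The tiny random "embeddings" and name lists are made up for illustration; the paper works with real GloVe vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
# stand-in embeddings; in reality these would come from GloVe
emb = {w: rng.normal(size=dim) for w in
       ["john", "mike", "amy", "lisa", "engineer", "glamorous"]}

male_names, female_names = ["john", "mike"], ["amy", "lisa"]

# bias direction = difference of the two group means, normalized
v = np.mean([emb[n] for n in male_names], axis=0) \
  - np.mean([emb[n] for n in female_names], axis=0)
v /= np.linalg.norm(v)

def debias(x, v):
    # subtract the component of x that lies along the bias direction
    return x - (x @ v) * v

emb_debiased = {w: debias(x, v) for w, x in emb.items()}
```

After the projection is removed, every word vector is orthogonal to the estimated gender direction, so "engineer" is no more "masculine" than "feminine" along it.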
    An example of more “masculine” and “feminine” words:

    image

    Names used to find the gender direction:

    image

    Oddly enough, this simple method works. The authors trained a debiased GloVe and published it on GitHub.

    What made you do this? Understanding black-box decisions with sufficient input subsets
    Brandon Carter (MIT CSAIL); Jonas Mueller (Amazon Web Services); Siddhartha Jain (MIT CSAIL); David Gifford (MIT CSAIL)
    Article
    Code: one and two

    The next paper presents the Sufficient Input Subset (SIS) algorithm. A SIS is a minimal subset of features on which the model still produces a given result even when all other features are zeroed out. It is another way to interpret the results of complex models, and it works on both texts and images.

    SIS search algorithm in detail:

    image

    An example of application to a text dataset of beer reviews:

    image

    Example of application on MNIST:

    image

    Comparison of "interpretation" methods by Kullback–Leibler distance to an "ideal" result:

    image

    Features are first ranked by their influence on the model and then split into disjoint subsets, starting with the most influential. The search works by brute force, and on a labeled dataset its results are more interpretable than LIME's. There is a convenient implementation of SIS search from Google Research.
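    A much-simplified sketch of the SIS idea on a linear model: drop features one at a time, least influential first, as long as the score stays above a threshold; whatever cannot be dropped is the sufficient subset. The real algorithm uses backward selection on an arbitrary black box; the model, input and threshold here are assumptions for illustration.

```python
import numpy as np

def sis_linear(w, x, threshold):
    contrib = w * x                          # per-feature contribution to the score
    order = np.argsort(np.abs(contrib))      # least influential features first
    keep = set(range(len(x)))
    for i in order:
        trial = keep - {i}
        score = sum(contrib[j] for j in trial)
        if score >= threshold:
            keep = trial                     # feature i can safely be zeroed out
        # otherwise feature i is needed to keep the prediction above threshold
    return sorted(keep)

w = np.array([0.1, 2.0, -0.3, 1.5])
x = np.array([1.0, 1.0, 1.0, 1.0])
subset = sis_linear(w, x, threshold=3.0)     # minimal features keeping score >= 3
```

Here the full score is 3.3; the search discovers that only features 1 and 3 (contributions 2.0 and 1.5) are needed to stay above the threshold of 3.0.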

    Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data
    Victor Veitch (Columbia University); Morgane Austern (Columbia University); Wenda Zhou (Columbia University); David Blei (Columbia University); Peter Orbanz (Columbia University)
    Article
    Code

    In the optimization section there was a talk on empirical risk minimization in which the authors explored how to apply stochastic gradient descent to graphs. For example, when building a model on social network data, you can use only fixed profile features (the number of followers), but then information about connections between profiles (who follows whom) is lost. Moreover, the whole graph is often too hard to process: it may not even fit in memory. With tabular data in this situation, the model can be trained on subsamples, but it was unclear what the analogue of a subsample is for a graph. The authors give a theoretical justification for using random subgraphs as that analogue, which turned out to be "not a crazy idea". Reproducible examples from the paper, including one on Wikipedia, are on GitHub.
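    The training scheme can be sketched roughly as follows: each step samples a random subset of vertices, takes the subgraph they induce, and runs an SGD update on its edges only. The toy ring graph, the edge loss (logistic score on embedding dot products) and the hyperparameters are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
edges = [(i, (i + 1) % n) for i in range(n)]      # toy ring graph
Z = rng.normal(scale=0.1, size=(n, 8))            # node embeddings being trained

def sgd_step(Z, edges, p=0.3, lr=0.1):
    keep = rng.random(n) < p                      # p-sample the vertices
    batch = [(u, v) for u, v in edges if keep[u] and keep[v]]  # induced edges
    for u, v in batch:                            # pull linked nodes together
        s = 1.0 / (1.0 + np.exp(-Z[u] @ Z[v]))
        g = s - 1.0                               # grad of -log sigmoid(z_u . z_v)
        gu, gv = g * Z[v], g * Z[u]
        Z[u] -= lr * gu
        Z[v] -= lr * gv
    return len(batch)

before = np.mean([Z[u] @ Z[v] for u, v in edges])
for _ in range(200):
    sgd_step(Z, edges)
after = np.mean([Z[u] @ Z[v] for u, v in edges])  # similarity of linked nodes grows
```

Even though each step sees only a random fragment of the graph, linked nodes steadily become more similar, which is the point the paper justifies theoretically.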

    Category embeddings on Wikipedia data that take its graph structure into account; the highlighted articles are the closest to French Physicists by topic:

    image

    Data Science for Networked Data

    On graphs over discrete data there was another survey talk, Data Science for Networked Data, by invited speaker Po-Ling Loh (University of Wisconsin-Madison). The presentation covered statistical inference, resource allocation and local algorithms. The statistical inference part, for example, discussed how to infer the structure of a graph from data on infectious diseases: the proposal is to use the number of edges between infected nodes as a statistic, and a theorem is proved for the corresponding statistical test.
    Overall, the talk is probably most interesting for those who do not work with graph models but would like to try, and who wonder how to test hypotheses on graphs correctly.
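    The statistic from the talk can be illustrated with a toy permutation test: count the edges between infected nodes and compare with the counts obtained under random relabelling. The graph, the infection set and the number of permutations are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
edges = [(i, (i + 1) % n) for i in range(n)]      # toy ring graph

infected = set(range(15))                         # a contiguous infected cluster

def infected_edges(infected, edges):
    # the test statistic: number of edges with both endpoints infected
    return sum((u in infected) and (v in infected) for u, v in edges)

observed = infected_edges(infected, edges)

# permutation null: the same number of infected nodes, placed at random
null = [infected_edges(set(rng.choice(n, size=len(infected), replace=False)),
                       edges)
        for _ in range(500)]
p_value = np.mean([s >= observed for s in null])  # small => infections cluster
```

A contiguous cluster on the ring touches far more edges than random placements do, so the test rejects the "infections spread independently of the graph" null.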

    How the conference went

    AISTATS 2019 is a three-day conference in Okinawa. It is Japan, but Okinawa's culture is closer to China's. The main shopping street resembles a tiny Miami: long cars on the streets, country music; and one step to the side, the jungle with snakes and mangrove trees twisted by typhoons. The local flavor comes from the culture of Ryukyu, a kingdom that once occupied Okinawa but first became a vassal and trading partner of China and was later captured by the Japanese.

    Apparently weddings are held in Okinawa often: there are lots of wedding salons, and the conference itself took place in a Wedding Hall.

    More than 500 people gathered: scientists, authors of papers, listeners and speakers. In three days you can manage to talk with almost everyone. Although the conference was held at "the end of the world", participants came from all over it. Despite the wide geography, it turned out that our interests are all similar; it surprised us, for example, that scientists from Australia solve the same data science problems with the same methods as our team, even though we live on almost opposite sides of the planet. There were not that many participants from industry: Google, Amazon, MTS and a few other top companies.

    There were also representatives of Japanese sponsor companies, who mostly watched and listened, and probably scouted candidates, even though it is very hard for a "non-Japanese" to work in Japan.

    Papers submitted to the conference, by topic:

    image

    Everything else is in our next post. Don't miss it!

    Announcement:

    image
