# Technosphere Lectures. Semester 1. Introduction to Data Analysis (Spring 2016)

Listen to and watch the new collection of lectures from Technosphere Mail.Ru. This time we are making the spring course “Introduction to Data Analysis” openly available. It introduces students to the field of data analysis and to the basic tools, tasks, and methods that any data researcher encounters in their work. The course is taught by Evgeny Zavyalov (an analyst on the Mail.Ru Search project who extracts business-relevant knowledge from data generated by the search engine and desktop applications), Mikhail Grishin (a research programmer in the data analysis department), and Sergey Rybalkin (a senior programmer at the Allods Team studio).

**Lecture 1. Introduction to Python**

In the first lecture, you will learn what data analysis is, what tools are used to analyze data, and how Python works.

**Lecture 2. Advanced Python**

A more detailed look at Python syntax and at how to use the language.

**Lecture 3. Python libraries for data analysis. NumPy, PyTables, Pandas**

The lecture covers both the Python standard library and the libraries most often used for data analysis. It discusses properties, descriptors, and common tasks that arise when processing data in Python. Evgeny Zavyalov also covers working with the web, email, and websites.
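As a small illustration of the properties and descriptors mentioned in this lecture description (not taken from the lecture itself), here is a minimal sketch. The `Positive` and `Rectangle` classes are hypothetical names chosen for the example, and `__set_name__` assumes Python 3.6 or later.

```python
class Positive:
    """Data descriptor that only accepts values greater than zero."""

    def __set_name__(self, owner, name):
        # Store the value on the instance under a private attribute name.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.private_name[1:]} must be positive")
        setattr(obj, self.private_name, value)


class Rectangle:
    # Both attributes share the same validation logic via the descriptor.
    width = Positive()
    height = Positive()

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # Read-only computed attribute, recalculated on every access.
        return self.width * self.height


rect = Rectangle(2, 3)
print(rect.area)
```

A descriptor is reusable across attributes and classes, while a property is the simpler choice when only one attribute needs custom access logic.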

**Lecture 4. Visualization, dataset analysis. EDA**

We discuss the main approaches to data visualization and exploratory data analysis. Previously acquired knowledge is applied to the analysis of an open dataset, and work continues with the NumPy and Pandas libraries. The same lecture also begins an introduction to the R language as a possible alternative to Python with its libraries.

**Lecture 5. R and libraries**

This lecture covers the advantages of the R language (see the fourth lecture for its disadvantages). R emerged from the academic environment but has come close to Python in capabilities, and has inspired some borrowings in Python. In the West, the free R language is a de facto standard, whereas in Russia it is not as widely known.

**Lecture 6. Introduction to Statistics**

We recall the main theorems, probability, the distribution laws of random variables, and the estimation problem. We not only touch on the fundamentals but also look at how to apply them in practice.

**Lecture 7. Introduction to Statistical Estimation**

In the second lecture on statistics, we consider methods for obtaining estimates, interval estimates, statistical hypothesis testing, and the very concept of a "statistical hypothesis."

**Lecture 8. Parametric statistical tests**

Mikhail Grishin continues the topic of the previous lecture: he discusses parametric tests and summarizes the material covered so far.

**Lecture 9. Nonparametric tests**

The lecture introduces the concept of “nonparametric statistics”, discusses how to choose between parametric and nonparametric tests (arguments “for” and “against”), and covers nonparametric estimates (the bootstrap and nonparametric density estimation).
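To give a flavor of the bootstrap mentioned in this lecture description, here is a minimal sketch of a percentile bootstrap confidence interval using only the standard library. The function name `bootstrap_ci`, its parameters, and the sample values are assumptions made for illustration and are not taken from the lecture.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `sample`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Illustrative data only.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
print(bootstrap_ci(data))
```

The percentile method makes no assumption about the shape of the underlying distribution, which is what makes it a nonparametric estimate.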

**Lecture 10. Multiple hypothesis testing**

In addition to multiple hypothesis testing, the lecture covers principal component analysis, ANOVA (analysis of variance) and, in part, linear regression.
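One common way to handle multiple hypothesis testing is the Holm step-down correction. The description does not say which corrections the lecture covers, so the sketch below is only an illustrative assumption, and the name `holm_reject` and the p-values are made up for the example.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects the null."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Once one hypothesis is retained, all larger p-values are retained too.
            break
    return reject


print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # -> [True, False, False, True]
```

Holm's procedure controls the family-wise error rate like the plain Bonferroni correction, but it is never less powerful.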

**Lecture 11. Time Series Analysis**

The lecture continues with linear regression, linear algebra, and robust regression, and introduces the autoregressive model.

**Lecture 12. Java: the basics of the language. Part 1**

Sergey Rybalkin presents the very basics of the Java language: what the language is for, what its advantages are, how it handles the code you write in it, basic syntax constructs, a comparison with C++, classes, interfaces, inheritance, and much more.

**Lecture 13. Java: the basics of the language. Part 2**

The second lecture on the basics of Java: the exception hierarchy, the collections framework, working with collections, generics, a summary of what has been learned, and directions for further study.

Current lectures and master classes on programming, mobile development, and web development are posted on the Technostream channel. If you are interested in development, study at a university, and want to gain and apply knowledge in the field, take a look at our educational projects: Technopark at Bauman MSTU, Technosphere at Lomonosov Moscow State University, Technotrek at MIPT, and Technoatom at MEPhI, or come to our online courses.