The Other GitHub: Repositories for Data Science, Data Visualization, and Deep Learning
GitHub is not just a platform for hosting and collaboratively developing IT projects, but also a huge knowledge base compiled by hundreds of experts. Fortunately, the service offers not only tools for working with open source, but also high-quality learning materials. We selected some popular repositories and sorted them by number of stars in descending order.
This compilation will help you figure out which repositories deserve your attention if you are interested in working with data and in deep learning.
The Open Source Data Science Masters
Stars: 11,227, Forks: 4,737
The official repository of the Data Science Masters curriculum, developed as an open-source alternative to formal education in Data Science. The repository is a collection of educational materials gathered over several years.
Awesome Data Science
Stars: 9,240, Forks: 2,761
A powerful collection that answers the questions "What is Data Science?" and "What do you need to know to understand this science well?". It is conveniently categorized: for example, there is a list of books on Data Science, a selection of infographics, and even thematic groups on Facebook.
Jupyter Interactive Notebook
Stars: 5,242, Forks: 2,313
The progenitor of this repository is Data Science IPython Notebooks, a platform for working with scripts in 40 programming languages that has collected more than 14,000 stars and 4,000 forks. Data processing and machine learning specialists actively used it for scientific computing.
Today, Jupyter Notebook is a handy collection of notebook files made up of cells in which queries are written and executed. With the help of the built-in visualizers, a notebook with a set of queries turns into a full-fledged data dashboard.
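To make the idea of a notebook file built from cells more concrete, here is a minimal sketch in Python that assembles such a file by hand. It assumes the version 4 notebook JSON layout; the helper names (`markdown_cell`, `code_cell`, `make_notebook`) and the cell contents are hypothetical and exist only for this illustration.

```python
import json


def markdown_cell(text):
    """Build a Markdown cell (explanatory text) for a v4 notebook."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """Build a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,  # None means "not run yet"
        "metadata": {},
        "outputs": [],            # filled in by the kernel after execution
        "source": code,
    }


def make_notebook(cells):
    """Wrap a list of cells into a notebook document (assumed v4 layout)."""
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Hypothetical cell contents, not taken from any repository in this list.
notebook = make_notebook([
    markdown_cell("# Quick look at the data"),
    code_cell("values = [1, 2, 3]\nsum(values)"),
])

# A notebook on disk is simply this structure serialized as JSON.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Saving `notebook_json` to a file with the `.ipynb` extension should give a document that Jupyter can open, with one text cell and one code cell ready to run.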
Data Science Blogs
Stars: 4,510, Forks: 1,178
A simple but extensive list of educational materials, sorted alphabetically. Here you will find all popular blogs, as well as many small sites with useful information (a total of 251 resources are listed).
Data Science Specialization
Stars: 3,114, Forks: 27,184
The repository for the Data Science training course at Johns Hopkins University, a very popular course prepared by Roger Peng, Jeff Leek, and Brian Caffo. To be more precise, the Data Science Specialization on Coursera consists of several interrelated courses (for example, R Programming) that deal with various aspects of data analysis, and this repository combines the materials used in all of them.
Stars: 2,677, Forks: 587
Learn Data Science
Stars: 2,129, Forks: 1,210
A collection of IPython notebooks focused on fundamental machine learning concepts for beginners.
Data Science at the Command Line
Stars: 2,057, Forks: 503
The repository contains the text, data, scripts, and custom command-line tools used in the book "Data Science at the Command Line". This practical guide demonstrates how to combine small but powerful command-line tools to quickly obtain, clean, explore, and model data.
Data Science Specialization Community Site
Stars: 1,395, Forks: 2,661
Several students who attended the course at Johns Hopkins University created such high-quality content that university staff shared it and also made a catalog for all the interesting content created by the community.
Data Visualization for the Web
Stars: 81,837, Forks: 20,282
Stars: 41,393, Forks: 9,294
Chart.js is an HTML5 library that renders visualizations through the <canvas> element. It positions itself as a simple, flexible, and interactive tool that supports six different chart types.
Stars: 32,204, Forks: 9,369
ECharts is a browser library for charting and visualization. It is easy to use, intuitive, and easy to configure.
Stars: 23,810, Forks: 3,937
Stars: 8,348, Forks: 1,305
Sigma is a JavaScript library focused on drawing graphs. It lets you build graph views on web pages and integrate them into web applications.
Stars: 6,559, Forks: 702
Vega is a declarative language for creating, saving, and sharing interactive visualizations. With it, you describe the appearance and interactive behavior of a visualization in JSON and generate web views using Canvas or SVG. Vega provides basic building blocks for a wide range of visualization designs: data loading and transformation, scales, map projections, legends, graphical marks, and so on.
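As a rough illustration of what describing a chart in JSON looks like, the sketch below builds a minimal Vega bar chart specification as a Python dictionary and serializes it. The data values and the function name `build_bar_chart_spec` are hypothetical, and the layout follows the general shape of a Vega v5 specification (data, scales, axes, marks) rather than any particular project.

```python
import json

# Hypothetical sample data for illustration; not taken from the article.
values = [
    {"category": "A", "amount": 28},
    {"category": "B", "amount": 55},
    {"category": "C", "amount": 43},
]


def build_bar_chart_spec(rows):
    """Return a minimal Vega bar chart specification as a plain dict."""
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": 300,
        "height": 150,
        # Data: an inline table named "table".
        "data": [{"name": "table", "values": rows}],
        # Scales: map data values to pixel positions.
        "scales": [
            {
                "name": "xscale",
                "type": "band",
                "domain": {"data": "table", "field": "category"},
                "range": "width",
                "padding": 0.05,
            },
            {
                "name": "yscale",
                "type": "linear",
                "domain": {"data": "table", "field": "amount"},
                "range": "height",
                "nice": True,
            },
        ],
        # Axes: visualize the scales.
        "axes": [
            {"orient": "bottom", "scale": "xscale"},
            {"orient": "left", "scale": "yscale"},
        ],
        # Marks: one rectangle per data row.
        "marks": [
            {
                "type": "rect",
                "from": {"data": "table"},
                "encode": {
                    "enter": {
                        "x": {"scale": "xscale", "field": "category"},
                        "width": {"scale": "xscale", "band": 1},
                        "y": {"scale": "yscale", "field": "amount"},
                        "y2": {"scale": "yscale", "value": 0},
                    }
                },
            }
        ],
    }


spec = build_bar_chart_spec(values)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed JSON can be handed to a Vega runtime in the browser, which is then responsible for drawing it with Canvas or SVG; nothing in the specification itself says how the pixels are produced.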
Stars: 6,458, Forks: 1,734
DC.js is a multidimensional charting library built on D3.js that works with crossfilter. It renders charts as CSS-friendly SVG and is designed for powerful data analysis in the browser and on mobile devices.
Stars: 4,949, Forks: 290
A general-purpose library for real-time visualization. It focuses on two different aspects: basic charts for creating historical reports and real-time charts for displaying frequently updated time series data.
Stars: 37,611, Forks: 14,344
Keras is a deep learning library written in Python that can run on top of TensorFlow, Theano, and CNTK. It is designed for rapid experimentation, since the key to good research is being able to move from idea to result with the least possible delay. Thanks to its thorough and accessible documentation, Keras rightfully earns a place in our selection.
Stars: 26,892, Forks: 16,276
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning library with Python and MATLAB bindings. In essence, it is a general-purpose library designed for deploying convolutional networks and for image, speech, or multimedia recognition.
There is also the Caffe2 project, which adds new features, in particular recurrent neural networks. In May 2018, the Caffe2 and PyTorch teams merged, and the Caffe2 code was moved into the PyTorch repository (stars: 24,075, forks: 5,707).
Stars: 16,157, Forks: 5,824
Data Science IPython Notebooks
Stars: 14,747, Forks: 4,410
A collection of IPython notebooks covering big data, Hadoop, scikit-learn, libraries for scientific computing, and more. As for deep learning, TensorFlow, Theano, Caffe, and other tools are covered.
Stars: 9,510, Forks: 1,982
Stars: 10,227, Forks: 4,570
A deep learning library for Java and Scala that integrates with Hadoop and Spark. Deeplearning4j also supports computation on GPUs through CUDA. In addition, there are tools for working with the library from Python. The repository contains all the necessary documentation and tutorials.
LISA Lab Deep Learning Tutorials
Stars: 3,673, Forks: 2,045
Tutorials from the University of Montreal. The materials introduce some of the most important deep learning algorithms and demonstrate how to work with Theano, a Python library that simplifies writing deep learning models and makes it possible to train them on the GPU.
This list by no means exhausts the interesting things on GitHub. Next time we’ll talk about machine learning projects and open datasets. If you have your own examples of interesting repositories, share them in the comments.