The Other GitHub: Data Science, Data Visualization, and Deep Learning Repositories

    GitHub is not just a platform for hosting and collaboratively developing IT projects, but also a huge knowledge base compiled by hundreds of experts. Fortunately, the service offers not only tools for working with open source, but also high-quality learning materials. We selected some popular repositories and sorted them by the number of stars in descending order.

    This compilation will help you figure out which repositories you should pay attention to if you are interested in working with data and in the field of deep learning.

    Data science

    The Open Source Data Science Masters
    Stars: 11,227, Forks: 4,737

    The official repository of the Data Science Masters curriculum, developed as an open source alternative to formal education in Data Science. The repository is a collection of educational materials gathered over several years.

    Awesome Data Science
    Stars: 9,240, Forks: 2,761

    A powerful collection that answers the questions “What is Data Science?” and “What do you need to know in order to understand this science well?”. It is conveniently categorized: for example, there is a list of books on Data Science, a selection of infographics, and even thematic groups on Facebook.

    Jupyter Interactive Notebook
    Stars: 5,242, Forks: 2,313

    The progenitor of this repository, a platform for working with scripts in 40 programming languages, is Data Science IPython Notebooks, which has collected more than 14,000 stars and 4,000 forks. Data processing and machine learning specialists have actively used it for scientific computing.

    Today, Jupyter Notebook is a handy collection of notebook files consisting of cells in which queries are written and executed. With the help of the built-in visualizers, a notebook with a set of queries turns into a full-fledged data dashboard.
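
    To make the notebook idea more concrete: a notebook file (.ipynb) is a JSON document whose main content is a list of code and text cells. The minimal sketch below builds such a document using only the Python standard library. The field names follow the nbformat version 4 layout as we understand it; the helper names (make_notebook, markdown_cell, code_cell) and the cell contents are made up for illustration.

```python
import json


def make_notebook(cells):
    """Wrap a list of cell dicts in a minimal nbformat-4 style structure."""
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def markdown_cell(text):
    """A text cell; hypothetical helper, not part of Jupyter itself."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A code cell that has not been run yet: no outputs, no execution count."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


notebook = make_notebook([
    markdown_cell("# Star counts"),
    code_cell("stars = [11227, 9240, 5242]\nmax(stars)"),
])

# Serialize to the JSON text that would be saved as a .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

    Writing notebook_json to a file with the .ipynb extension should give a file that Jupyter can open, with one text cell and one unexecuted code cell.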

    Data Science Blogs
    Stars: 4,510, Forks: 1,178

    A simple but extensive list of educational materials, sorted alphabetically. Here you will find all popular blogs, as well as many small sites with useful information (a total of 251 resources are listed).

    Data Science Specialization
    Stars: 3,114, Forks: 27,184

    The repository for the Johns Hopkins University Data Science course on Coursera, a very popular program prepared by Roger Peng, Jeff Leek, and Brian Caffo. More precisely, the specialization consists of several interrelated courses (for example, R Programming) covering various aspects of data analysis, and this repository brings together the materials used across all of them.

    Spark Notebook
    Stars: 2,677, Forks: 587

    Spark Notebook is an open source notebook that provides an interactive web editor combining Scala code, SQL queries, markup, and JavaScript for collaborative data analysis and exploration.

    Learn Data Science
    Stars: 2,129, Forks: 1,210

    A collection of IPython notebooks focused on fundamental machine learning concepts for beginners.

    Data Science at the Command Line
    Stars: 2,057, Forks: 503

    The repository contains the text, data, scripts, and custom command-line tools used in the book "Data Science at the Command Line". This practical guide demonstrates how to combine small but powerful command-line tools to quickly obtain, scrub, explore, and model data.

    Data Science Specialization Community Site
    Stars: 1,395, Forks: 2,661

    Several students who attended the course at Johns Hopkins University created such high-quality content that university staff shared it and also made a catalog for all the interesting content created by the community.

    Data Visualization for the Web

    D3
    Stars: 81,837, Forks: 20,282

    D3 is a JavaScript data visualization library for HTML and SVG. D3 focuses on web standards, so you can use the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components with a data-driven approach to manipulating the Document Object Model (DOM). This is the most popular data visualization project on GitHub.

    Chart.js
    Stars: 41,393, Forks: 9,294

    Chart.js is an HTML5 library that draws visualizations using the <canvas> element. Chart.js positions itself as a simple, flexible, and interactive tool that supports six different chart types.

    ECharts
    Stars: 32,204, Forks: 9,369

    ECharts is a browser library for charting and visualization. It is easy to use, intuitive, and easy to configure.

    Leaflet
    Stars: 23,810, Forks: 3,937

    Leaflet is a JavaScript library for creating mobile-friendly interactive maps. The library code is remarkably small and is designed for simple, fast, and convenient use. Leaflet's functionality can be extended through a set of plug-ins.

    Sigma
    Stars: 8,348, Forks: 1,305

    Sigma is a JavaScript library focused on graph drawing. It lets you develop graph views on web pages and integrate them into web applications.

    Vega
    Stars: 6,559, Forks: 702

    Vega is a declarative language for creating, saving, and sharing interactive visualization designs. With it, you can describe the visual appearance and interactive behavior of a visualization in JSON format, and generate web views using Canvas or SVG. Vega provides basic building blocks for a wide range of visualization designs: data loading and transformation, scales, map projections, legends, graphical marks, etc.
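
    As an illustration of what such a JSON description can look like, here is a minimal sketch of a bar chart specification, built as a Python dictionary and serialized with the standard library. The top-level keys (data, scales, axes, marks) follow the Vega bar chart layout as we understand it; the sample values and the names table, xscale, and yscale are made up for this example.

```python
import json

# Hypothetical sample data: category names and amounts.
values = [
    {"category": "A", "amount": 28},
    {"category": "B", "amount": 55},
    {"category": "C", "amount": 43},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega/v5.json",
    "width": 300,
    "height": 150,
    # Data loading: here the values are embedded directly in the spec.
    "data": [{"name": "table", "values": values}],
    # Scales map data values to pixel positions.
    "scales": [
        {
            "name": "xscale",
            "type": "band",
            "domain": {"data": "table", "field": "category"},
            "range": "width",
            "padding": 0.1,
        },
        {
            "name": "yscale",
            "type": "linear",
            "domain": {"data": "table", "field": "amount"},
            "range": "height",
            "nice": True,
        },
    ],
    "axes": [
        {"orient": "bottom", "scale": "xscale"},
        {"orient": "left", "scale": "yscale"},
    ],
    # Graphical marks: one rectangle per data row.
    "marks": [
        {
            "type": "rect",
            "from": {"data": "table"},
            "encode": {
                "enter": {
                    "x": {"scale": "xscale", "field": "category"},
                    "width": {"scale": "xscale", "band": 1},
                    "y": {"scale": "yscale", "field": "amount"},
                    "y2": {"scale": "yscale", "value": 0},
                }
            },
        }
    ],
}

# The JSON text is what a Vega runtime would consume.
spec_json = json.dumps(spec, indent=2)
```

    The point of the declarative style is that the whole chart is plain data: the same JSON can be stored, shared, or generated by any language, and the rendering is left to the Vega runtime.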

    DC.js
    Stars: 6,458, Forks: 1,734

    DC.js is a multidimensional charting library built on D3.js to work with crossfilter. DC.js renders charts as CSS-friendly SVG. It is designed for powerful data analysis in the browser and on mobile devices.

    Stars: 4,949, Forks: 290

    A general-purpose real-time visualization library. It focuses on two different aspects: basic charts for creating historical reports and real-time charts for displaying frequently updated time series data.

    Deep learning

    Keras
    Stars: 37,611, Forks: 14,344

    Keras is a deep learning library in Python that runs on top of TensorFlow, Theano, or CNTK. Keras is designed for rapid experimentation, since the key to good research is the ability to move from idea to result with the least delay. Thanks to its thorough and accessible documentation, Keras rightfully earns a place in our selection.

    Caffe
    Stars: 26,892, Forks: 16,276

    Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning library with Python and MATLAB bindings. In essence, it is a general-purpose library designed for deploying convolutional networks and for image, speech, or multimedia recognition.

    There is also the Caffe2 project, which adds new features, in particular recurrent neural networks. In May 2018, the Caffe2 and PyTorch teams merged, and the Caffe2 code was moved to the PyTorch repository (Stars: 24,075, Forks: 5,707).

    MXNet
    Stars: 16,157, Forks: 5,824

    MXNet is a lightweight, compact, and flexible distributed deep learning framework for Python, R, Julia, Scala, Go, JavaScript, etc. For greater performance, MXNet allows you to mix imperative and symbolic programming styles. The project also contains guidelines for creating other deep learning systems.
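
    The difference between the two styles can be shown with a toy example in plain Python. This is not MXNet's API: the classes Node, Var, Const, and Op are hypothetical and exist only for this sketch. In the imperative style every statement runs immediately; in the symbolic style you first build a graph describing the computation and evaluate it later, which in principle gives a framework the chance to optimize the whole graph before running it.

```python
import operator


class Node:
    """Base class for nodes of a tiny expression graph (illustrative only)."""

    def __add__(self, other):
        return Op(operator.add, self, as_node(other))

    def __mul__(self, other):
        return Op(operator.mul, self, as_node(other))


class Var(Node):
    """A named placeholder whose value is supplied at evaluation time."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]


class Const(Node):
    """A fixed value embedded in the graph."""

    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value


class Op(Node):
    """A binary operation applied to two child nodes."""

    def __init__(self, fn, left, right):
        self.fn, self.left, self.right = fn, left, right

    def evaluate(self, env):
        return self.fn(self.left.evaluate(env), self.right.evaluate(env))


def as_node(value):
    """Wrap plain Python numbers so they can take part in the graph."""
    return value if isinstance(value, Node) else Const(value)


# Imperative style: each line is executed at once and yields a number.
a, b = 3, 4
imperative_result = a * b + 1

# Symbolic style: describe the computation first, run it later with inputs.
graph = Var("a") * Var("b") + 1
symbolic_result = graph.evaluate({"a": 3, "b": 4})
```

    Both styles produce the same value for the same inputs; the symbolic graph can additionally be reused with different inputs without being rebuilt.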

    Data Science IPython Notebooks
    Stars: 14,747, Forks: 4,410

    A collection of IPython notebooks covering big data, Hadoop, scikit-learn, scientific computing libraries, etc. As for deep learning, TensorFlow, Theano, Caffe, and other tools are covered.

    ConvNetJS
    Stars: 9,510, Forks: 1,982

    ConvNetJS is a JavaScript implementation of neural networks and their common modules. The project is no longer maintained, but it is still worthy of attention. It lets you train a convolutional (or regular) network directly in the browser.

    Deeplearning4j
    Stars: 10,227, Forks: 4,570

    Deeplearning4j is a deep learning library for Java and Scala. It integrates with Hadoop and Spark and supports computation on GPUs through CUDA. In addition, there are tools for working with the library from Python. The repository contains all the necessary documentation and tutorials.

    LISA Lab Deep Learning Tutorials
    Stars: 3,673, Forks: 2,045

    Tutorials from the University of Montreal. The materials presented here introduce some of the most important deep learning algorithms and demonstrate how to work with Theano. Theano is a Python library that simplifies writing deep learning models and makes it possible to train them on the GPU.

    This list by no means exhausts the interesting things on GitHub. Next time we’ll talk about machine learning projects and open datasets. If you have your own examples of interesting repositories, share them in the comments.
