Machine learning is an important part of data science: it finds patterns in data and uses them to make predictions. Usually, when data science and artificial intelligence are discussed, people are referring to machine learning.
In this article, I have collected books, courses and resources that will help beginners and experts to learn machine learning from scratch.
Free E-Books for Machine Learning
Classic machine learning ebooks used as university textbooks
Beginners will find these two books easier to understand than most. Students and programmers who are making their first foray into machine learning will benefit greatly from them. The mathematical concepts are often difficult, but textbooks like these are commonly used in universities.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction – Trevor Hastie, Robert Tibshirani, Jerome Friedman – Download free
- Pattern Recognition and Machine Learning – Christopher Bishop – Download free
Free Machine learning ebooks to download from MIT Press
MIT (Massachusetts Institute of Technology) Press is the publisher of these two books. The first book covers mathematical concepts and the basics of probability theory. The second book is about deep learning, a branch of machine learning that deals with deep neural networks.
- Machine Learning: a Probabilistic Perspective – Kevin Murphy, MIT Press – Download free
- Deep Learning – Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press – Download free
Free machine learning ebooks for beginners with samples and examples
Both of these books are aimed at absolute beginners. They demonstrate machine learning through examples and code, and they are must-reads for anyone who is completely new to the field.
The books will introduce you to the basics of machine learning and show example code, but they won’t teach you how to be an expert. They are a starting point for programmers who want to excel in machine learning.
- Machine Learning in Action – Peter Harrington – Download free
- Building Machine Learning Systems with Python – Willi Richert, Luis Pedro Coelho – Download free
Free ebooks for Natural language processing for experts
These are excellent books for learning how languages work. There are good introductions to classic speech recognition, word processing, and information retrieval techniques in these books. These books are essential if you are studying language processing.
- Speech and Language Processing – Dan Jurafsky, James H. Martin – Download free
- Foundations of Statistical Natural Language Processing – Chris Manning, Hinrich Schütze – Download free
- Introduction to Information Retrieval – Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze – Download free
Python Libraries for Machine Learning
TensorFlow is a high-performance dataflow graph computation library developed by Google.
TensorFlow is typically applied to deep learning and neural networks, which makes it a popular library.
- Supports reinforcement learning.
- A computational graph abstraction is provided.
- Big community support.
- TensorBoard lets you render models directly in your browser.
- Production-ready.
- Adaptable to multiple CPUs and GPUs.
- Slower than other frameworks in the same area.
- Steep learning curve compared to PyTorch.
- Graph computations can be slow.
- Commercial support is not available.
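As a rough illustration of the computational graph abstraction mentioned above, the sketch below traces a small Python function into a graph with `tf.function` and differentiates it with `tf.GradientTape`. It assumes TensorFlow 2.x is installed; the function name `square_plus_one` is made up for this example.

```python
import tensorflow as tf


@tf.function  # traces the Python function into a TensorFlow graph
def square_plus_one(x):
    return x * x + 1.0


x = tf.Variable(3.0)

# Record the operations applied to x so a gradient can be computed afterwards.
with tf.GradientTape() as tape:
    y = square_plus_one(x)

grad = tape.gradient(y, x)  # dy/dx = 2x

print(float(y), float(grad))
```

The same code runs on a CPU or a GPU without changes, since TensorFlow places the operations on whatever device is available.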
Keras is a Python library that provides high-level APIs for neural networks that run on top of TensorFlow, CNTK, or Theano.
- The best way to experiment and prototype quickly.
- Visualizes neural networks easily.
- Modelling and visualizing are convenient.
- Resource consuming and slow.
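To give a feel for how quickly a model can be prototyped, here is a minimal sketch that defines and inspects a tiny network. It assumes Keras is used through the TensorFlow package (`from tensorflow import keras`); the layer sizes are arbitrary and chosen only for illustration.

```python
from tensorflow import keras

# A tiny fully connected network: 4 inputs -> 8 hidden units -> 1 output.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])

model.compile(optimizer="adam", loss="mse")

# Prints a text overview of the layers and their parameter counts.
model.summary()

n_params = model.count_params()
```

Calling `model.fit(features, targets)` on NumPy arrays would then train the network; the data is left out here to keep the sketch short.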
PyTorch is a popular Python library based on Torch, an earlier framework made in C and wrapped in Lua. It was created by Facebook and is used by Twitter, Salesforce, and many other organizations.
- This library includes tools for computer vision, natural language processing, deep learning, etc.
- Tensor calculations can be performed using GPU acceleration.
- Helps build computational graphs.
- Modelling is simple and transparent.
- Define-by-run execution feels more like classic programming.
- Works with standard Python debugging tools like pdb, ipdb or the PyCharm debugger.
- Models and modules can be combined together easily.
- PyTorch is relatively new, so there aren’t many online resources.
- A steep learning curve makes it difficult for new programmers.
- Less mature than TensorFlow for full-scale production work.
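The define-by-run style and GPU support described above can be sketched as follows. This is a minimal example that assumes PyTorch is installed; it falls back to the CPU when no GPU is present, and the numbers are arbitrary.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)

# The graph is built while this ordinary Python code runs (define-by-run),
# so regular control flow and debuggers such as pdb work as usual.
y = (x ** 2).sum()
if y > 10:
    y = y * 2

y.backward()  # computes dy/dx and stores it in x.grad

print(x.grad)
```

Because the `if` branch is plain Python, a breakpoint placed inside it behaves exactly as it would in any other script.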
Scikit-learn is built on NumPy, SciPy, and Matplotlib. It provides a variety of tools for data mining and analysis.
- Efficient and simple.
- Continually improving and updating.
- Various algorithms are available, including factor and cluster analysis.
- Easily extracts data from text and images.
- Useful for NLP.
- It is not well suited to deep learning.
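A short sketch of the typical scikit-learn workflow, covering one supervised model and one clustering model. It assumes scikit-learn is installed and uses the bundled iris dataset; the model choices and parameters are illustrative, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Supervised learning: hold out a test set, fit a classifier, score it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Cluster analysis: group the same samples without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(f"test accuracy: {accuracy:.2f}")
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```

Most estimators follow the same `fit` / `predict` / `score` pattern, which is a large part of why the library feels simple.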
Seaborn is a Python library for generating statistical graphs. The library is built on matplotlib and integrates pandas data structures.
- Provides more attractive plots than matplotlib.
- Offers built-in statistical plot types that matplotlib lacks.
- Reduces rendering code.
- Integrates well with pandas.
- Built on matplotlib, so there is a bit of a learning curve.
- Relies on default themes, so its output is less customizable than matplotlib's.
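The sketch below shows how little code a Seaborn plot needs when the data is already in a pandas DataFrame. The column names and values are invented for illustration, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window is opened

import pandas as pd
import seaborn as sns

# Made-up data: study hours and exam scores for two groups.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 70, 78, 85],
    "group": ["a", "a", "a", "b", "b", "b"],
})

sns.set_theme()  # apply Seaborn's default styling

# Column names are passed directly; Seaborn handles colours and the legend.
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.figure.savefig("scores.png")
```

The returned object is a regular matplotlib `Axes`, so anything Seaborn does not expose can still be adjusted through matplotlib.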
Natural Language Toolkit (NLTK)
NLTK is an open-source, Python-based framework and library for building systems that process natural language (NLP).
- Sample data is included along with graphical tools.
- Assists in various ML operations, such as classifying, parsing, tokenizing, etc.
- Creates prototypes and research systems.
- Multiple languages are supported.
- Working with NLTK requires an understanding of strings.
- Tokenization affects performance.
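As a small sketch of the tokenizing and related operations listed above, the example below tokenizes a sentence, counts word frequencies, and stems a few words. It assumes NLTK is installed and deliberately uses only components that need no extra corpus downloads; the sample text is made up.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The cats are chasing mice. The mice are running."

# Split the text into word and punctuation tokens.
tokens = wordpunct_tokenize(text.lower())

# Keep alphabetic tokens only and count how often each one occurs.
words = [t for t in tokens if t.isalpha()]
freq = FreqDist(words)

# Reduce each distinct word to its stem with the Porter algorithm.
stemmer = PorterStemmer()
stems = {w: stemmer.stem(w) for w in set(words)}

print(freq.most_common(3))
print(stems)
```

Other tokenizers and taggers in NLTK, such as `word_tokenize`, require downloading additional data with `nltk.download` first.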
NumPy provides high-level math functions and multidimensional array processing for scientific computing in Python.
- Easy to use and interactive.
- Provides tools for integrating C/C++ and Fortran code, along with Fourier transform and random number functions.
- Inputs for scikit-learn and TensorFlow are NumPy arrays.
- Great community support.
- Facilitates complex implementations.
- Overkill if regular Python lists are enough for your needs.
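A brief sketch of the array processing, Fourier transform, and random number features mentioned above. It assumes only that NumPy is installed; the arrays are arbitrary examples.

```python
import numpy as np

# A 2 x 3 array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

# Column means, then subtract them from every row via broadcasting.
col_means = a.mean(axis=0)
centered = a - col_means

# Discrete Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

# Reproducible random numbers from a seeded generator.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

print(col_means)
print(centered)
print(spectrum.real)
```

None of these steps needs an explicit Python loop, which is where most of the speed advantage over plain lists comes from.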
Matplotlib is a cross-platform library for creating 2D charts. You can create high-quality graphs and charts in several formats.
- It supports Jupyter Notebooks, web application servers, and many front-end tools (GTK +, Tkinter, Qt, and wxPython).
- MATLAB-style interface for drawing diagrams.
- You can customize axes, fonts, line styles, and more using the object-oriented interface.
- It works with a wide variety of platforms and graphics engines.
- Often used with other libraries such as Pandas.
- For a novice developer, two different interfaces can be confusing.
- Matplotlib is mainly used for visualizing data, not analyzing it.
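Since the two interfaces can be confusing, here is a minimal side-by-side sketch of both. It assumes Matplotlib is installed; the data, labels, and output file name are illustrative, and the `Agg` backend is chosen only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window is opened

import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# MATLAB-style (pyplot) interface: functions act on the "current" figure.
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")

# Object-oriented interface: explicit Figure and Axes objects.
fig, ax = plt.subplots()
ax.plot(x, y, linestyle="--", marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("object-oriented interface")

fig.savefig("squares.png")
```

The object-oriented form is usually easier to manage once a figure has more than one set of axes, because every call names the object it changes.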
Pandas provides high-performance yet easy-to-use data structures and data analysis tools for Python.
- A fast, flexible and expressive data structure.
- It supports aggregation, concatenation, iteration, reindexing, and visualization operations.
- Several Python libraries are compatible with it.
- Simple data management with minimal instructions.
- Used in a variety of commercial and academic fields.
- Its plotting is based on matplotlib, so a beginner should know both libraries in order to know how to approach a specific problem.
- Pandas is not suitable for n-dimensional arrays or statistical modelling.
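The aggregation, concatenation, and reindexing operations listed above can be sketched in a few lines. The example assumes pandas is installed; the table contents are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "units": [3, 5, 2, 8],
})

# Aggregation: total units per city.
totals = sales.groupby("city")["units"].sum()

# Concatenation: append another table with the same columns.
more = pd.DataFrame({"city": ["Oslo"], "units": [4]})
combined = pd.concat([sales, more], ignore_index=True)

# Reindexing: reorder the index; labels with no data become NaN.
reindexed = totals.reindex(["Rome", "Oslo", "Paris"])

print(totals)
print(combined)
print(reindexed)
```

Each step returns a new object rather than modifying `sales`, so intermediate results stay available for inspection.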
SciPy is an extremely popular library with modules for linear algebra, optimization, integration, statistics, and more.
- An ideal tool for manipulating images.
- Math operations are easy to handle.
- Supports integration and optimization mathematical operations.
- Supports signal processing.
- The name SciPy refers to both the software stack and the library, which can confuse developers.
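As a quick sketch of the integration, optimization, and linear algebra modules, the example below solves one small problem with each. It assumes SciPy and NumPy are installed; the functions and the linear system are arbitrary.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under sin(x) between 0 and pi (analytically 2).
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: minimum of (x - 2)^2, which lies at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

# Linear algebra: solve 3x + y = 9 and x + 2y = 8.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(area)
print(result.x)
print(solution)
```

All three routines accept and return NumPy arrays or plain floats, so their results can be passed straight to the other libraries in this list.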