A Data Scientist's blog: 04/01/2016

Tuesday, April 19, 2016

Top Machine Learning Algorithms for Data Scientists

Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions.

Machine learning brings together computer science, mathematics and statistics to harness that predictive power. It’s a must-have skill for all aspiring data analysts and data scientists, or anyone else who wants to wrestle all that raw data into refined trends and predictions.

Have you ever wondered what the end-to-end process of investigating data through a machine learning lens looks like? Have you ever extracted and identified the features that best represent your big data? Have you ever gone through the complex process of making predictions on big data and evaluating the performance of your machine learning algorithms?

Leading companies such as Amazon, Google and Facebook use efficient algorithms suited to big data, indexation, attribution modeling, collaborative filtering, and recommendation engines. Here are the most important machine learning algorithms for data analysts and data scientists.

Algorithm 1: Gradient Descent

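The Gradient Descent algorithm is the optimization algorithm at the core of many machine learning algorithms. It minimizes a cost function by repeatedly adjusting the model's parameters in the direction of the negative gradient.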
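To make the idea concrete, here is a minimal sketch of gradient descent fitting a straight line by minimizing mean squared error. The function name, the learning rate, the step count and the toy data are all assumptions chosen for illustration, not taken from any particular library.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by minimizing mean squared error (illustrative sketch)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Error of the current model on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move both parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(w, b)
```

On this toy data the fitted slope and intercept should land very close to 2 and 1. A learning rate that is too large would make the updates diverge instead, which is why it usually needs tuning.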

Linear Algorithms

Algorithm 2: Linear Regression

Algorithm 3: Logistic Regression

Algorithm 4: Linear Discriminant Analysis

Nonlinear Algorithms

Algorithm 5: Classification and Regression Trees

Algorithm 6: Naive Bayes

Algorithm 7: K-Nearest Neighbours

Algorithm 8: Learning Vector Quantization

Algorithm 9: Support Vector Machines

Ensemble Algorithms

Algorithm 10: Bagged Decision Trees and Random Forest

Algorithm 11: Boosting and AdaBoost

Drawbacks of Some Algorithms

Naive Bayes

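Naive Bayes assumes that the input variables are independent of one another given the class, but in real data variables are almost never uncorrelated.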
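A small sketch shows why correlated variables hurt Naive Bayes. The probabilities below are made-up numbers, and the helper function is hypothetical. Feeding the same piece of evidence three times, as if it were three independent variables, makes the model more confident than a single copy justifies.

```python
def naive_bayes_posterior(prior_a, likelihoods_a, likelihoods_b):
    """P(A | features) for two classes A and B, multiplying per-feature
    likelihoods as if the features were independent given the class."""
    score_a = prior_a
    score_b = 1.0 - prior_a
    for like_a, like_b in zip(likelihoods_a, likelihoods_b):
        score_a *= like_a
        score_b *= like_b
    return score_a / (score_a + score_b)


# One feature: P(x | A) = 0.8, P(x | B) = 0.4, equal priors (assumed values).
single = naive_bayes_posterior(0.5, [0.8], [0.4])

# The same feature copied three times (perfectly correlated copies).
tripled = naive_bayes_posterior(0.5, [0.8] * 3, [0.4] * 3)

print(single, tripled)
```

The duplicated copies carry no new information, yet the posterior for class A rises from about 0.67 to about 0.89, because the evidence is counted three times.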

Linear Discriminant Analysis

Linear Discriminant Analysis draws linear decision boundaries, but in real data classes are almost never separated by hyperplanes.
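The classic XOR layout is a tiny example of classes that no hyperplane can separate. The sketch below does not run Linear Discriminant Analysis itself. It simply searches a coarse grid of candidate lines (the grid is an assumption) and reports the best accuracy that any of them reaches.

```python
import itertools

# XOR: the label is 1 when exactly one coordinate is 1.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def accuracy(w1, w2, b):
    """Fraction of XOR points classified correctly by the line w1*x1 + w2*x2 + b = 0."""
    correct = 0
    for (x1, x2), label in points:
        predicted = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        correct += int(predicted == label)
    return correct / len(points)


# Candidate weights and biases from -2 to 2 in steps of 0.25.
grid = [i / 4 for i in range(-8, 9)]
best = max(accuracy(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print(best)
```

No line in the grid classifies more than three of the four points correctly, and a finer grid would not help, since XOR is not linearly separable.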

Linear Regression

Numerous model assumptions - including linearity - are almost always violated in real data.
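As a quick illustration of a violated linearity assumption, the sketch below fits an ordinary least squares line to made-up data that follows y = x^2. The helper function and the data are assumptions for demonstration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y ~ slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data with a purely quadratic relationship.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```

The fitted line is flat, so it explains none of the variation in y, and the residuals form a clear U shape rather than random noise. That kind of pattern is a common sign that the linearity assumption does not hold.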
