Posts

Showing posts from September, 2020

Demystifying ‘Confusion Matrix’ Confusion

If you are confused about the confusion matrix, then I hope this post helps you understand it. Happy reading! We will use the UCI Bank Note Authentication Dataset to demystify the confusion behind the confusion matrix. We will predict and evaluate our model, and along the way develop our conceptual understanding. Links to further reading are provided wherever required.

Understanding the Data

The dataset contains properties of the wavelet-transformed 400x400 pixel image of a banknote and can be found here. It is recommended that the reader download the dataset and follow along. For reference, you can also find the Kaggle Notebook here.

# Skipping the necessary library imports
# Reading the data file
df = pd.read_csv('../input/BankNote_Authentication.csv')
df.head(5)

Sample data (using head)

# To check if the data is equally balanced between the target classes
df['class'].value_counts()

The target class is balanced enough.

Building the Model

Splitting th…
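The excerpt cuts off at the train/test split. As a rough, hedged sketch of how the remaining steps might look with scikit-learn (the logistic regression model, split ratio, and random seed are placeholders, not necessarily what the original post uses; only the 'class' column name is confirmed by the excerpt):

# Minimal sketch: split the data, fit a simple classifier,
# and inspect the resulting confusion matrix.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

df = pd.read_csv('../input/BankNote_Authentication.csv')

X = df.drop('class', axis=1)   # feature columns
y = df['class']                # target class (genuine vs. forged note)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000)  # assumed model choice
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))    # rows: actual, columns: predicted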

Accuracy, Precision, Recall & F1 Score

This post demonstrates how to evaluate the performance of a model via the accuracy, precision, recall and F1 score metrics in ML, and provides a brief explanation of the “confusion metrics”. In this experiment I have used the Two-Class Boosted Decision Tree algorithm, and my goal is to predict the survival of the passengers on the Titanic. Once you have built your model, the most important question that arises is: how good is your model? Evaluating your model is therefore a crucial task in any data science project, as it shows how good your predictions are.

Fig.: Evaluation results for a classification model

Let’s dig into all the parameters shown in the figure above. The first thing you will see is the ROC curve, and we can judge whether our ROC curve is good or not by looking at the AUC (Area Under the Curve) and the other parameters, which are also called confusion metrics. A confusion matrix is a table that is often used to describe the performance of a classification model on a se…
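The original experiment uses Azure ML's evaluation module; as a rough equivalent, here is a small scikit-learn sketch of how the metrics named above are computed. The label and score arrays below are toy values for illustration only, not results from the Titanic experiment:

# Illustrative computation of the confusion-matrix-based metrics.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_test  = [0, 1, 1, 0, 1, 0, 1, 0]                    # true labels (toy data)
y_pred  = [0, 1, 0, 0, 1, 1, 1, 0]                    # hard predictions (toy data)
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.3]    # predicted probabilities (toy data)

print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, y_score))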

Guide to Types of Sampling Techniques

  Overview Sampling is a popular statistical concept – learn how it works in this article We will also talk about eight different types of sampling techniques using plenty of examples   Introduction Here’s a scenario I’m sure you are familiar with. You download a relatively big dataset and are excited to get started with analyzing it and building your machine learning model. And snap – your machine gives an “out of memory” error while trying to load the dataset. It’s happened to the best of us. It’s one of the biggest hurdles we face in data science – dealing with massive amounts of data on computationally limited machines (not all of us have Google’s resource power!). So how can we overcome this perennial problem? Is there a way to pick a subset of the data and analyze that – and that can be a good representation of the entire dataset? Yes! And that method is called sampling. I’m sure you’ve come across this term a lot during your school/university days, and perhaps even in your profe
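To make the idea of drawing a representative subset concrete, here is a small pandas sketch of two commonly described techniques, simple random sampling and stratified sampling. The DataFrame, column names, and fractions are placeholders for illustration, not the article's own example:

# Drawing a representative subset of a larger dataset with pandas.
import pandas as pd

df = pd.DataFrame({
    'value': range(1000),
    'group': ['A', 'B'] * 500,   # hypothetical strata column
})

# Simple random sampling: every row has the same chance of being picked.
simple_sample = df.sample(frac=0.1, random_state=42)

# Stratified sampling: sample the same fraction from each group so the
# subset preserves the group proportions of the full dataset.
stratified_sample = (
    df.groupby('group', group_keys=False)
      .apply(lambda g: g.sample(frac=0.1, random_state=42))
)

print(len(simple_sample), len(stratified_sample))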