Posts

Showing posts from September, 2020

Demystifying ‘Confusion Matrix’ Confusion

If you are confused about the confusion matrix, then I hope this post helps you understand it. Happy reading! We will use the UCI Bank Note Authentication Dataset to demystify the confusion behind the confusion matrix. We will predict and evaluate our model, and along the way develop our conceptual understanding. Links to further reading are provided wherever required.

Understanding the Data

The dataset contains properties of the wavelet-transformed 400x400 pixel image of a banknote and can be found here. It is recommended that the reader download the dataset and follow along. For reference, you can also find the Kaggle Notebook here.

# Skipping the necessary library imports
# Reading the data file
df = pd.read_csv('../input/BankNote_Authentication.csv')
df.head(5)

Sample data (using head)

# To check if the data is equally balanced between the target classes
df['class'].value_counts()

The target class is balanced enough.

Building the Model

Splitting th…
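The excerpt cuts off at the train/test split. As a rough, hedged sketch of how the remaining steps might look with scikit-learn (the logistic regression model, split ratio, and random seed are placeholders, not necessarily what the original post uses; only the 'class' column name is confirmed by the excerpt):

# Minimal sketch: split the data, fit a simple classifier,
# and inspect the resulting confusion matrix.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

df = pd.read_csv('../input/BankNote_Authentication.csv')

X = df.drop('class', axis=1)   # feature columns
y = df['class']                # target class (genuine vs. forged note)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000)  # assumed model choice
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))    # rows: actual, columns: predicted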

Accuracy, Precision, Recall & F1 Score

This post demonstrates how to evaluate the performance of a model via the accuracy, precision, recall and F1 score metrics in ML, and provides a brief explanation of the “confusion metrics”. In this experiment I have used the Two-Class Boosted Decision Tree algorithm, and my goal is to predict the survival of the passengers on the Titanic. Once you have built your model, the most important question that arises is: how good is your model? Evaluating your model is therefore a crucial task in any data science project, as it shows how good your predictions are.

Fig.: Evaluation results for a classification model

Let’s dig into all the parameters shown in the figure above. The first thing you will see is the ROC curve, and we can judge whether our ROC curve is good or not by looking at the AUC (Area Under the Curve) and the other parameters, which are also called confusion metrics. A confusion matrix is a table that is often used to describe the performance of a classification model on a se…
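The original experiment uses Azure ML's evaluation module; as a rough equivalent, here is a small scikit-learn sketch of how the metrics named above are computed. The label and score arrays below are toy values for illustration only, not results from the Titanic experiment:

# Illustrative computation of the confusion-matrix-based metrics.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_test  = [0, 1, 1, 0, 1, 0, 1, 0]                    # true labels (toy data)
y_pred  = [0, 1, 0, 0, 1, 1, 1, 0]                    # hard predictions (toy data)
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.3]    # predicted probabilities (toy data)

print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, y_score))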

Guide to Types of Sampling Techniques

  Overview Sampling is a popular statistical concept – learn how it works in this article We will also talk about eight different types of sampling techniques using plenty of examples   Introduction Here’s a scenario I’m sure you are familiar with. You download a relatively big dataset and are excited to get started with analyzing it and building your machine learning model. And snap – your machine gives an “out of memory” error while trying to load the dataset. It’s happened to the best of us. It’s one of the biggest hurdles we face in data science – dealing with massive amounts of data on computationally limited machines (not all of us have Google’s resource power!). So how can we overcome this perennial problem? Is there a way to pick a subset of the data and analyze that – and that can be a good representation of the entire dataset? Yes! And that method is called sampling. I’m sure you’ve come across this term a lot during your school/university days, and perhaps even in your profe
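To make the idea of drawing a representative subset concrete, here is a small pandas sketch of two commonly described techniques, simple random sampling and stratified sampling. The DataFrame, column names, and fractions are placeholders for illustration, not the article's own example:

# Drawing a representative subset of a larger dataset with pandas.
import pandas as pd

df = pd.DataFrame({
    'value': range(1000),
    'group': ['A', 'B'] * 500,   # hypothetical strata column
})

# Simple random sampling: every row has the same chance of being picked.
simple_sample = df.sample(frac=0.1, random_state=42)

# Stratified sampling: sample the same fraction from each group so the
# subset preserves the group proportions of the full dataset.
stratified_sample = (
    df.groupby('group', group_keys=False)
      .apply(lambda g: g.sample(frac=0.1, random_state=42))
)

print(len(simple_sample), len(stratified_sample))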