Overfitting or high variance in machine learning models occurs when the accuracy of your training dataset, the dataset used to “teach” the model, is greater than your testing accuracy. In terms of ‘loss’, overfitting reveals itself when your model has a low error in the training set and a higher error in the testing set. You can identify this visually by plotting your loss and accuracy metrics and seeing where the performance metrics converge for both datasets.
Classification problems are quite common in the machine learning world. As we know in the classification problem we try to predict the class label by studying the input data or predictor where the target or output variable is a categorical variable in nature.
If you have already dealt with classification problems, you must have faced instances where one of the target class labels’ numbers of observation is significantly lower than other class labels. This type of dataset is called an imbalanced class dataset which is very common in practical classification scenarios. …
In Pycon 2021, the creator of the Python language, Sir Guido Van Rossum elaborated on his long and short-term plans on how to make future versions of python faster and he envisions making it twice as fast as it is currently. PyPy and CPython are some of the existing examples which try to increase the execution speed of Python but you can do this yourself also if you just follow some tips and tricks on how to improve your coding skills so that you will be writing efficient code and not wasting any memory or CPU time.
Ensemble modeling is a powerful way to improve the performance of your model. It usually pays off to apply ensemble learning over and above various models you might be building. Time and again, people have used ensemble models in competitions like Kaggle and benefited from it.
Ensemble learning is a broad topic and is only confined by your own imagination. For the purpose of this article, I will cover the basic concepts and ideas of ensemble modeling. This should be enough for you to start building ensembles at your own end.
Let’s quickly start with an example to understand the…
Deep Learning is a very powerful tool that has now found usage in many primary fields of research and study. Neural networks and deep learning has been around for quite some time, but due to the lack of data, there was not much popularity for a long time. The creation of the ImageNet has been the primary driving force in the further development of Deep learning and Convolution Neural Networks.
Pneumonia is a kind of lung infection that can happen to either of the lungs. This infection can be caused by several reasons like bacteria, viruses, or fungi. …
SyriaTel Communications is a Telecommunications company that is looking to predict and prevent customer churn. Customer churn is when a customer leaves/discontinues their service with SyriaTel. Customer churn is a major problem for many service-based companies because it is so expensive. Not only does the company lose the customer’s monthly/yearly payment, but they also incur a customer acquisition cost to replace that customer.
To help SyriaTel fix the problem of customer churn, I first conducted an Exploratory Data Analysis (EDA) and then built a machine learning classifier that will predict the customers that are going to churn. …
Linear Regression is a commonly used supervised Machine Learning algorithm that predicts continuous values. Linear Regression assumes that there is a linear relationship present between dependent and independent variables. In simple words, it finds the best fitting line/plane that describes two or more variables.
On the other hand, Logistic Regression is another supervised Machine Learning algorithm that helps fundamentally in binary classification (separating discreet values).
Although the usage of Linear Regression and Logistic Regression algorithm is completely different, mathematically we can observe that with an additional step we can convert Linear Regression into Logistic Regression.
I am going to discuss…
“Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone.’’ — John Tukey
Exploratory data analysis is one of the best practices used in data science today. While starting a career in Data Science, people generally don’t know the difference between Data analysis and exploratory data analysis. There is not a very big difference between the two, but both have different purposes.
EDA involves looking at and describing the data set from different angles and then summarizing it.
Data Analysis is the statistics and probability to figure out trends in the data…
Data Science Student