Photo by Jude Beck on Unsplash

Decision Tree is one of the most popular and powerful classification algorithms in machine learning, and it is mostly used for predicting categorical data. Entropy/Information Gain and Gini Impurity are two key metrics used to decide which feature to split on when constructing a decision tree model.

To learn more about these, you may want to read my other blogs on Decision Trees, Entropy/Information Gain, and Gini Impurity.

In this blog, let’s build a decision tree classifier model using both Gini and Entropy to detect heart disease. I used a Kaggle notebook for the code and the UC Irvine Heart Disease dataset from Kaggle to find out the most important factor that impacts heart disease in a patient. …
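
A minimal sketch of the Gini-versus-Entropy comparison with scikit-learn’s `DecisionTreeClassifier` is below. It assumes synthetic data from `make_classification` in place of the UCI Heart Disease CSV, so the feature names `f0` to `f5` are hypothetical placeholders, and `max_depth=4` is an arbitrary choice rather than the blog’s configuration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease data; f0..f5 are placeholder names.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

results = {}
for criterion in ("gini", "entropy"):
    # Same tree settings for both runs; only the split criterion changes.
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=4,
                                  random_state=0)
    tree.fit(X_train, y_train)
    acc = accuracy_score(y_test, tree.predict(X_test))
    # feature_importances_ is the impurity-based importance of each feature.
    top = feature_names[tree.feature_importances_.argmax()]
    results[criterion] = (acc, top)
    print(f"{criterion}: test accuracy={acc:.3f}, most important feature={top}")
```

On the real dataset, the feature with the highest `feature_importances_` score would be one way to approach the “most important factor” question, keeping in mind that impurity-based importances can favor features with many distinct values.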


Continuing from my earlier blog “How to extract question and answer pairs from telegram chat using Python pandas?”, here I explain sentiment analysis of the same Telegram group’s chat history.

The assignment is to identify the satisfied and unsatisfied members of the “Eradicate Diabetes” Telegram group and to design a decision tree classifier model using that data.

A short introduction to “Eradicate Diabetes” (ED): ED is a community chat group that brings people together to fight diabetes using the power of crowdsourced healthcare. The group helps people reverse Type 2 diabetes by providing information and support, and it also believes in holistic treatment of organs such as the liver, kidneys, and heart that have been damaged over years of abuse. …
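
As an illustration only, a per-member satisfied/unsatisfied label could be sketched with pandas and a tiny word list. The messages, member IDs, and the `POSITIVE`/`NEGATIVE` word sets below are hypothetical, and this simple lexicon count is a stand-in technique, not the method used in the blog; a real analysis would more likely rely on an established sentiment lexicon or model.

```python
import pandas as pd

# Hypothetical chat extract; real rows would come from the Telegram export.
chat = pd.DataFrame({
    "member": ["A", "A", "B", "B", "C"],
    "text": [
        "My sugar levels improved, thank you!",
        "Feeling great on the new diet",
        "Still no change, very frustrated",
        "This is not working for me",
        "Thanks, the advice helped",
    ],
})

# Hypothetical word lists (deliberately tiny and naive).
POSITIVE = {"improved", "thank", "thanks", "great", "helped"}
NEGATIVE = {"frustrated", "not", "no", "worse"}

def score(text: str) -> int:
    """Positive word count minus negative word count for one message."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

chat["score"] = chat["text"].apply(score)
# Average each member's message scores, then label the member.
per_member = chat.groupby("member")["score"].mean()
labels = per_member.apply(lambda s: "satisfied" if s > 0 else "unsatisfied")
print(labels)
```

Labels produced this way could then serve as the target column for a decision tree classifier.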


How Machine Learning helps security in the real world

Image Source

Machine Learning is a core building block of Data Science and Artificial Intelligence. Mathematics and statistics are the backbone of machine learning algorithms, and the algorithms used to discover correlations, anomalies, and patterns work with data that is too complex to analyze by hand.

When we talk about security, spam is often the first thing that comes to mind. …
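
Spam filtering is a classic example. A minimal sketch, assuming scikit-learn and a handful of made-up messages, trains a bag-of-words multinomial Naive Bayes filter. Naive Bayes is chosen here only as a common baseline and is not necessarily the approach discussed in the blog.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, made-up training messages (1 = spam, 0 = legitimate).
messages = [
    "Win a free prize now, click here",
    "Limited offer: claim your free reward",
    "Congratulations, you won cash, click to claim",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached project report",
    "Can you send me the notes from class?",
]
labels = [1, 1, 1, 0, 0, 0]

# Word counts feed a multinomial Naive Bayes classifier.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

predictions = spam_filter.predict([
    "Claim your free prize now",
    "Lunch meeting notes attached",
])
print(predictions)
```

Real filters are trained on thousands of labeled messages; six examples are only enough to show the mechanics.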


Understanding Tensors, the central data type of TensorFlow

Photo by Faris Mohammed on Unsplash

Let me begin this blog with a brief introduction to TensorFlow, which is now a very popular framework in the world of Deep Learning.

TensorFlow is Google’s gift to the world of data science and machine learning. It was developed by the Google Brain team and first released as open source in 2015; TensorFlow 1.0 followed in 2017, and TensorFlow 2.0 was released in 2019.

The primary interface of TensorFlow is Python, but the core functionality is written in C++ for better performance. Like several other frameworks, TensorFlow represents computations as a graph of operations, which can be executed on CPUs or GPUs and distributed across multiple machines. TensorBoard is the visualization utility provided by TensorFlow for inspecting these graphs and their operations. …
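
A short sketch of tensors in TensorFlow 2.x, assuming eager execution (the 2.x default): every tensor has a rank, a shape, and a dtype, and `tf.function` traces ordinary Python code into a graph. The `affine` function is a hypothetical example.

```python
import tensorflow as tf

# Tensors of increasing rank (number of dimensions).
scalar = tf.constant(7)                  # rank 0, int32
vector = tf.constant([1.0, 2.0, 3.0])    # rank 1, float32
matrix = tf.constant([[1, 2], [3, 4]])   # rank 2, int32

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "rank:", tf.rank(t).numpy(), "shape:", t.shape, "dtype:", t.dtype)

# tf.function traces this Python function into a TensorFlow graph.
@tf.function
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.constant([[1.0, 2.0]])      # shape (1, 2)
w = tf.constant([[3.0], [4.0]])    # shape (2, 1)
b = tf.constant([0.5])             # broadcast over the (1, 1) result
y = affine(x, w, b)
print(y.numpy())                   # [[11.5]]
```

The first call to `affine` builds the graph; later calls with the same input shapes and dtypes reuse it.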


ML Regression project: House price prediction using Azure ML Studio

Photo by Marcus Lenk on Unsplash

Azure Machine Learning Studio is a web-based integrated development environment (IDE) for building and operationalizing Machine Learning models and workflows on Azure.

I wanted to explore Azure ML Studio by building machine learning models that I had already built with the scikit-learn library in Python. The Python project code can be found here on GitHub.

As a beginner, I wanted to build simple regression models using the California housing prices dataset from Kaggle and evaluate the outcomes. In this blog, I will be using Azure ML Studio (free version) to build ML models and evaluate them.

Step 1: Let’s look at the dataset (explore the dataset)

The data pertains to the houses found in a given California district, along with some summary statistics about them based on the 1990 census data. The data is almost clean and needs only minimal cleaning of missing values and removal of unwanted details. …
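
For comparison with the scikit-learn version of this project, a minimal cleaning-and-regression sketch follows. The column names are modeled on the Kaggle `housing.csv`, but every value is randomly generated, so it only illustrates the steps (median imputation of `total_bedrooms`, one-hot encoding of `ocean_proximity`, a linear fit, and test RMSE) and says nothing about real results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: column names mimic housing.csv, values are random.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, n),
    "housing_median_age": rng.integers(1, 52, n),
    "total_rooms": rng.integers(100, 6000, n),
    "total_bedrooms": rng.integers(20, 1200, n).astype(float),
    "ocean_proximity": rng.choice(["<1H OCEAN", "INLAND", "NEAR BAY"], n),
})
df["median_house_value"] = (
    40_000 * df["median_income"] + 500 * df["housing_median_age"]
    + rng.normal(0, 20_000, n)
)
# Simulate missing values in total_bedrooms.
df.loc[rng.choice(n, 20, replace=False), "total_bedrooms"] = np.nan

# Cleaning: fill missing bedrooms with the median, one-hot encode the category.
df["total_bedrooms"] = df["total_bedrooms"].fillna(df["total_bedrooms"].median())
df = pd.get_dummies(df, columns=["ocean_proximity"], dtype=float)

X = df.drop(columns="median_house_value")
y = df["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"Test RMSE: {rmse:,.0f}")
```

In Azure ML Studio, the same steps map onto drag-and-drop modules rather than code.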


Basic steps to get started with Azure ML Studio (classic)

To develop a machine learning prediction model, we collect data from one or more sources; explore, transform, and analyze that data using various data manipulation and statistical functions; and generate the results. We then adjust features and parameters until we are satisfied that we have built and trained an effective model.

Microsoft provides an interactive tool for this called Azure ML Studio (classic). According to Microsoft, Machine Learning Studio (classic) is a drag-and-drop tool that one can use to build, test, and deploy machine learning models. …


How can we validate a machine learning model using cross-validation?

Photo by Solé Bicycles on Unsplash

When we build a machine learning model, it is really important to find a way to validate that the model we built and the hyperparameters used are a good fit for the data. Model validation may sound very simple: after selecting the model and choosing the hyperparameters to build the model, we can estimate how effective the model is by applying it to some test data and comparing the predicted values with the known values. But there are some pitfalls which must be avoided.

Before exploring cross-validation to validate a model, let’s try to use a naive approach to model validation and see how it fails. …
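
The naive approach and its fixes can be sketched with scikit-learn, assuming the bundled Iris dataset and a 1-nearest-neighbor classifier as the example model (both are illustrative choices). Scoring on the training data rewards memorization, while a holdout split and 5-fold cross-validation score the model on data it has not seen.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=1)

# Naive approach: train and evaluate on the same data (overly optimistic).
model.fit(X, y)
naive_acc = accuracy_score(y, model.predict(X))

# Holdout set: train on one half, evaluate on the unseen other half.
X1, X2, y1, y2 = train_test_split(X, y, random_state=0, train_size=0.5)
model.fit(X1, y1)
holdout_acc = accuracy_score(y2, model.predict(X2))

# 5-fold cross-validation: each sample is used for testing exactly once.
cv_scores = cross_val_score(model, X, y, cv=5)
print(naive_acc, holdout_acc, cv_scores.mean())
```

A 1-nearest-neighbor model simply looks up each training point, so its training accuracy says little about how it will do on new data; the cross-validated mean is the more honest estimate.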


Let’s find out using the Python libraries NumPy, pandas, and Matplotlib

Photo by Brandon Mowinkel on Unsplash

As I was contemplating the next topic for my blog, this question struck me as both curious and a useful Python exercise. Let me tell you why I chose this title. While searching for datasets to work on, I stumbled upon an interesting one containing the heights of all the US presidents. I was instantly curious to explore the data and find out how many US presidents were taller than 6 feet. I hope the same curiosity will invite many to read my blog as well.

Let me start with a short introduction to the Python libraries in question, NumPy, pandas, Matplotlib, and Seaborn, which will be used to answer my question. …
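
Assuming the heights are stored in centimeters, the 6-foot question reduces to a boolean mask, since 6 ft = 72 in × 2.54 cm/in = 182.88 cm. The column name `height(cm)` and the rows below are placeholders, not the real presidential data.

```python
import numpy as np
import pandas as pd

# Placeholder rows; the real dataset lists every US president's height.
data = pd.DataFrame({
    "name": ["President A", "President B", "President C", "President D"],
    "height(cm)": [189, 175, 183, 182],
})

SIX_FEET_CM = 6 * 12 * 2.54  # 72 inches in centimeters (about 182.88)

heights = data["height(cm)"].to_numpy()
taller = heights > SIX_FEET_CM          # boolean mask, one entry per row
print("Taller than 6 feet:", np.sum(taller))
print(data.loc[taller, "name"].tolist())
print("Mean height (cm):", heights.mean())
```

Matplotlib or Seaborn could then plot a histogram of `heights` to show where the 6-foot line falls.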


Data science affects our modern lives in far more ways than we may realize. When we use Google Search or Bing, we are using a sophisticated application of data science: the search suggestions that appear as we type are all driven by it. Data has become an integral part of every business and is unavoidable in everyday life. Even doctors increasingly rely on data science interpretations these days.

Big Data is the term used for datasets that are too large or complex for traditional data processing software such as SPSS or spreadsheets. …


Photo by Jens Lelie on Unsplash

Decision Tree is one of the most popular and powerful classification algorithms used in machine learning. As the name suggests, decision trees are used for making decisions from a given dataset. The idea behind a decision tree is to select, at each step, the feature that best splits the data into subparts, similar to how a human mind reasons through a series of questions.

To choose these splits efficiently, we use the concepts of Entropy/Information Gain and Gini Impurity. To learn more about Entropy/Information Gain, you may want to read my Entropy blog.
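
As a quick refresher, entropy is H = -Σ p_i log2(p_i), where p_i is the proportion of class i, and information gain is the parent node’s entropy minus the size-weighted entropy of its child nodes. A small pure-Python sketch with made-up yes/no labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5                 # perfectly mixed node
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(entropy(parent))                  # 1.0
print(information_gain(parent, split))  # about 0.278
```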

In this blog, let’s see what Gini Impurity is and how it is used to construct decision trees. …
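
The Gini Impurity of a node is 1 - Σ p_i², where p_i is the proportion of class i in that node. A minimal pure-Python sketch, using hypothetical labels, compares two candidate splits; a decision tree would prefer the split with the lower weighted impurity.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(children):
    """Size-weighted Gini impurity of the nodes produced by a split."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)

parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]  # partially pure
split_b = [["yes"] * 5, ["no"] * 5]                     # perfectly pure

print(gini(parent))            # 0.5
print(weighted_gini(split_a))  # about 0.32
print(weighted_gini(split_b))  # 0.0
```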

About

Bhuvaneswari Gopalan

Data Scientist and Machine Learning Engineer
