If you liked one of my posts and would like to check out more, or simply want to browse through my articles, this catalogue may be helpful for you. Here, you will find a list of all my posts in a condensed form. Hopefully, this will save you time finding…

When visualising data, often there is a need to plot multiple graphs in a single figure. For instance, multiple graphs are useful if you want to visualise the same variable but from different angles (e.g. side-by-side histogram and boxplot for a numerical variable). …

*Logistic regression* is a popular classification algorithm due to its simplicity and interpretability. If you are learning about or practicing data science, it’s likely that you have heard of this algorithm or even used it. …

*Linear regression* is probably the most well-known machine learning algorithm out there. It is often the first algorithm people encounter when studying or practicing data science because of its simplicity, speed, and interpretability. …

Reproducibility is an important characteristic of a good data science project. Many practices, from setting random seeds and versioning data to using virtual environments, can help improve the reproducibility of data science projects. In this post, we will look at the basics of managing Python virtual environments with Conda.

Did you know that we can prettify pandas DataFrames by accessing the `.style` attribute? Here’s an example where we styled a DataFrame such that it resembles a heatmap:

Being able to manipulate big data efficiently is a useful skill for data analysts, data scientists and anyone working with data. If you are already comfortable with Python and pandas, and want to learn to wrangle big data, a good way to start is to get…

Bayes’ Theorem provides a way to calculate the updated probability of an event when new information becomes available. Simply put, it is a way of calculating conditional probability. In this post, we will look at an overview of Bayes’ Theorem and then apply the theorem to a simple problem.
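To make the idea of an updated probability concrete, here is a minimal sketch of Bayes’ Theorem, P(A|B) = P(B|A)·P(A) / P(B), in plain Python. The function name `posterior` and the numbers below are hypothetical and chosen only for illustration; they are not the problem worked through in the post.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) via Bayes' Theorem.

    prior               -- P(A), belief before seeing the evidence
    likelihood          -- P(B | A)
    false_positive_rate -- P(B | not A)
    """
    # Total probability of the evidence B (law of total probability)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a 1% prior, a 90% chance of seeing the
# evidence when A is true, and a 5% chance when A is false.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))  # 0.1538
```

Even with strong evidence, the posterior stays modest here because the prior is small.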

We…

A cross-tabulation is a simple but effective way to inspect the relationship between two or more *categorical* or *discrete* variables. In this post, we will look at three easy but useful ways to create cross-tabulations in pandas.

We will use *seaborn’s tips* dataset for this article.

`import pandas as pd # `*This…*
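The three approaches from the post are not shown in this excerpt. As one illustrative sketch, `pd.crosstab` is a common way to build a cross-tabulation. The small DataFrame below is a made-up stand-in with tips-like column names, not the actual tips dataset.

```python
import pandas as pd

# Hypothetical stand-in data (not the real tips dataset)
df = pd.DataFrame(
    {
        "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sat"],
        "smoker": ["No", "Yes", "No", "No", "Yes", "Yes"],
    }
)

# Count how often each (day, smoker) combination occurs
table = pd.crosstab(df["day"], df["smoker"])

# margins=True adds row and column totals under the label "All"
with_totals = pd.crosstab(df["day"], df["smoker"], margins=True)

print(table)
print(with_totals)
```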

K-Nearest Neighbours (KNN from here onwards) is an intuitive and easy-to-understand machine learning algorithm. This post provides a short introduction to KNN. We will first learn how the algorithm works conceptually with a simple example, then implement the algorithm from scratch in Python to consolidate the conceptual knowledge.
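As a rough sketch of what a from-scratch KNN classifier can look like (not the post's own implementation), the hypothetical function `knn_predict` below assumes Euclidean distance and a simple majority vote among the `k` closest training points. The toy points and labels are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours."""
    # Distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the neighbours is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated clusters
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

pred = knn_predict(points, labels, (1.2, 1.4), k=3)
print(pred)  # a
```

An odd `k` avoids tied votes when there are two classes; `math.dist` requires Python 3.8 or later.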

…

Data Scientist 💡| Growth Mindset 🔑 | Math Lover 🔢 | Melbourne, AU 🐨 | https://zluvsand.github.io/