Let Me Guide You To Your Biggest Adventure Yet in the Data Science World
--
Most people, when they talk about data science, are really referring to machine learning. Unfortunately, that usually means they forget about the other pillars that are just as important, if not more so: coding, visualizations, and statistics.
Today, I would like to briefly discuss these areas; I will expand on each of them and provide examples over the next several articles.
Part 1: Coding
This one might seem a little bit obvious. You cannot start creating a data science project without learning how to code. However, I want to focus on what happens after the initial project.
Congratulations! You have completed your machine learning algorithm. Your project has answered the necessary questions, and you are ready to deploy your model.
Unfortunately, most data scientists and aspiring data scientists do not know what to do when it comes to deploying machine learning models, and they are often unfamiliar with other coding practices like unit tests, integration tests, PEP 8 standards for Python, and tools like dbt for SQL standardization and testing. These standards make the code more reliable and repeatable.
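As a minimal sketch of what one of those practices looks like, here is a small pytest-style unit test for a hypothetical data-cleaning helper. The function name, file name, and column names are made up purely for illustration; the point is that even a tiny check like this catches regressions before a model is retrained or redeployed.

```python
# test_cleaning.py -- a minimal pytest example for a hypothetical data-prep helper.
# The function and column names here are illustrative, not from a real project.
import pandas as pd


def drop_null_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows where the 'age' column is missing."""
    return df.dropna(subset=["age"]).reset_index(drop=True)


def test_drop_null_ages_removes_missing_rows():
    raw = pd.DataFrame({"age": [25, None, 40], "gender": ["F", "M", "F"]})
    cleaned = drop_null_ages(raw)

    # Only the rows with a valid age should survive.
    assert len(cleaned) == 2
    assert cleaned["age"].isna().sum() == 0
```

Running pytest against a file like this takes seconds, and it is exactly the kind of habit that separates a one-off notebook from code a team can rely on.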
Visualizations
Before we even think about which tools to use, we need to figure out what the purpose of these visualizations is. There are two specific areas where visualizations come into play.
The first area is exploratory analysis. First and foremost, the audience will be you. You are trying to figure out what the data looks like. How many rows, columns, and nulls? What are the distributions and other summary statistics? Are we missing classes like age or gender groups? My strong suit is Python, so most of my exploratory visualizations live in a Jupyter notebook using Matplotlib or Seaborn, which are both Python libraries. However, you can also work in Excel, Google Sheets, or even Power BI. Use whatever is easiest to get an initial idea of the data.
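As a rough sketch of that first pass, assuming a hypothetical customers.csv file with age and gender columns (both invented for this example), the notebook cells might look something like this:

```python
# Exploratory pass in a Jupyter notebook.
# "customers.csv" and its 'age'/'gender' columns are hypothetical examples.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("customers.csv")

# How many rows, columns, and nulls?
print(df.shape)
print(df.isna().sum())

# Distributions and other summary statistics.
print(df.describe(include="all"))

# Are we missing classes like age or gender groups?
sns.histplot(data=df, x="age", hue="gender", multiple="stack")
plt.title("Age distribution by gender")
plt.show()
```

None of this is polished, and it does not need to be; the goal at this stage is simply to understand the shape and gaps in the data before any modeling starts.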