FASTEST Way to Learn Data Science and ACTUALLY Get a Job

Scientists have long been interested in Machine Learning. But computers made very bad students because they started out from such a low level.

For many people, it takes a PhD to become a Data Scientist. That’s because Data Science requires a deep understanding of Statistics, Programming, Machine Learning and Business. Having a basic knowledge of all these different domains is achievable in a few months. But becoming an employable Data Scientist is where the real challenge lies. You’ll need to be very strategic about what you learn and how you learn it. Through my Master’s in Computer Science and Applied Maths, as well as by working closely with my Data Science colleagues at Microsoft, I have found a path that will not only provide you with all the necessary skills, it will also prepare you for the data science interviews at big tech companies.

In this video, I will share all the steps of this path and provide you with the free resources you will need at every step. Along the way, I will also tell you 3 mistakes that stop people from becoming a Data Scientist.

Let’s start with the first pillar of Data Science, and that is Statistics. Let’s say that Google decides to change the color of the Search button to green. You’re the Data Scientist in charge of testing this change on a small portion of Google users.

Statistics will help you design this experiment and it will also guide you on what to measure. And not only that, Statistics can help you decide whether the data you collected in your experiment is reliable or just some random noise. Statistics also sits at the core of machine learning algorithms like linear regression. So, a good knowledge of Statistics is necessary to become a good Data Scientist.

But to learn Statistics, you need to know some basic concepts of Mathematics. Look, I know that many of you don’t like Maths. And I wish I could say that Maths is not needed for Data Science. But if you’re looking to build a good career in Data Science, you need to know some basic things.

To make your life easy, I recommend doing this free 4-week course on Coursera. This course is called Data Science Math Skills by Duke University. It covers important concepts like Mean, Variance, Derivatives and Bayes’ theorem.
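
To give a feel for a few of these concepts, here is a minimal sketch using only Python’s standard library. The click counts and the probabilities passed to the function are made-up numbers, and `bayes_posterior` is a hypothetical helper written for this illustration, not something taken from the course.

```python
import statistics

# Made-up sample: daily clicks on a button over one week.
clicks = [12, 15, 11, 14, 13, 16, 10]

mean_clicks = statistics.mean(clicks)      # arithmetic mean
var_clicks = statistics.pvariance(clicks)  # population variance


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) using Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a rare event (1%) and a fairly accurate signal.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(mean_clicks, var_clicks, round(posterior, 3))
```

With these numbers the posterior comes out at roughly 0.15, which is the classic Bayes surprise: even a fairly accurate signal leaves a rare event unlikely.
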
The best part about this course is that it’s great for beginners. For example, it covers even the most basic things like Venn diagrams and Sigma notation. Another great thing about this course is that it does not try to teach you everything. It provides you with just enough knowledge to get started with Statistics.

Now that you feel confident about your maths skills, let’s learn Statistics. This is where many people make their first big mistake. And that is, they try to learn everything. Look, Statistics is a very vast field and it takes many years to fully understand it. For most Data Scientist jobs, you just need to know some key concepts in Statistics.

To put things in perspective, here is the distribution of different Data Science roles in the market. In this diagram, we see that the majority of Data Science roles are Analytics roles, which means that they mainly focus on defining business metrics and making data-driven decisions through data visualization, among other things. I will link this article in the description for you to review. A major insight from this article is that Statistics-heavy Data Science roles make up a small minority, just 5% of total roles.

So, we don’t need to go very deep into Statistics. In my case, I did multiple advanced-level courses in Statistics and later found out that I did not need most of it for Data Science. To learn all the key concepts that you actually need, I recommend this course called “Introduction to Statistics” by Stanford University. This course covers all the important ideas like Probability, Normal distribution, Confidence Intervals and many more.
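
To tie these ideas back to the Search button experiment from earlier, here is a rough sketch of one common way to check whether a difference in click rates is more than random noise: a two-proportion z-test with a normal-approximation confidence interval. The function name, the 10,000 users per group and the click counts are all assumptions made up for illustration; a real experiment would need more care than this.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(clicks_a, users_a, clicks_b, users_b, confidence=0.95):
    """Compare click rates of a control group (a) and a variant group (b).

    Returns the difference in rates, a two-sided p-value from a pooled
    z-test, and a normal-approximation confidence interval for the difference.
    """
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    diff = rate_b - rate_a

    # Pooled standard error, assuming both groups share one true rate.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval.
    se = sqrt(rate_a * (1 - rate_a) / users_a + rate_b * (1 - rate_b) / users_b)
    margin = NormalDist().inv_cdf(0.5 + confidence / 2) * se
    return diff, p_value, (diff - margin, diff + margin)


# Made-up data: old button (a) vs. green button (b), 10,000 users each.
diff, p_value, ci = two_proportion_test(1000, 10_000, 1100, 10_000)
print(f"lift={diff:.3f}, p={p_value:.4f}, CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

A small p-value, together with a confidence interval that stays above zero, suggests the green button really did change the click rate in this made-up data.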

By the end of this course, you will know all the Statistics you need to move on to Machine Learning.

But before we can move on to Machine Learning, we need to learn some Programming, which is the second pillar of Data Science. When it comes to programming for Data Science, we have primarily 2 languages to choose from. The first one is R, which is designed purely for Statistics and Data Analysis. The second and more popular option is Python, which is a full-fledged programming language that can be used for applications beyond Statistics and Machine Learning. That’s why I would recommend picking Python as your programming language.

But how do we learn Python? In our video on the “Fastest way to learn coding and actually get a job”, we recommended learning Python by doing actual coding. For that, we gave you this website called “learnpython.org”. On this website, complete the tutorials covering the basics as well as Data Science.

As always, play with the code and complete the exercise portion.

Now that we have learnt Programming, let’s move on to the third pillar of Data Science, and that is Machine Learning. This is where many people make their second big mistake. They forget that knowing Machine Learning algorithms will not help much if you don’t know how to get the data to apply these algorithms to. When you are working on your personal projects for Machine Learning, you can go to websites like UC Irvine’s Machine Learning Repo and choose data to work on. For example, in one of my personal projects for a Computer Vision class, I used UC Irvine’s Handwritten Digits dataset. But in the real world, you rarely get well-defined, cleaned-up data. You have to decide what data makes sense for your application and then use SQL to extract that data. That’s why SQL questions are very common in Data Science interviews. The mistake that people make is that they skip learning SQL.

To learn SQL, we will write some SQL queries. So, go to this tutorial on W3Schools and do this hands-on tutorial.
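
If you want to practise from Python, here is a small sketch of what “using SQL to extract data” can look like with the built-in `sqlite3` module. The `searches` table, its columns and its rows are all invented for this example; real company data will look very different.

```python
import sqlite3

# In-memory database holding a made-up `searches` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE searches (user_id INTEGER, country TEXT, clicked INTEGER)"
)
conn.executemany(
    "INSERT INTO searches VALUES (?, ?, ?)",
    [
        (1, "US", 1),
        (2, "US", 0),
        (3, "IN", 1),
        (4, "IN", 1),
        (5, "DE", 0),
    ],
)

# Number of searches and click rate per country, highest click rate first.
query = """
    SELECT country,
           COUNT(*)     AS searches,
           AVG(clicked) AS click_rate
    FROM searches
    GROUP BY country
    ORDER BY click_rate DESC, country
"""
rows = conn.execute(query).fetchall()
for country, n_searches, click_rate in rows:
    print(country, n_searches, click_rate)
conn.close()
```

The `GROUP BY` with an aggregate like `AVG` is the kind of pattern that tends to come up in interview questions, so it is worth typing out and changing yourself.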

Make sure to go through at least the SQL Tutorial portion at the top. Also, don’t forget the SQL Examples portion at the bottom, where you can test your knowledge.

Before you can apply a machine learning algorithm, you need to know what your data looks like. Some of the best presentations that I have attended are the ones where Data Scientists slice and dice data to bring out some deep insights just through data visualization. Two very popular libraries for data visualization in Python are Matplotlib and Seaborn. To learn these libraries, you can do this course called “Data Visualization in Python” on Coursera. In this course, you’ll learn how to make box plots, scatter plots and regression plots using Matplotlib, Seaborn and some other libraries. This course also covers dashboarding, which is an essential part of most Data Science jobs.
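
As a taste of the three plot types mentioned above, here is a minimal sketch using Matplotlib and NumPy. The data is synthetic (made-up study hours and exam scores), `make_plots` is a hypothetical helper, and the regression line is fitted with `numpy.polyfit` rather than with whatever method the course uses.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def make_plots(seed=0):
    """Draw a box plot, a scatter plot and a fitted regression line."""
    rng = np.random.default_rng(seed)
    # Synthetic data: hours studied vs. exam score, split into two groups.
    hours = rng.uniform(0, 10, size=100)
    score = 50 + 4 * hours + rng.normal(0, 5, size=100)
    group_a, group_b = score[:50], score[50:]

    fig, (ax_box, ax_scatter, ax_reg) = plt.subplots(1, 3, figsize=(12, 4))

    ax_box.boxplot([group_a, group_b])
    ax_box.set_xticks([1, 2])
    ax_box.set_xticklabels(["group A", "group B"])
    ax_box.set_title("Box plot")

    ax_scatter.scatter(hours, score, s=10)
    ax_scatter.set_title("Scatter plot")

    # Least-squares line fitted with NumPy and drawn over the points.
    slope, intercept = np.polyfit(hours, score, deg=1)
    xs = np.linspace(0, 10, 50)
    ax_reg.scatter(hours, score, s=10, alpha=0.4)
    ax_reg.plot(xs, slope * xs + intercept, color="red")
    ax_reg.set_title("Regression plot")

    fig.tight_layout()
    return fig, slope, intercept


fig, slope, intercept = make_plots()
```

Call `fig.savefig("plots.png")` to write the figure to disk. Seaborn offers `boxplot`, `scatterplot` and `regplot` functions that produce similar charts with less code.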
