Blog: DS/AI-SSH: Utilizing the Resources
In the last post we learnt how to work on the building blocks in order to work on Data Science.
This is the 5th post of blog post series ‘DS/AI: Self-Starter Handbook’, this post outlines the resources (mainly free except books) that DS/AI starters should refer. Following topics are covered here:
- Books to refer
- Courses to attend
- Blogs to follow
- Podcasts to listen
- YouTube channels
- Useful GitHub Repos
After getting to know DS/AI landscape and its building blocks in previous posts, you may ask which are the resources to refer, how do I know if a book or course is worth to spend time and/or money?
This post is my attempt to make your task easier. I am listing down major quality resources (mostly free) here and also going to provide you my critical review of these resources, which will help you to take informed decision.
Books to refer
Machine Learning with R
This is an excellent book for the R starter who wants to apply ML to any kind of project. All the main ML models are presented, as well as different performance metrics, bagging, pruning, tuning, ensembling etc. Easy to scan through, many tips with fully-solved textbook problems. Certainly a very good starting point if you plan to compete on Kaggle. If you already master both R and ML, this books is obviously not for you.
In DetailMachine learning, at its core, is concerned with transforming data into actionable knowledge. This fact makes…www.amazon.com
Python Machine Learning
This is a fantastic introductory book in machine learning with python. It provides enough background about the theory of each (covered) technique followed by its python code. One nice thing about the the book is that it starts implementing Neural Networks from the scratch, providing the reader the chance of truly understanding the key underlying techniques such as back-propagation. Even further, the book presents an efficient (and professional) way of coding in python, key to data science.
Unlock deeper insights into Machine Leaning with this vital guide to cutting-edge predictive analyticsAbout This…www.amazon.com
The book explains concepts of Statistical Learning from the very beginning. The core ideas such as bias-variance trade-off are deeply discussed and revisited in many problems. The included R examples are particularly helpful for beginners to learn R. The book also provides a brief, but concise description of functions’ parameters for many related R packages. Compared to The Elements of Statistical Learning, it is easy for the reader to understand. It does a wonderful job of breaking things down complex concepts. If one wishes to learn more about a particular topic, I’d recommend The Element of Statistical Learning. These two pair nicely together.
“As a former data scientist, there is no question I get asked more than, “What is the best way to learn statistics?” I…www-bcf.usc.edu
This is the book to read on deep learning. Written by luminaries in the field — if you’ve read any papers on deep learning, you must have heard about Goodfellow and Bengio before — and cutting through much of the BS surrounding the topic: like ‘big data’ before it, ‘deep learning’ is not something new and is not deserving of a special name. Networks with more hidden layers to detect higher-order features, networks of different types chained together in order to play to their strengths, graphs of networks to represent a probabilistic model.
This is a theoretical book, but it can be read in tandem with Hands-On Machine Learning with Scikit-Learn and TensorFlow, almost chapter-for-chapter. The Scikit-Learn and Tensorflow example code, while only moderately interesting on its own, helps to clarify the purpose of many of the topics in the Goodfellow book.
The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine…www.deeplearningbook.org
Hands-On Machine Learning with Scikit-Learn and TensorFlow
This book provides great introduction into machine learning for both developer and non developers. Authors suggest to just go through even if you don’t understand math details. Highlights of this book are:
- Extraction of field expert knowledge is very important, you should know which model will serve better for the given solution. Luckily, lot of models are available already from other scientists.
- Training data is the most important part, the more you have it the better. So if you can you should accumulate as much data as you can, preferably categorized, you may not still know how you will apply the accumulated data in the future but you will need it.
- Labeling training data is very important too, to train neural network you need to have at least thousands of labeled data samples, the more the better.
- Machine learning algorithms and neural networks are pretty common for years but latest breakthrough is possible because of new optimization , new autoencoders ( that may help to artificially generate training data) allowing to do training faster and with less data.
- Machine learning is still pretty time and resources consuming process. To train machine learning model you need to know how to tweak parameters and how to use different training approaches fitting the particular model.
The book demonstrate (including the code) different approaches using Scikit-Learn python package and also the TensorFlow.
Graphics in this book are printed in black and white.Through a series of recent breakthroughs, deep learning has…www.amazon.com
Data Science for Business
This is probably the most practical book to read if you are looking for an overview of data science. Either you know when terms like k-means and ROC curves are to be used or you have some context when you start digging deeper into how some of these algorithms are implemented. You will find it at right level because there is just enough math to explain the fundamental concepts and make them stick in your head.
This isn’t a book on implementing these concepts or a bunch of algorithms. This gives the book the advantage of being something you can refer to an intelligent manager or interested developer, and they can both get a lot out of it. And if they are interested in the next level of learning there are plenty of pointers. You will also find the chapter on presenting results through ROC curves, lift curves, etc. pretty interesting. It would be cool if this book had some more hands on, but you can go to Kaggle and browse around the current and past competitions to apply what you learn here .
Courses to attend
Machine Learning is one of the first programming MOOCs. Coursera put online by Coursera founder and Stanford Professor Andrew Ng. This course assumes that you have basic programming skills and you have some understanding of Linear Algebra. Knowledge of Statistics & Probability is not required though.
Andrew Ng does a good job explaining dense material and slides. The course gives you a lot of structure and direction for each homework, so it is generally pretty clear what you are supposed to do and how you are supposed to do it.
When you are rather new to the topic, you can learn a lot of doing the deeplearning.ai specialization. First and foremost, you learn the basic concepts of NN. How does a forward pass in simple sequential models look like, what’s a backpropagation, and so on. I experienced this set of courses as a very time-effective way to learn the basics and worth more than all the tutorials, blog posts and talks, which I went through beforehand.
Doing this specialization is probably more than the first step into DL. I would say, each course is a single step in the right direction, so you end up with five steps in total. I think it builds a fundamental understanding of the field. But going further, you have to practice a lot and eventually it might be useful also to read more about the methodological background of DL variants. But doing the course work gets you started in a structured manner — which is worth a lot, especially in a field with so much buzz around it.
Learn Deep Learning from deeplearning.ai. If you want to break into AI, this Specialization will help you do so. Deep…www.coursera.org
If your goal is to be able to learn about deep learning and apply what you’ve learned, the fast.ai course is a better bet. If you have the time, interleaving the deeplearning.ai and fast.ai courses is ideal — you get the practical experience, applicability, and audience interaction of fast.ai, along with the organised material and theoretical explanations of deeplearning.ai.
We’ve seen the litany of moral failure from tech company executives: paying off tens of millions of dollars to…www.fast.ai
Practical data skills you can apply immediately: that’s what you’ll learn in these free micro-courses.They’re the fastest (and most fun) way to become a data scientist or improve your current skills.
Blogs to follow
KDnuggets is a leading site on AI, Analytics, Big Data, Data Mining, Data Science, and Machine Learning and is edited by Gregory Piatetsky-Shapiro and Matthew Mayo. KDnuggets was founded in February of 1997. Before that, Gregory maintained an earlier version of this site, called Knowledge Discovery Mine, at GTE Labs (1994 to 1997).
Analytics Vidhya provides a community based knowledge portal for Analytics and Data Science professionals. The aim of the platform is to become a complete portal serving all knowledge and career needs of Data Science Professionals.
Towards Data Science
TDS joined Medium’s vibrant community in October 2016. In the beginning, their goal was simply to gather good posts and distribute them to a broader audience. Just a few months later, they were pleased to see that they had a very fast growing audience and many new contributors.
Today they are working with more than 10 Editorial Associates to prepare the most exciting content for our audience. They provide customized feedback to our contributors using Medium’s private notes. This allows them to promote their latest articles across social media without the added complexity that they might encounter using another platform.
Podcasts to listen
This is Analytics Vidhya’s exclusive podcast series which will feature top leaders and practitioners in the data science and machine learning industry.
So in every episode of DataHack Radio, they bring you discussions with one such thought leader in the industry. They have discussions about their journey, their learnings and plenty of other data science related things.
Super Data Science
Kirill Eremenko is a Data Science coach and lifestyle entrepreneur. The goal of the Super Data Science podcast is to bring you the most inspiring Data Scientists and Analysts from around the World to help you build your successful career in Data Science.
Data is growing exponentially and so are salaries of those who work in analytics. This podcast can help you learn how to skyrocket your analytics career. Big Data, visualization, predictive modeling, forecasting, analysis, business processes, statistics, R, Python, SQL programming, tableau, machine learning, hadoop, databases, data science MBAs, and all the analytcis tools and skills that will help you better understand how to crush it in Data Science.
Your host here is Siraj. He is on a warpath to inspire and educate developers to build Artificial Intelligence. Games, music, chatbots, art, he teaches you how to make it all yourself. This is the fastest growing AI community in the world. Their mission: Solve AI. Use it to benefit humanity.
He is an AI Researcher, his latest paper is here — https://drive.google.com/file/d/0BwUv84lNDk72Q1gzaXgwR2U3U2NWVlZSOFk4amZIRmV1QXI0/view
He is also a Data Scientist, AI Educator, Rapper, Author, and Director of the School of AI (www.theschool.ai)
I’m Siraj. I’m on a warpath to inspire and educate developers to build Artificial Intelligence. Games, music, chatbots…www.youtube.com
Are you trying to learn data science so that you can get your first data science job? You’re probably confused about what you’re “supposed” to learn, and then you have the hardest time actually finding lessons you can understand! Data School focuses you on the topics you need to master first, and offers in-depth tutorials that you can understand regardless of your educational background.
Your host here is Kevin Markham, and he is the founder of Data School. He has taught data science using the Python programming language to hundreds of students in the classroom, and hundreds of thousands of students (like you) online. Finding the right teacher was so important to his data science education, and so he sincerely hopes that he can be the right data science teacher for you.
Caltech Machine Learning
This is an introductory course by Caltech Professor Yaser Abu-Mostafa on machine learning that covers the basic theory, algorithms, and applications. Machine learning (ML) enables computational systems to adaptively improve their performance with experience accumulated from the observed data. ML techniques are widely applied in engineering, science, finance, and commerce to build systems for which we do not have full mathematical specification (and that covers a lot of systems). The course balances theory and practice, and covers the mathematical as well as the heuristic aspects.
Awesome Data Science
This Repo answer the questions, “What is Data Science and what should you study to learn Data Science?” An awesome Data Science repository to learn and apply for real world problems.
As the aggregator says, “Our favorite data scientist is Clare Corthell. She is an expert in data-related systems and a hacker, and has been working on a company as a data scientist. Clare’s blog. This website helps you to understand the exact way to study as a professional data scientist.”
“Secondly, Our favorite programming language is Python nowadays for Data Science. Python’s — Pandas library has full functionality for collecting and analyzing data. We use Anaconda to play with data and to create applications.”
memo: An awesome Data Science repository to learn and apply for real world problems. – bulutyazilim/awesome-datasciencegithub.com
Essential Cheat Sheets for Machine Learning and Deep Learning Engineers
Machine learning is complex. For newbies, starting to learn machine learning can be painful if they don’t have right resources to learn from. Most of the machine learning libraries are difficult to understand and learning curve can be a bit frustrating. Kailash Ahirwar has created a repository on Github (cheatsheets-ai) containing cheatsheets for different machine learning frameworks, gathered from different sources. Have a look at the Github repository, also, contribute cheat sheets if you have any. Thanks.
Essential Cheat Sheets for deep learning and machine learning researchers – kailashahirwar/cheatsheets-aigithub.com
HackerMath for Machine Learning
Math literacy, including proficiency in Linear Algebra and Statistics, is a must for anyone pursuing a career in data science. The goal of this workshop is to introduce some key concepts from these domains that get used repeatedly in data science applications.
As outlined by Amit Kapoor, “Our approach is what we call the ‘Hacker’s way’. Instead of going back to formulae and proofs, we teach the concepts by writing code. And in practical applications. Concepts don’t remain sticky if the usage is never taught.”
The focus here is on depth rather than breadth. Three areas are chosen — Hypothesis Testing, Supervised Learning and Unsupervised Learning. They are covered to sufficient depth — 50% of the time on the concepts and 50% of the time spent coding them.
In the next post we will cover how you can build you portfolio in data science to highlight your knowledge and skills.
Thank you for reading my post. I regularly write about Data & Technology on LinkedIn & Medium. If you would like to read my future posts then simply ‘Connect’ or ‘Follow’. Also feel free to listen to me on SoundCloud.