I have known Kaggle.com for almost two years now.

For those who don’t know what Kaggle is, it’s simply an online community for people who are passionate about data.

On Kaggle, you can learn through a free series of online courses and get access to an incredible amount of free data. Another important feature is the chance to practice what you are learning through Kaggle kernels (a cloud computational environment that supports Jupyter Notebooks and many programming languages).

The best-known and probably the most intriguing aspect is the competitions. The chance to challenge other kagglers in search of the best predictions is a truly fascinating experience.

So, if you are an aspiring data scientist, this is your place. You won’t find a platform like this anywhere else!

Although I had explored several datasets published on Kaggle over the past few years, it was only in March 2019 that I decided to participate in my first competition:

The Santander Customer Transaction Prediction.

It’s been a great and fun month. I have spent days exploring the dataset and the possible different approaches I could use. I trained several algorithms that I had never used before.

In the end, I can say without any doubt that this was one of the most formative experiences I have ever had.

Here I would like to share with you the 5 most important things that I have learned from my first competition.

1. Take your time to explore the data and to raise hypotheses!

The temptation to look at what has already been done and shared is strong, and it can lead you down the wrong path.

Trust me, it’s really tough to resist! :)

In my case, I started looking at the discussion forum and at the interesting kernels only after digging through the data for several days.

Even though I had many ideas after reading the forum, I truly think that if I had spent more time on exploratory data analysis, I would have benefited more during the final phase of the competition! Next time I will be wiser and apply this lesson.

2. The importance of a community

I already knew that the Kaggle community was super active and stimulating, but I didn’t expect to find that level of collaboration in a competition.

Many people are incredibly happy to share their knowledge, their code and even their guesses about the data itself. Of course, it’s still a competition! So you should follow some unwritten rules, such as not sharing your solution during the last week if you are ranked well.

Thanks to this competition, I learned that winning the top prizes is often not what matters (well, I wouldn’t be sad to rank in the top 10). Just facing and testing your limits is a huge victory!

I think many companies should consider creating or hosting competitions. The sense of community will pay back the investment with new skills and a lot of fun.

3. Don’t overcomplicate things

During the final stretch of the competition, one word went viral: MAGIC!

Harry’s uncle was right!

Everybody was looking for magic in the kernels. Many people asked for tips in the forum, and others were trolling about having found magic solutions. Guess what? No magic was needed at all to perform above average!

The key in this competition was to create new frequency features based on the original ones. Once you had found that, a bit of parameter tuning would probably have landed you in the medal zone!

I learned that sometimes thinking about complicated solutions is counterproductive. It leads you to waste time that you could have spent trying new approaches.
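To make the idea of a frequency feature concrete, here is a minimal sketch in Python with pandas. It is not the code from my kernel or from the top-scoring solutions; the helper name add_frequency_features, the "_count" suffix and the toy column names are assumptions made up for this illustration. For each chosen column, it counts how many rows share each value and stores that count as a new feature.

```python
import pandas as pd


def add_frequency_features(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with one '<col>_count' column per listed column.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    for col in columns:
        # value_counts() maps each distinct value to the number of rows holding it
        counts = out[col].value_counts()
        # Look up the count of each row's own value
        out[f"{col}_count"] = out[col].map(counts)
    return out


# Toy data; the column names are made up for this example.
train = pd.DataFrame({
    "var_0": [1.5, 2.0, 1.5, 3.0],
    "var_1": [0.1, 0.1, 0.1, 0.7],
})
train_fe = add_frequency_features(train, ["var_0", "var_1"])
print(train_fe)
```

In a real competition you would also have to decide which rows the counts are computed on (train only, or train and test together), and that choice can change the result. The sketch leaves that decision to you.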

4. The importance of a team

Sometimes during the competition I ran out of ideas, and whenever that happened I thought how cool it would be to have teammates to share and discuss different solutions with.

Splitting tasks is also crucial, because training models and submitting predictions take time, especially if you work during the day. Having a team whose members contribute equally to the goal will let you spend less time falling asleep at the PC!

In the next competition, I will definitely try to convince some friends to form a team. Alternatively, I could consider teaming up with people I don’t know. Either way, it will be fun.

Networking is certainly an added value for Kaggle!

5. “Learning never exhausts the mind”

This is a quote attributed to Leonardo da Vinci, and I think it captures the essence of Kaggle.

Leonardo’s areas of expertise: invention, drawing, painting, sculpting, architecture, science, music, mathematics, engineering, literature, anatomy, geology, astronomy, botany, writing, history, and cartography.

Every kaggler learns, even the most experienced ones.

Take the challenges as an opportunity to learn and to practice data science. I guarantee you will be happy at the end of the day. In my case, I am proud to have gained this experience. It reminded me how many things I still have to learn and how eager I am to learn them.

Data science is a continuously evolving field with many subjects to study. Kaggle is the only place that can teach you both theoretically and practically what you need to do to become a true data scientist!


Here is the code I used (the predictive part is integrated with top scoring solutions):


Source: Artificial Intelligence on Medium