This blog post was originally published on the H2O.ai blog.


In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journeys, inspirations, and accomplishments. These interviews are intended to motivate and encourage anyone who wants to understand what it takes to become a Kaggle Grandmaster.

In this article, I share my conversation with Guanshuo Xu, a Kaggle Competitions Grandmaster and a Data Scientist at H2O.ai. Guanshuo obtained his Ph.D. in Electrical & Electronics Engineering at the New Jersey Institute of Technology, focusing on machine learning-based image forensics and steganalysis.

Guanshuo is a man of many accomplishments. His methods for real-world image tampering detection and localization won second place in the First IEEE Image Forensics Challenge. His deep neural network architecture was the first to outperform traditional feature-based methods in image steganalysis. More recently, Guanshuo reached the world #1 rank in Kaggle's Competitions tier, following wins in the ALASKA2 Image Steganalysis and RSNA STR Pulmonary Embolism Detection competitions.

Guanshuo was also interviewed on CTDS.show, where he discusses his achievements on Kaggle.

In this interview, we learn more about his academic background, his passion for Kaggle, and his journey to the number one title. Here is an excerpt from my conversation with Guanshuo:


You hold a Ph.D. in Electrical Engineering. Did that influence your decision to pursue Machine Learning as a career?

Guanshuo: Yes, my doctoral research used machine learning techniques to solve problems like image tampering detection and hidden data detection. For example, my last Ph.D. research project applied deep neural nets to image steganalysis. So my education and research are directly related to machine learning, which made it a natural career choice for me.


How did your journey with Kaggle begin, and what kept you motivated on the way to Grandmaster?

Guanshuo’s Kaggle profile

Guanshuo: Ever since I discovered Kaggle, I have been addicted to it. What keeps me competing is a combination of things: the satisfaction of winning competitions and prize money, learning new techniques, widening and deepening my understanding of machine learning, and building surprisingly effective models.


How does it feel to be World No. 1 in Competitions? Does that bring extra pressure while competing?


The top 5 Kagglers in the Competitions category at the time of writing | Source: Kaggle’s website

Guanshuo: Honestly speaking, there is a lot more pressure in maintaining the number one rank than in achieving it, because maintaining it requires “smoother”, more consistent performance. Sometimes I have to take part in more competitions simultaneously than I used to.

How do you typically approach a Kaggle problem?

A glimpse of Guanshuo’s wins on Kaggle

Guanshuo: My approach varies with the type of problem and the goal of the competition. Nowadays, I often spend days or even weeks understanding the data and the problem and thinking through a solution. That includes, for instance, guessing the distribution of the private test data, choosing a proper validation scheme, and planning detailed modeling steps. Once I have a decent picture of the overall approach, I start coding and modeling. This process helps me gain more understanding and make corrections or adjustments to the overall approach if necessary.


Could you give us a sneak peek into your toolkit, such as your favorite programming language, IDE, and algorithms?

Guanshuo: As far as my toolkit is concerned, I mostly use gedit and Python, with PyTorch for deep learning.


The Data Science domain is rapidly evolving. How do you manage to keep up with all the latest developments?

Guanshuo: I learn about most new tools and technologies through Kaggle, my colleagues, or simply by googling. As for new developments in machine learning, it depends on my actual needs. I tend to filter out anything that is not immediately helpful and keep an eye on the potentially exciting stuff, then come back to it when needed.


Do you have any advice for Data Science aspirants who have just started, or wish to start, their Data Science journey?


Guanshuo: It basically depends on each person’s background and interests. In general, however, finding a suitable platform to learn and develop skills makes things much easier. Taking part in Kaggle competitions can also be a helpful resource.

Achieving the world No. 1 rank is no mean feat, and Guanshuo’s relentless attitude and hard work deserve all the credit. A look at his winning solutions on Kaggle reveals a structured approach, an essential element of effective problem-solving.

