Final student projects - Data Science Bootcamp #12

by Badru Stanicki

One of the biggest highlights of our 12-week Data Science Immersive Program is, of course, the capstone project phase! During this time, our students have a unique opportunity to solve practical Data Science problems provided by companies and research institutes throughout Switzerland. This is a very important part of our curriculum because at Constructor Academy, we care deeply about teaching students the specific skills that are currently in demand in the industry and are essential for solving meaningful problems. Over the last 3.5 weeks, our students experienced the complete Data Science process: defining their client’s business problem, exploring the data, applying suitable machine learning techniques, and finally delivering a functional prototype to the company. The culmination of all the hard work that goes into these capstone projects is a public presentation in front of family members, friends, companies, students, and anyone else who wants to attend. You can register for the next presentations via Meetup.

To give you an idea of what these projects can look like, we would like to share short descriptions of the projects completed by our most recent students (Batch 12: August 31 - November 20, 2020).

Aison Technologies - ML based segmentation of soft tissues from ultrasound images

Students: Leticia Fernandez Moguel, Nicolas Bernath, Roman Grisch

Final project Aison Technologies

Ultrasound (US) has become an increasingly popular tool for visualizing soft tissue in all areas of medicine. However, identifying tissues such as fat, muscle, tendons, and ligaments in an ultrasound image requires a trained specialist, and the images are usually difficult to interpret. In this project, Roman Grisch, Nicolas Bernath, and Leticia Fernandez Moguel used a deep learning model with a U-Net architecture to develop a tool that identifies the type of tissue in ultrasound images. The model predictions are intended to help healthcare providers identify body parts in ultrasound images, while the medical diagnosis remains in the hands of the doctors.

Thomson Reuters - Fact extraction from legal documents

Student: Namrata Gurung

Final project Thomson Reuters

The aim of the project was to support Thomson Reuters Labs’ ongoing work on NLP deep learning models that automatically extract relevant metadata fields (such as court name, filing date, and plaintiff and defendant names) from legal documents. For this, Namrata Gurung performed two NLP deep learning experiments. The first tested whether several existing pointer-generator models could be combined into one, and the second examined whether training transformer models leads to any improvement. The second experiment is still ongoing. However, the preliminary results from the first experiment look promising, and if the new model surpasses the previous model architecture, it may be deployed in the Thomson Reuters Labs project.

ETH Library Lab - Automated labeling of plant specimens

Students: Lindsey Parkinson, Matteo Jucker Riva

Final project ETH Library Lab

Swiss botanists have collected and dried plants with flowers and fruits intact for over two hundred years. Today, many of those plants are an excellent scientific resource stored in the combined herbaria of ETH and the University of Zürich. Reviewing every one of the tens of thousands of specimens by hand is impractical. Lindsey and Matteo created a model to 1) detect whether a plant specimen in the family Brassicaceae has flowers or fruits and 2) count the number of flowers or fruits. By combining this information with the date the plant was collected, modern botanists can measure how flowering times have changed since the early 1800s.
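To make the last step concrete, here is a minimal Python sketch of how collection dates of flowering specimens could be turned into a trend. It is not the students' code: the records are made-up values chosen for illustration (not herbarium data), and the helper names `day_of_year` and `linear_slope` are hypothetical. The sketch converts each collection date into a day of the year and fits a straight line against the collection year.

```python
from datetime import date

def day_of_year(d):
    """Return the day of the year (1-366) for a date."""
    return d.timetuple().tm_yday

def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Made-up example records: (collection date, model says specimen has flowers)
records = [
    (date(1820, 5, 20), True),
    (date(1870, 5, 15), True),
    (date(1920, 5, 10), True),
    (date(1970, 5, 5), True),
    (date(2020, 4, 30), True),
    (date(1950, 8, 1), False),  # no flowers detected, so it is ignored below
]

# Keep only specimens where flowers were detected
flowering = [d for d, has_flowers in records if has_flowers]
years = [d.year for d in flowering]
days = [day_of_year(d) for d in flowering]

# Negative slope means flowering specimens appear earlier in the year over time
slope = linear_slope(years, days)
print(f"Trend: {slope * 10:.1f} days per decade")
```

A real analysis would need far more specimens per species and would have to account for where each plant was collected, but the principle of pairing the model's flower detection with the collection date is the same.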

CrystalsFirst - Protein crystal detection

Students: Claudio Cunha, Linda Wymann

Final project CrystalsFirst

CrystalsFirst GmbH is a biopharmaceutical start-up that combines machine learning with their proprietary lab technology to design new drugs. New medicines are created by determining new protein structures, which in turn requires crystallizing these proteins. Currently, a crystallographer at CrystalsFirst GmbH manually screens thousands of well images to find these crystals. Crystals occur rarely, and the work is tedious and time-consuming. The goal of Linda Wymann and Claudio Cunha was to train a deep neural network to differentiate between crystal and non-crystal images and thereby reduce the manual labor at the company. On the test set, the final model correctly identified 9 out of 10 crystal images and classified every non-crystal image correctly. The model will be deployed on the Google Cloud AI Platform to improve work efficiency at the company. CrystalsFirst GmbH will be able to further improve and train the model using the code provided to them.
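In classification terms, the result above corresponds to a recall of about 90% on crystal images and a specificity of 100% on non-crystal images. The following Python sketch shows how these two numbers are computed. The test set sizes (10 crystal and 40 non-crystal images) are assumptions made for illustration only, since the actual size of the test set is not stated here, and the function name `recall_and_specificity` is hypothetical.

```python
def recall_and_specificity(y_true, y_pred):
    """Return (recall, specificity) for binary labels: 1 = crystal, 0 = non-crystal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # crystals found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # crystals missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-crystals rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, specificity

# Assumed toy test set: 10 crystal images followed by 40 non-crystal images
y_true = [1] * 10 + [0] * 40
# The model finds 9 of the 10 crystals and labels every non-crystal image correctly
y_pred = [1] * 9 + [0] * 1 + [0] * 40

recall, specificity = recall_and_specificity(y_true, y_pred)
print(f"recall={recall:.2f}, specificity={specificity:.2f}")
# prints: recall=0.90, specificity=1.00
```

Because crystals are rare, plain accuracy would look high even for a model that never finds a crystal, which is why reporting the two classes separately is more informative here.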

Yamo - Make babies and parents happy

Students: Gianluca Badjan, Matthias Weber

Final project Yamo

Predicting customer churn is key to improving customer retention and revenues while keeping overall customer acquisition costs down. Yamo is a startup providing healthy organic food for babies and infants, delivered to the home through subscriptions (food-as-a-service) or single purchases. Matthias Weber and Gianluca Badjan were challenged to predict customer churn by applying machine learning to sales data, while also providing insights into what drives churn. The resulting models need further development, especially by integrating new data sources and alternative approaches, before Yamo can use them to take action customer by customer.

University of Freiburg - I-dental-fication: How to ask a body how old it is

Students: Thomas Oliver, Tomasz Siczek

Final project University of Freiburg

The University of Freiburg has been working on improving the process of Tooth Cementum Annulation (TCA), a method for estimating the age of an individual that is similar to counting tree rings. The current process involves manually counting hard-to-see rings in the teeth, which takes hours and requires cross-validation. The University tasked Thomas Oliver and Tomasz Siczek with finding out whether the process could be improved by building a deep learning model. This model would learn from the images and give a result more quickly than a human could. Although the model did not outperform a human counter, the students developed a process that isolates the tooth cementum in the images with 91% accuracy. With better images, the two are confident that using deep learning for TCA is an achievable goal.

A big thank you goes out to all partner companies and institutes who provided us with exciting project ideas and, of course, to all our students for the great cooperation and the brilliant final results of their projects over the last three months.