by Nitin Kumar
This is the first of many project recaps that we’re writing for our future students and corporate partners to give insight into the kind of projects Constructor Academy's Data Science students get to work on during their Capstone Project.
Constructor Academy’s batch #7 (May 13, 2019 - July 31, 2019) of Data Science students worked on five projects that were provided by our industrial partners, such as Swiss International Airlines, Qard and PriceHubble. All projects involved Machine Learning and two included Deep Learning. Together they covered a broad range of Data Science applications. Here is a list with some details.
for qardfinance.com
As a FinTech startup, Qard analyzes e-commerce businesses applying for loans and uses a data-driven approach to identify those with a substantial risk of defaulting on their loan payments. Qard would like to extend this system to use non-financial data. For this project, Constructor Academy's students worked on extracting e-commerce-specific non-financial data from around 400 GB of structured and unstructured data that Qard has collected over the years. The students reached an accuracy of around 70% in identifying default cases using non-financial data. Developing such a system would help all loan providers, because they would not need to ask borrowers for specific details about their finances.
Personal student project
This was an independent project brought in by one of the students with PhDs in similar fields. Image-based Genetic Perturbation screens are extensively used in research labs to identify markers of cancer-causing genes. Such screens generate petabytes of data (millions of images) and require automatic systems to analyze these images. The two students wanted to test whether they could use Deep Learning, primarily Convolutional Neural Networks and Variational Auto-Encoders, to automatically classify images into their categories of interest. Since no labelled data was available, the students had to use active learning to build their training and test sets step by step. The supervised approach produced an accuracy of >90%. A second approach, using unsupervised learning based on auto-encoders, needs further exploration but was already able to create real-looking computer-generated images...
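To give a feel for how active learning builds a labelled set step by step, here is a minimal sketch of pool-based uncertainty sampling. It is not the students' code: the project used Convolutional Neural Networks on microscopy images, whereas this toy version swaps in a nearest-centroid classifier on a single numeric feature so that it runs with the Python standard library alone. The names `fit_centroids`, `uncertainty_margin`, `active_learning` and the `ask_expert` callback (standing in for a human annotator) are all hypothetical.

```python
def fit_centroids(labelled):
    """Fit a nearest-centroid model on (feature, label) pairs with labels 0 and 1."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for feature, label in labelled:
        sums[label] += feature
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in (0, 1)}


def predict(centroids, feature):
    """Return the label whose centroid is closest to the feature."""
    return min(centroids, key=lambda label: abs(feature - centroids[label]))


def uncertainty_margin(centroids, feature):
    """A small margin means the model can barely tell the two classes apart."""
    return abs(abs(feature - centroids[0]) - abs(feature - centroids[1]))


def active_learning(pool, ask_expert, seed_items, rounds):
    """Repeatedly ask the expert to label the item the model is least sure about.

    Assumption: seed_items contains at least one example of each class.
    """
    labelled = [(x, ask_expert(x)) for x in seed_items]
    unlabelled = [x for x in pool if x not in seed_items]
    for _ in range(rounds):
        if not unlabelled:
            break
        centroids = fit_centroids(labelled)
        query = min(unlabelled, key=lambda x: uncertainty_margin(centroids, x))
        unlabelled.remove(query)
        labelled.append((query, ask_expert(query)))
    return labelled, fit_centroids(labelled)


# Toy usage: 21 evenly spaced "images", where the true class flips above 0.5.
pool = [i / 20 for i in range(21)]
labelled, centroids = active_learning(
    pool, ask_expert=lambda x: int(x > 0.5), seed_items=[0.0, 1.0], rounds=5
)
```

In this toy setup the queried items cluster around the class boundary, which is the point of the technique: the annotator's time is spent on the examples the model finds hardest, and not on ones it already classifies confidently.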
for PriceHubble
This project involved the application of Active Learning with Convolutional Neural Networks to automatically classify property images into different price categories. For this project, multiple pre-trained networks (e.g., ResNet and VGG16) were used as a starting point and then trained further on our data. Using pre-trained networks is standard practice in Deep Learning-based image analytics. The student who worked on the challenge achieved an accuracy of ~93% on this data.
for Constructor Academy
As an EdTech startup, Constructor Academy often looks for ways to use data to support our students' learning needs. The central aim of this project was to identify the skill set required for technology-related jobs in Switzerland, match it with the job seeker’s own skills and background (as extracted from LinkedIn profiles), and finally offer the job seeker suitable positions or training programs. To this end, the students employed NLP techniques to measure the semantic similarity between job ads and candidate skills, something most job recommendation services lack. Constructor Academy is now working on developing this project into an online tool to help not only our students but also the general Swiss public.
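As a rough illustration of matching candidate skills to job ads, here is a minimal sketch that ranks ads by cosine similarity over bag-of-words count vectors. This is a deliberately simplified stand-in: it only captures word overlap, while the semantic matching described above would need richer text representations, and the post does not say which ones the students used. The function names and the sample ads are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(candidate_skills, job_ads):
    """Return (score, title) pairs, best match first."""
    profile = Counter(tokenize(candidate_skills))
    scored = [
        (cosine_similarity(profile, Counter(tokenize(ad_text))), title)
        for title, ad_text in job_ads.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical sample data, for illustration only.
job_ads = {
    "Data Scientist": "python machine learning sql statistics",
    "Frontend Developer": "javascript css html react",
}
ranking = rank_jobs("python sql machine learning", job_ads)
```

Replacing the word counts with sentence or word embeddings would keep the same ranking logic while letting related terms (say, "ML" and "machine learning") count as a match.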