Data Science capstone projects batch #22

by Ekaterina Butyugina

We would like to take a moment to extend a heartfelt shout-out to all the students who joined us in Spring, pouring their hearts and souls into conquering the course and capstone projects.

In just three short months, the incredible Data Science enthusiasts from Batch #21 in Zurich, along with the accomplished third cohort from Munich, admirably tackled a diverse array of challenging projects. Their outstanding skills and unwavering dedication were on full display. This time, a significant role in the students' success was played by HP, in collaboration with Friedrich Stahl, who provided us with two exclusive Z by HP workstations.

All the projects leverage deep learning models and involve heavy computations. Thus, we needed powerful machines to train royalty-free models, and thanks to HP and Friedrich Stahl's support, our students had access to these resources without any time constraints. This greatly simplified the work process and allowed for more efficient learning and project development.

These machines also give us a significant advantage compared to our competitors. We invite you to witness the awe-inspiring potential of data science firsthand, as our students boldly push boundaries, unearth invaluable insights, and create a substantial and lasting impact.


Flexible Litter Database

Students: Jose Carlos Araujo Acuña, Holly L. Capelo, Mehran Chowdhury, Rafael Luech

Cortexia employs computer vision to detect waste in urban areas, enabling efficient, cost-effective planning of cleanup routes to achieve optimal standards of cleanliness. While an image recognition methodology is already in place for this purpose, the models would be improved by the capability to carefully cut out the identified objects and use these cutouts (segmentations) to augment the data used for further training.

The goal of the project is to establish a methodology for producing image segmentations in an automated manner. The students have used the publicly available pre-trained model from Meta research, known as the "Segment Anything Model" (SAM). They used the model version and corresponding checkpoint with the largest neural network backbone. As input, they worked with over seven thousand annotated images, using bounding boxes as the segmentation prompt, resulting in up to 80 segmentations per image. 

For a subset of the data consisting of approximately 250 images, hand-labeled segmentations were provided. Comparing the automatically generated masks with this ground truth, Jose, Mehran, Holly, and Rafael obtained high average scores (IoU, Dice; see the figure). In many cases, the automated segmentation method performed better than the hand-labeled masks in terms of the detailed contours captured at the image boundaries. Objects whose size approaches the pixel resolution of the image tend to score worse than larger objects.
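To make the two scores concrete, here is a minimal sketch of how IoU (intersection over union) and Dice can be computed for a pair of binary masks. The function name and the toy 4x4 masks are invented for illustration; this is not the students' evaluation code.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 2D binary mask: 1 = object pixel, 0 = background


def iou_and_dice(pred: Mask, truth: Mask) -> tuple[float, float]:
    """Return (IoU, Dice) for two binary masks of the same shape."""
    intersection = 0
    pred_area = 0
    truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            pred_area += 1 if p else 0
            truth_area += 1 if t else 0
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return iou, dice


# Toy example: the automated mask misses one pixel of the hand-labeled mask.
automated = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
hand_labeled = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
iou, dice = iou_and_dice(automated, hand_labeled)
print(f"IoU={iou:.3f}, Dice={dice:.3f}")
```

One plausible reading of the small-object trend: a single wrong pixel costs little on a large mask, but on a two-pixel object it already halves the IoU.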

Automated segmentation method

Jose, Mehran, Holly, and Rafael experimented with fine-tuning the model using domain-specific images, but their preliminary results showed no improvement over the standard model, which performs well for the simple bounding-box segmentation task. Further extensions to the fine-tuning could involve training with additional prompt types, such as the label ID as text.

Ask Fredy: AI-Powered Q&A Chat-Bot 

Students: Ena Dewan, Rena (Xinyue) Pan, and Liran

Alturos Destination is an enterprise focused on empowering tourist destinations to execute their digital sales strategies. To deliver an enhanced solution for its customers, Alturos aims to introduce a chatbot named "Fredy." This tool will be designed to serve and support both Alturos' clientele and its employees. Fredy will be tailored to provide precise responses regarding Alturos' operating system and other offerings, harnessing the extensive reservoir of enterprise data available within the company.

The team began by delving into the company's English tutorials. Using a foundational model, they laid the groundwork for Fredy's initial iteration. Subsequently, they seamlessly integrated Large Language Models to craft refined versions of Fredy. The ultimate choices were meticulously determined following a rigorous performance evaluation.

The creation of the baseline model involved the transformation of documents into vectors, enabling precise measurement of the distances between questions and each piece of data. From this pool, the most promising candidates were carefully selected, meticulously re-evaluated, and the finest one emerged as the chosen answer. Transformers played a pivotal role in both of these crucial processes.

For instance:
  • Hi Fredy, where can I add a contact person? 
  • The contact person is the individual associated with MyService…
If the corpus lacks an explicit answer, baby Fredy won't be able to generate it for the customer.
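The retrieval idea behind the baseline can be sketched in a few lines. This is only an illustration under loud assumptions: a bag-of-words count vector stands in for the transformer embeddings, the re-evaluation step is reduced to taking the highest score, and the tutorial snippets are invented, not Alturos data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a transformer embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, corpus: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Score every passage against the question and keep the top_k candidates."""
    q = embed(question)
    scored = [(cosine(q, embed(passage)), passage) for passage in corpus]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical tutorial snippets, invented for illustration only.
corpus = [
    "You can add a contact person in the customer settings page.",
    "Invoices are exported from the billing menu.",
    "Opening hours are edited in the shop configuration.",
]
candidates = retrieve("Hi Fredy, where can I add a contact person?", corpus)
best_score, best_passage = candidates[0]
print(best_passage)
```

Because the answer is always one of the stored passages, a sketch like this shares the limitation described above: it cannot produce an answer that is not already in the corpus.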

Fredy was then improved by introducing Large Language Models (LLMs): deep learning models trained on extensive datasets that are proficient in summarizing and synthesizing answers, which gives Fredy generative AI capabilities.

Ena, Rena, and Liran devised a hybrid model that combines free open-source and paid OpenAI LLMs. They employed the HuggingFace Instruct-xl model (open source) for processing raw data and creating embeddings for the English corpus, while ChatGPT-4 (OpenAI, premium version, paid) was used for generating responses. This hybrid model offers advantages in terms of cost-effectiveness and reaches an accuracy rate of 78%.

The alternative version, Flan-t5 Fredy, relies solely on open-source tools, making it an ideal choice for those who prioritize privacy, although its responses read less like human language.

Other attempts, such as Flan-Alpaca Fredy and OpenAI Fredy, were also made, but in the accuracy assessments their performance did not measure up to the top two models.

Regarding performance reviews, while manual evaluations are considered optimal, they are labor-intensive. The team also explored quantitative methods, gauging the similarity between generated answers and reference points.
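The post does not say which similarity measure the team used, so the following is only one possible quantitative check: a token-overlap F1 score between a generated answer and a reference answer. The function name and example sentences are invented.

```python
import re
from collections import Counter


def token_f1(generated: str, reference: str) -> float:
    """F1 over shared word tokens (an assumed metric, not the team's)."""
    gen = Counter(re.findall(r"\w+", generated.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((gen & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented reference and answers, for illustration only.
reference = "Open the settings page and click add contact"
on_topic = token_f1("Click add contact on the settings page", reference)
off_topic = token_f1("Invoices are in the billing menu", reference)
print(f"on-topic: {on_topic:.2f}, off-topic: {off_topic:.2f}")
```

A word-overlap score is cheap to compute but blind to paraphrases, which is one reason manual review remains the more reliable yardstick.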

Fredy evolution

A WebUI was created utilizing the powerful combination of Hugging Face Spaces and Gradio. In just three weeks, Fredy achieved a remarkable level of proficiency in generating human-like responses, demonstrating impressive multilingual capabilities. Beginning with English as its foundation, Fredy now seamlessly serves a diverse range of languages through integration with Google Translator, ensuring contextually relevant answers.

Future endeavors are focused on further enriching Fredy's knowledge base, amplifying its multilingual prowess, and striving for unparalleled accuracy.

Sustainability reports assessments

Students: Claudio and Claudio

Every two to three years, Engageability conducts a benchmark analysis of Swiss companies' sustainability reports called Focused Reporting (2021: 151 companies). To do so, they manually go through a checklist of numerous criteria and assess whether their clients' reports, which can be up to 200 pages long, meet them (each criterion is rated yes/no/uncertain). An example of such a criterion is: "Are metrics used by the organization to assess climate-related risks and opportunities in line with its strategy and risk management process described?"

The goal of the collaboration was to automate the assessment of sustainability reports using AI, to save the company time. To this end, the students wrote code that takes two input files: the report as a PDF and the assessment criteria as an XLSX file. In the code, an AI model assesses whether the criteria from the XLSX file are met in the PDF and writes the result for each criterion to another XLSX file.
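The shape of such a pipeline can be sketched as a loop over criteria. Everything here is an assumption made for illustration: file reading and writing is left out, the pages and criteria are invented, and a naive keyword check (`placeholder_assessor`) stands in for the language model the students actually used.

```python
from typing import Callable, NamedTuple, Optional


class Assessment(NamedTuple):
    criterion: str
    rating: str            # "yes", "no", or "uncertain"
    page: Optional[int]    # page the rating is based on, if any


def placeholder_assessor(criterion: str, pages: list[str]) -> tuple[str, Optional[int]]:
    """Stand-in for the language model: rates by naive keyword overlap."""
    keywords = {w for w in criterion.lower().split() if len(w) > 4}
    best_page, best_hits = None, 0
    for number, text in enumerate(pages, start=1):
        hits = sum(1 for w in keywords if w in text.lower())
        if hits > best_hits:
            best_page, best_hits = number, hits
    if best_hits == 0:
        return "no", None
    # All keywords found on one page counts as "yes", a partial match as "uncertain".
    rating = "yes" if best_hits == len(keywords) else "uncertain"
    return rating, best_page


def assess_report(
    criteria: list[str],
    pages: list[str],
    assessor: Callable[[str, list[str]], tuple[str, Optional[int]]] = placeholder_assessor,
) -> list[Assessment]:
    """Rate every criterion against the report, one result row per criterion."""
    return [Assessment(c, *assessor(c, pages)) for c in criteria]


# Invented report pages and criteria, for illustration only.
pages = [
    "Our company reports climate metrics and emissions targets.",
    "Governance: the board oversees strategy.",
]
criteria = ["climate metrics", "climate biodiversity", "water usage"]
results = assess_report(criteria, pages)
for row in results:
    print(row)
```

Keeping the assessor as a swappable function is a design choice of this sketch: it lets the same loop run against a local or an online model without touching the rest of the code.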

To reach this goal, they tested multiple Large Language Models, such as GPT, which is also used in ChatGPT. They compared local open-source models with online models to see whether they could use a local model, which is more secure because it runs on a local computer, or whether they needed an online model, which has lower equipment requirements because the hardware is in the cloud.

One challenge was the complexity of the criteria: some were subjective or vague, while others required a very specific level of detail in the report. Prompt engineering was another tricky part. Here, they needed to find the threshold at which the model should answer "uncertain" rather than "yes" or "no".

On average, their model assessed about 65% of the criteria correctly across the 6 provided reports they could test. However, the ratings were not always based on the right source. The model also reported the page and section on which it based each answer, and these did not always match the passages Engageability had used in its manual assessment.

In conclusion, not only did the models assess about 65% of the criteria correctly (yes/no/uncertain), but their indications of which page to look at also made assessing the reports faster than doing it all manually.

Client sustainability diagram

Looking ahead, the models need to be tested with more reports to get representative results. The results may also be improved by trying newer models that the students did not have access to yet (e.g., GPT-4), or by training the models on more reports than Engageability could provide so far.

One step ahead: Detecting unusual human motions 

Students: Alaa Elshorbagy, Vincent von Zitzewitz and Jonas Voßemer

QualityMinds GmbH provides services in quality assurance and testing of software and machine learning systems. They also specialize in software engineering, requirements engineering, machine learning, and AI Testing including testing machine learning models for autonomous driving. 

As part of the project with QualityMinds, our students had the opportunity to delve into the fascinating world of human motion prediction. Human motion prediction involves predicting future human movements based on a time sequence of body positions and the most recent motions. QualityMinds uses advanced Deep Neural Networks (Graph Convolutional Networks) to predict a person's actions, looking one second into the future. For some actions, the predictions are poor, which led to the formulation of the project.

The primary goal was to quantify anomalies in human motions, as these unusual movements pose challenges for the prediction models. To achieve this, Alaa, Vincent, and Jonas used the public Human3.6M dataset, which contains 3.6 million 3D human poses.

They then applied four distinct outlier detection models, each providing a unique perspective on identifying outlying motions. To validate the findings, the students compared the prediction errors of human motion prediction models with the identified outliers. In this way, they demonstrated a direct connection between outliers and failed motion predictions.
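The validation idea, checking whether unusual sequences are also the badly predicted ones, can be sketched as follows. The post does not name the four detectors, so a plain z-score stands in for them here, and all numbers are invented.

```python
import math
from statistics import mean, pstdev


def zscore_outlier_scores(features: list[float]) -> list[float]:
    """Outlier score per sequence: distance from the mean in standard deviations."""
    mu, sigma = mean(features), pstdev(features)
    return [abs(x - mu) / sigma if sigma else 0.0 for x in features]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data: one summary feature per motion sequence (say, mean joint speed)
# and the prediction error a motion model made on the same sequence.
features = [1.0, 1.1, 0.9, 1.0, 1.05, 3.0]
prediction_errors = [0.10, 0.12, 0.11, 0.09, 0.10, 0.45]

scores = zscore_outlier_scores(features)
r = pearson(scores, prediction_errors)
print(f"correlation between outlier score and prediction error: {r:.2f}")
```

In this toy data, the one sequence that stands out is also the one with the largest prediction error, so the correlation comes out strongly positive.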

There were three main results from our project, each designed to empower QualityMinds in improving their motion prediction capabilities:
  • Outlier Detection App - an interactive tool for flexible outlier sequence analysis; 
  • Outlier Validation App - to find a correlation between a motion sequence's anomaly degree (precision score) and its prediction error;
  • Kinematic Comparison Toolkit - to compare and visualize inliers and outliers for specific actions, such as walking or eating based on kinematic key characteristics, such as joint velocity and acceleration. 
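Kinematic characteristics such as joint velocity and acceleration can be estimated from a sequence of joint positions with finite differences. This is a minimal sketch with an invented trajectory and sampling interval, not the toolkit's actual code.

```python
import math

Vector = tuple[float, ...]


def finite_difference(samples: list[Vector], dt: float) -> list[Vector]:
    """Rate of change between consecutive samples taken dt seconds apart."""
    return [
        tuple((b_i - a_i) / dt for a_i, b_i in zip(a, b))
        for a, b in zip(samples, samples[1:])
    ]


def magnitudes(vectors: list[Vector]) -> list[float]:
    """Euclidean length of each vector."""
    return [math.sqrt(sum(c * c for c in v)) for v in vectors]


# Invented 3D positions of a single joint, sampled every 0.5 seconds.
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
dt = 0.5

velocity = finite_difference(positions, dt)      # one fewer entry than positions
acceleration = finite_difference(velocity, dt)   # one fewer entry than velocity

speeds = magnitudes(velocity)
acceleration_sizes = magnitudes(acceleration)
print("speed per step:", speeds)
print("acceleration per step:", acceleration_sizes)
```

Summary statistics of these per-step values, computed per joint and per action, are the kind of features that could then be compared between inlier and outlier sequences.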

The outlier detection app

In conclusion, our collaboration with QualityMinds has equipped them with tools to improve human motion prediction for autonomous driving and other applications. By incorporating information on identified outliers, QualityMinds can improve motion prediction models used for human-robot interaction and autonomous systems, ensuring a safer and more efficient future for all. Moving forward, the team aims to expand insights by generalizing them to other public datasets. 

Alaa, Vincent, and Jonas are proud to have been part of this project and are excited to witness the impact of their work in the field of autonomous technology.

Elevate your career with Constructor Academy's cutting-edge Data Science bootcamp.

Are you ready to unlock a world of limitless possibilities in a highly demanding, esteemed, and financially rewarding career? Look no further than the Data Science bootcamp offered by Constructor Academy.

Designed to equip you with the essential techniques and technologies for harnessing the power of real-world data, our bootcamp offers two flexible options: full-time (12 weeks) and part-time (22 weeks). Throughout this immersive experience, you will master transformative technologies including machine learning, natural language processing (NLP), Python, deep learning, and data visualization. 

But wait, there's more! Embark on your data science journey with our complimentary introduction to the captivating realm of data science. Simply click here to access this valuable resource and start your exploration today.

Get ready to embrace a future brimming with endless opportunities. Constructor Academy is committed to empowering aspiring data scientists like you to unleash your true potential and pave the way for unparalleled success. Join us on this exhilarating adventure, and let's shape the future of data science together.


Interested in reading more about Constructor Academy and tech related topics? Then check out our other blog posts.
