Data Science capstone projects batch #31

by Ekaterina Butyugina

We’re excited to highlight the impressive accomplishments of our May cohort graduates, who successfully completed both the program and their final projects.

Over the last three months, the dedicated students of Zurich’s Batch #31 and Munich’s 10th cohort tackled a wide variety of complex, hands-on challenges. Their passion, technical skill, and perseverance shone through in every project.

Take a look at how our graduates are using data science to generate insights, push boundaries, and create real-world impact.

Smarter LNG Trading with AI: A Bootcamp Team’s Forecasting Solution

Students: Juan Zhang, Daniel Florit, Jianhui Xu

In the fast-moving world of global energy markets, agility and data-driven insight can make or break a trade. That’s the challenge faced by BKW, a Swiss energy company navigating the complex landscape of liquefied natural gas (LNG) trading. To support smarter decision-making, a trio of Bootcamp graduates, Dani (Finance), Juan (MBA), and Jensen (Physicist), teamed up to build an AI-powered tool that simplifies profitability simulation and forecasting for LNG routes.

The Problem

LNG traders must analyze highly dynamic market conditions, including price volatility across multiple terminals, shifting demand, and opaque cost structures. Manual calculation of voyage profitability is time-consuming and often error-prone, especially in fast-changing geopolitical contexts.

The Solution

The team built a robust deep learning forecasting model using LSTM to predict LNG benchmark prices (TTF, PVB, Henry Hub) across 30-, 60-, and 90-day horizons. They tackled the challenge of unpredictable events, like wars or crises, by optimizing their models with flexible lookback windows and engineered features. Forecasts were validated using MAE and tailored for real-world accuracy.
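
The lookback-window and MAE ideas can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the team's code: `make_windows`, `mae`, the toy price series, and the naive "repeat the last price" forecast are all hypothetical, and in the project an LSTM would consume windows like these instead of the naive stand-in.

```python
from statistics import mean


def make_windows(series, lookback, horizon):
    """Split a price series into (past window, future target) pairs.

    Each sample uses `lookback` consecutive prices as input and the price
    `horizon` steps after the end of the window as the target.
    """
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Toy daily prices (made-up numbers, not market data).
prices = [30.0, 31.0, 29.5, 32.0, 33.0, 31.5, 34.0, 35.0]

samples = make_windows(prices, lookback=3, horizon=2)

# Naive stand-in for a trained model: repeat the last price in the window.
targets = [target for _, target in samples]
naive = [window[-1] for window, _ in samples]
baseline_mae = mae(targets, naive)
```

Changing `lookback` changes how much history each sample sees, which is the knob the paragraph above refers to as a flexible lookback window.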

Streamlit app preview

To make the results actionable, they developed a user-friendly Streamlit web application. The platform allows traders to:

• Simulate netback and profit per voyage
• Compare trade routes interactively
• Adjust costs like regas fees or fuel loss
• Explore the market visually with map-based trade scenarios
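
To make the first three bullets concrete, here is a rough sketch of a per-voyage calculation. The formulas are a common simplification and an assumption on our part (netback as destination price minus per-unit delivery costs, fuel loss as a fraction of the cargo); the `Voyage` fields, function names, and all numbers are hypothetical and not taken from the app.

```python
from dataclasses import dataclass


@dataclass
class Voyage:
    """Inputs for one hypothetical LNG voyage (all values illustrative)."""
    sale_price: float       # destination price, USD per MMBtu
    purchase_price: float   # price paid at the loading port, USD per MMBtu
    shipping_cost: float    # freight, USD per MMBtu
    regas_fee: float        # regasification fee, USD per MMBtu
    cargo_mmbtu: float      # loaded cargo size in MMBtu
    fuel_loss: float        # fraction of cargo lost or burned en route


def netback(v: Voyage) -> float:
    """Destination price minus the per-unit costs of delivering the cargo."""
    return v.sale_price - v.shipping_cost - v.regas_fee


def voyage_profit(v: Voyage) -> float:
    """Revenue on the delivered volume minus the cost of the loaded volume."""
    delivered = v.cargo_mmbtu * (1.0 - v.fuel_loss)
    return netback(v) * delivered - v.purchase_price * v.cargo_mmbtu


# Two made-up routes that differ in price, freight, fees, and fuel loss.
route_a = Voyage(12.0, 8.0, 1.5, 0.5, 3_000_000, 0.02)
route_b = Voyage(11.0, 8.0, 0.8, 0.4, 3_000_000, 0.01)

# Compare routes by profit, as a trader would do interactively in the app.
best = max([("A", route_a), ("B", route_b)], key=lambda r: voyage_profit(r[1]))
```

Adjusting `regas_fee` or `fuel_loss` and recomputing is the kind of what-if comparison the interactive controls expose.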

What’s Next?

The team aims to connect the app to real-time price feeds, forecast slot availability, and apply ensemble models to boost long-term forecast reliability.

Real-World Clinical Data Dashboard: DataInspector by TriNetX

Students: Dr. Daniel Rodriguez Gutierrez, Dr. Cemil Kerimoglu

TriNetX is a global leader in real-world data, offering researchers secure access to diverse clinical datasets to accelerate discovery. Despite this, project managers working with TriNetX data often face difficulties in evaluating dataset quality, sharing insights securely, and making sense of complex information.

To address these challenges, Daniel and Cemil designed an intuitive, secure, and interactive dashboard that transforms how project managers explore and communicate real-world clinical data from the TriNetX ecosystem.

Understanding the Need: Clarity Through Visualization

The project began with identifying the core needs of project managers: a tool to assess complex datasets for decision-making, mechanisms to enable communication without compromising patient privacy, and visual analytics that translate complex data into actionable insights through interactivity.

DataInspector was developed to shift the user experience from complexity to clarity, equipping project managers with a streamlined interface to explore, filter, and share data meaningfully.
Data filtering, center-level overview, and summary visualizations are central to the dashboard.

A Comprehensive Solution for Project Managers

The resulting dashboard offers:

Custom charts and metrics tailored to user needs.
Privacy-aware design to ensure secure information handling.
Exportable summaries and visuals to support cross-functional collaboration.

This tool is especially designed for the TriNetX ecosystem, where large volumes of clinical data demand both scientific rigor and user-friendly access.

Dashboard preview

A streamlined dashboard interface tailored for real-world data exploration and presentation.

This capstone project showcases how thoughtful data design and domain-specific insight can unlock the full potential of real-world data for healthcare innovation.

Anomoldy: AI-Based Anomaly Detection for Plastic Molding

Students: Marcia Cabral, Melvin John, Leon Siegel

Hadi-Plast is a family-owned injection molding company specializing in certified production of technical precision parts from thermoplastics since 1977. Plastic parts are used across industries, from automotive to electronics, where consistent quality is critical. Visual inspection is still often manual, making it time-consuming and inconsistent. The student project Anomoldy focuses on automating defect detection using image-based anomaly detection.

Instead of requiring labeled examples of every possible defect, the team used unsupervised learning to model what “normal” looks like. Any significant deviation is flagged as a potential anomaly, which is particularly useful in settings where defective examples are rare or varied.
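
The "learn normal, flag deviations" principle can be shown with a deliberately simple stand-in. The sketch below fits a per-feature mean and standard deviation on defect-free samples and scores new samples by their largest z-score. It is not one of the deep models the team used; the feature vectors, function names, and threshold are assumptions for illustration.

```python
from statistics import mean, stdev


def fit_normal(samples):
    """Learn per-feature mean and standard deviation from defect-free samples."""
    columns = list(zip(*samples))
    return [(mean(col), stdev(col)) for col in columns]


def anomaly_score(model, sample):
    """Largest absolute z-score across features: how far from 'normal'."""
    return max(abs(x - mu) / sigma for x, (mu, sigma) in zip(sample, model))


# Made-up feature vectors (e.g. brightness statistics) of good parts only.
good_parts = [
    [0.50, 0.20],
    [0.52, 0.22],
    [0.48, 0.18],
    [0.51, 0.21],
    [0.49, 0.19],
]
model = fit_normal(good_parts)

THRESHOLD = 3.0  # assumed cut-off; in practice tuned on validation data

ok_score = anomaly_score(model, [0.50, 0.21])   # close to the training data
bad_score = anomaly_score(model, [0.50, 0.40])  # second feature far off
```

No defective example is needed to fit the model, which is why this family of methods suits settings where defects are rare or varied.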

They developed a Streamlit-based interface using neural network-based models from the Anomalib library, such as Patchcore, PaDiM, CFlow, and EfficientAD. The application processes uploaded part images and highlights potential anomalies using heatmaps.

To improve model performance, the students implemented preprocessing steps like scale- and rotation-aware template matching to isolate the part from the background. They also simulated defects — such as scratches, cut corners, or missing pins — on clean images to artificially expand the dataset and test model sensitivity.
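
Defect simulation of this kind can be sketched on a tiny grayscale image stored as nested lists. The functions `add_scratch` and `cut_corner`, the image size, and the pixel values are hypothetical; the team's actual augmentation code is not shown in this post.

```python
import copy


def add_scratch(image, row, col_start, col_end, value=255):
    """Return a copy of the image with a horizontal scratch drawn on it."""
    damaged = copy.deepcopy(image)
    for col in range(col_start, col_end):
        damaged[row][col] = value
    return damaged


def cut_corner(image, size, value=0):
    """Return a copy with a triangular top-left corner set to background."""
    damaged = copy.deepcopy(image)
    for r in range(size):
        for c in range(size - r):
            damaged[r][c] = value
    return damaged


# A tiny uniform grayscale "part" image (8 x 8 pixels, made-up values).
clean = [[128] * 8 for _ in range(8)]

scratched = add_scratch(clean, row=4, col_start=1, col_end=6)
clipped = cut_corner(clean, size=3)

# Count how many pixels the scratch changed relative to the clean image.
changed_by_scratch = sum(
    scratched[r][c] != clean[r][c] for r in range(8) for c in range(8)
)
```

Because each function returns a copy, one clean image can yield many synthetic defective variants for testing model sensitivity.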

Uploaded part images with potential anomalies highlighted as heatmaps

Throughout the project, the team examined how image quality and augmentation methods influence model robustness. The system was able to detect subtle defects on previously unseen inputs with 96% accuracy, demonstrating its generalization ability.

The resulting tool provides a foundation for automated quality control in plastic part manufacturing, with the potential to reduce inspection time and improve consistency.

Automating the End-of-Site Report

Students: Sujay Ray, Amos Schtalheim, PhD, Aiyham Katranji, Marco Taglione

VSL International is a specialist construction company focused on engineering, building, repairing, upgrading, and preserving transport infrastructure, buildings, and energy production facilities.

End of Site (EoS) reports play a crucial role in construction, capturing everything from lessons learned and risks to contract outcomes and productivity. Traditionally, generating these reports meant sifting through hours of interviews and hundreds of pages of project documents, a tedious and error-prone task.

A multidisciplinary team, Aiyham, Sujay, Marco, and Amos, set out to automate the complex EoS reporting process for large-scale construction projects. By applying AI to the traditionally manual and time-consuming workflow, they created a streamlined digital pipeline – and gained valuable insights along the way.

The Solution: An AI-Powered Reporting Workflow

The team designed and built an end-to-end, semi-automated system to generate structured EoS reports. Their approach combined modern AI with practical engineering know-how:

Smart Upload & Translation: Users upload raw interview transcripts and project files. The system automatically detects and translates files into English when needed, preserving speaker roles and technical language.
Intelligent Parsing: Transcripts are parsed into question-and-answer format, with filler removed and fragmented sentences merged for clarity.
Automated Question Matching: Every interview sentence is semantically matched to indexed EoS questions using GPT-based models, aligning freeform discussion with a structured reporting framework.
Source Comparison: For each topic, the system compares transcript and document answers, using AI to select the most relevant information.
Structured Data Extraction: Key fields (such as Yes/No, keywords, quantitative results, and summary comments) are extracted for each report section using prompt engineering.
Traceable, Auditable Outputs: The final result is a set of clean, audit-ready CSV tables and executive summaries, all linked back to original source material.
Everything is delivered through an easy-to-use web application built on Streamlit, ensuring construction teams can review, adjust, and download reports with minimal effort.
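
The question-matching step can be illustrated with a much simpler technique than the one the team used. The sketch below swaps the GPT-based semantic matching for plain word-overlap cosine similarity, purely to show the shape of the step: each sentence is scored against every indexed question and assigned to the best one. The questions, the sentence, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_question(sentence, questions):
    """Return (question id, score) of the best-matching indexed question."""
    vec = tokens(sentence)
    scored = [(qid, cosine(vec, tokens(q))) for qid, q in questions.items()]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical EoS questions and interview sentence, for illustration only.
questions = {
    "Q1": "Were there any safety incidents on site?",
    "Q2": "Was the project delivered within the planned budget?",
}
sentence = "We finished slightly over the planned budget because of delays."

best_id, best_score = match_question(sentence, questions)
```

Keeping the question id alongside each matched sentence is what makes the later outputs traceable back to their source.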

Dashboard

Key Results

Reduced manual effort: Up to 80% of the EoS reporting workflow is now automated.
Consistent and accurate outputs: Data is always traceable to its original source for transparency and auditing.
Faster reporting: What once took days can now be accomplished in hours.

Lessons Learned

Human expertise matters: Technical terms and project logic required careful handling throughout the automation process.
Best results come from hybrid approaches: Combining insights from both transcripts and documents provided richer, more reliable data.
Human-in-the-loop design: The app is built for engineers and managers to review and validate outputs, keeping expertise at the center.

What's Next?

With this project, the team demonstrated how AI and domain expertise can transform even the most specialized workflows. Future steps include scaling the system to new projects, refining classification logic, and exploring integration with project management tools for full end-to-end digital reporting.

Want to know more about the technology and workflow behind this solution? Reach out to the team or stay tuned for more project showcases!

WaitNoMore: Predicting Waiting Times for Roadside Assistance

Students: David Fritsch, Gaurav Jauhri, and Guilherme Samora

Assistance partner GmbH & Co. KG, Germany’s second-largest roadside assistance provider, connects insurance companies with roadside assistance providers, ensuring drivers get timely help when a car problem occurs. A key challenge is providing customers with a reliable estimated time of arrival (ETA) for tow or repair vehicles. Currently, ETAs are manually inserted in the system by provider staff. The company wanted to explore whether machine learning could do better.

David, Gaurav, and Guilherme began with a dataset of 17k incidents from four larger partners, containing 19 usable features, including request timestamps, geographic coordinates of incidents and providers, service type, and vehicle information. To enrich this, they created 65 engineered features, including weather conditions at the time and place of the incident, road distance and travel time (taking historical traffic into account), operational workload indicators (recent jobs per provider), and defect categories.
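
As one example of such an engineered feature, a workload indicator could count how many jobs a provider took on shortly before a new request. The sketch below assumes a three-hour window and a simple record layout; both, along with the function name and the sample incidents, are hypothetical.

```python
from datetime import datetime, timedelta


def recent_jobs(incidents, provider, at, window_hours=3):
    """Count the provider's jobs requested in the `window_hours` before `at`."""
    start = at - timedelta(hours=window_hours)
    return sum(
        1
        for inc in incidents
        if inc["provider"] == provider and start <= inc["requested"] < at
    )


# Made-up incident log (provider id and request time only).
incidents = [
    {"provider": "P1", "requested": datetime(2024, 5, 1, 8, 0)},
    {"provider": "P1", "requested": datetime(2024, 5, 1, 9, 30)},
    {"provider": "P2", "requested": datetime(2024, 5, 1, 9, 45)},
    {"provider": "P1", "requested": datetime(2024, 5, 1, 10, 0)},
]

# Workload of provider P1 at the moment the 10:00 request arrives.
workload = recent_jobs(incidents, "P1", at=datetime(2024, 5, 1, 10, 0))
```

The current request is excluded from its own count, so the feature only uses information available at prediction time.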

Then, a wide range of machine learning models, from simple regressions to more complex architectures such as Artificial Neural Networks (ANNs), was tested in order to identify patterns that could improve ETA predictions.

No more waiting times

Despite extensive feature engineering and experimentation, the models’ performance was nearly identical to a straightforward baseline: the simple average waiting time per provider. This indicated that the current dataset had limitations for substantial improvement through machine learning.
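
The baseline comparison described here is easy to reproduce in principle. The sketch below computes each provider's average waiting time on training records and measures its mean absolute error on held-out records, next to a hypothetical model's predictions. All numbers and names are invented for illustration and do not reflect the project's data.

```python
from collections import defaultdict
from statistics import mean


def provider_averages(history):
    """Average observed waiting time (minutes) for each provider."""
    waits = defaultdict(list)
    for provider, minutes in history:
        waits[provider].append(minutes)
    return {provider: mean(values) for provider, values in waits.items()}


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return mean(abs(actual - predicted) for actual, predicted in pairs)


# Made-up (provider, waiting minutes) records, split into train and test.
train = [("P1", 30), ("P1", 40), ("P2", 60), ("P2", 50), ("P2", 70)]
test = [("P1", 38), ("P2", 55)]

baseline = provider_averages(train)
baseline_mae = mae([(wait, baseline[p]) for p, wait in test])

# Any candidate model would have to beat this number on the same test set.
model_predictions = {"P1": 36.0, "P2": 61.0}  # hypothetical model output
model_mae = mae([(wait, model_predictions[p]) for p, wait in test])
```

When `model_mae` is no better than `baseline_mae`, as in this toy case, the extra model complexity buys nothing, which mirrors the finding above.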

Rather than implementing a model with limited impact, the team proposed three strategic steps:

Preselect realistic ETAs in the provider app: Change the preselected ETA value from the current default of 5 minutes to each provider’s own historical average. This reduces manual scrolling in 5-minute increments, lowers effort at case acceptance, and makes it easier for providers to select a realistic waiting time.
Introduce ETA accuracy as a KPI: Add ETA accuracy to the provider performance evaluation. This would encourage providers to base estimates on current conditions rather than fixed default values.
Enhance data collection: Add real-time inputs such as live vehicle and driver locations. This creates a case-by-case signal beyond historical averages and can make future ML approaches viable.
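
The first two recommendations can be sketched as small helper functions. This is an assumption-laden illustration: rounding the historical average to the nearest 5-minute step and defining accuracy as the share of quotes within a 10-minute tolerance are our choices for the example, not definitions from the project.

```python
def default_eta(historical_waits, step=5):
    """Provider's average wait, rounded to the nearest selectable step."""
    average = sum(historical_waits) / len(historical_waits)
    # Never preselect less than one step, even for very short averages.
    return max(step, step * round(average / step))


def eta_accuracy(quoted, actual, tolerance=10):
    """Share of jobs where the quoted ETA was within `tolerance` minutes."""
    hits = sum(abs(q - a) <= tolerance for q, a in zip(quoted, actual))
    return hits / len(quoted)


# Made-up actual waiting times (minutes) for one provider.
actual_waits = [38, 41, 47]

preselected = default_eta(actual_waits)

# KPI when every job is quoted at the preselected value vs. a fixed 5 minutes.
kpi_with_average = eta_accuracy([preselected] * 3, actual_waits)
kpi_with_fixed_default = eta_accuracy([5] * 3, actual_waits)
```

In this toy case the historical average already lands every quote within tolerance, while the fixed 5-minute default lands none.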

The key outcome of this project was a clear, evidence-based conclusion: with the current data, a machine learning model cannot meaningfully improve ETA predictions. This clarity helps avoid future internal debates and redirects efforts toward process and data changes that can deliver immediate, practical benefits for customers and providers alike.

Estimation of Motorcycle Production Across Europe

Students: Nargiz Rüter, Valentina Pavlovic, Giorgio Semadeni

Power Systems Research (PSR), a leader in global engine and powertrain analytics, collaborated with our team to address the lack of current, structured data in the European motorcycle market. The objective was to engineer a robust data pipeline for estimating motorcycle production volumes. This was achieved by leveraging publicly available registration, import, and export data from Italy, France, Germany, Spain, and the UK.

The project's initial phase involved sourcing data from public repositories, including government portals, enthusiast websites, and commercial trade APIs. Each country presented unique data ingestion challenges; for instance, Spain's data was often embedded in PDFs, while France's arrived in inconsistent image formats.

To overcome these hurdles, the team developed a hybrid data extraction system. It combined rule-based logic for structured data, Optical Character Recognition (OCR) for image-based text, and Large Language Model (LLM)-assisted parsing for complex or semi-structured formats, so that each source was handled by the method best suited to it.
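
One way to picture the hybrid approach is as a router that sends each file to a suitable extraction route. The sketch below is hypothetical: the routing by file extension, the route names, the file names, and the `YYYY-MM,count` row format are assumptions, and the OCR and LLM routes are only named, not implemented.

```python
import re
from pathlib import Path


def parse_structured(text):
    """Rule-based path: pull 'YYYY-MM,count' rows out of clean text."""
    rows = re.findall(r"^(\d{4}-\d{2}),(\d+)$", text, flags=re.MULTILINE)
    return [(month, int(count)) for month, count in rows]


def choose_extractor(filename):
    """Pick an extraction route from the file type (assumed routing rule)."""
    suffix = Path(filename).suffix.lower()
    if suffix in {".csv", ".txt"}:
        return "rules"
    if suffix in {".png", ".jpg", ".jpeg"}:
        return "ocr"
    return "llm"  # PDFs and anything semi-structured


# Hypothetical file names and a tiny structured sample.
routes = {name: choose_extractor(name) for name in ["a.csv", "b.PNG", "c.pdf"]}
rows = parse_structured("2023-01,1200\nsome note\n2023-02,1350")
```

Lines that do not match the expected pattern are skipped by the rule-based parser, which is where OCR or LLM-assisted parsing would take over.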

How to structure data

Due to limited training data for model-based forecasting, a fundamental economic formula was applied to infer production volumes:

Production = Registrations + Exports - Imports

This formula derived an estimated production figure for each country. To provide a robust measure of uncertainty, shortfalls and surpluses were quantified with 95% confidence intervals, offering PSR a clearer understanding of potential production ranges.
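
In code, the formula is a one-liner per country. The sketch below applies it to made-up figures for two placeholder countries and adds a crude worst-case range based on an assumed 5% relative error on each input. That range is not a statistical confidence interval and is not the team's method; it only shows how input uncertainty widens the estimate.

```python
def estimate_production(registrations, exports, imports):
    """Production = Registrations + Exports - Imports."""
    return registrations + exports - imports


def production_bounds(registrations, exports, imports, rel_error=0.05):
    """Worst-case low/high estimate if every input is off by `rel_error`."""
    low = (registrations + exports) * (1 - rel_error) - imports * (1 + rel_error)
    high = (registrations + exports) * (1 + rel_error) - imports * (1 - rel_error)
    return low, high


# Made-up unit counts for two placeholder countries (not project data).
countries = {
    "Country A": {"registrations": 250_000, "exports": 120_000, "imports": 90_000},
    "Country B": {"registrations": 180_000, "exports": 20_000, "imports": 150_000},
}

production = {name: estimate_production(**f) for name, f in countries.items()}
bounds_a = production_bounds(**countries["Country A"])
```

A negative result for a country would indicate that imports exceed registrations plus exports, which is how a shortfall shows up in this accounting.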

Results

The project yielded several significant outcomes:

Unified Registration Data: Disparate motorcycle registration data across all five target countries was successfully unified and standardized, providing a consistent dataset for analysis.
Estimated Production Shortfall: The analysis revealed an estimated production shortfall of approximately 1.99 million units, with a calculated range of −820K to −2.32M units, offering nuanced insight into the potential deficit.
Proposed Automation Strategies: Comprehensive automation strategies were proposed for future data ingestion and enrichment, designed to streamline processes and ensure the long-term sustainability of the market intelligence pipeline.

By delivering deeper market visibility, the team empowered PSR with granular insights for more precise and timely business decisions for client partners in the European motorcycle market. The robust data pipeline and methodologies developed lay a crucial foundation for future scalable, automated forecasting tools. This collaboration transformed fragmented data into structured, actionable insights, filling a critical data gap and establishing a repeatable framework for continuous market intelligence.

Conclusion

As we wrap up the Data Science Final Projects with Batch #31, we want to express heartfelt thanks to the incredible partner companies who made this journey so much richer. Your real-world challenges gave our students the chance to push boundaries, apply their skills, and deliver creative, practical solutions. We're deeply grateful for your trust and collaboration.

To our amazing students who joined us back in May, what a ride it’s been! Your hard work, curiosity, and growth over these past months have been nothing short of inspiring. We’re proud of what you’ve accomplished and excited to see where your talents take you next. Keep exploring, keep building. The data world needs more minds like yours.

If you’ve been following along and are feeling inspired, we’d love to welcome you into our next data science cohort. Head over to Constructor Academy to find out how you can start your own journey.