Data Science capstone projects batch #28

by Ekaterina Butyugina

Group work
We're thrilled to celebrate the outstanding achievements of our recent graduates who joined us last November and dedicated themselves to completing the course and their capstone projects. 
Over the past three months, the talented individuals of Batch #28 in Zurich, along with the accomplished 7th cohort in Munich, undertook a diverse range of challenging projects. Their skills, enthusiasm, and commitment were evident throughout their work.

We are grateful to HP for their invaluable support in providing cutting-edge Z by HP workstations, which empowered our students to achieve even more.
We invite you to discover these inspiring examples of how our students are applying data science to uncover valuable insights, explore new frontiers, and make a meaningful impact.
 


Weather-Based Flight Delay Prediction

Students: Martina Wengle, Ralf Reuvers, René Falquier

The project aimed to predict departure delay probabilities based on weather conditions at the scheduled departure time for specific passenger flight routes. Sponsored by the Free Flight Lab, this initiative sought to identify regions that could benefit from more precise aviation forecasts.

The goal was to provide airline Network Operations Centers with real-time delay probability insights, enabling proactive decision-making to minimize disruptions, reduce costs, and improve passenger experience.

The dataset was built from scratch using custom query logic to extract data from the Flightradar24, FlightAware, and Aviation Weather Exchange Application Programming Interfaces (APIs). The queries covered 220,000 flights on 64 routes over a three-year period; after merging flight and weather data, the dataset contained 26.4 million data points.
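One common way to combine the two sources is to attach to each flight the most recent weather report issued at or before its scheduled departure time. The Python sketch below assumes that rule; the post does not describe the team's exact merge logic, and the helper names, field names (`origin`, `scheduled_departure`, `visibility_m`), and sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def latest_report_before(reports, when):
    """Return the features of the latest report issued at or before `when`.

    `reports` is a list of (timestamp, features) pairs sorted by timestamp.
    Hypothetical helper, for illustration only.
    """
    times = [timestamp for timestamp, _ in reports]
    idx = bisect_right(times, when)  # number of reports with timestamp <= when
    return reports[idx - 1][1] if idx > 0 else None


def merge_flights_with_weather(flights, weather_by_airport):
    """Attach the latest departure-airport weather report to each flight."""
    merged = []
    for flight in flights:
        reports = weather_by_airport.get(flight["origin"], [])
        features = latest_report_before(reports, flight["scheduled_departure"])
        if features is not None:  # drop flights with no earlier report
            merged.append({**flight, **features})
    return merged


# Illustrative data only (hypothetical schema): one airport, two reports.
weather = {
    "ZRH": [
        (datetime(2024, 1, 1, 8, 50), {"visibility_m": 800}),
        (datetime(2024, 1, 1, 9, 20), {"visibility_m": 5000}),
    ]
}
flights = [{"flight": "LX100", "origin": "ZRH",
            "scheduled_departure": datetime(2024, 1, 1, 9, 0)}]
rows = merge_flights_with_weather(flights, weather)
print(rows[0]["visibility_m"])  # 800: the 09:20 report is too late to use
```

For whole tables, pandas offers the same backward-looking match through `pandas.merge_asof`, using `by=` for the airport and `direction="backward"`.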

Several machine learning algorithms were tested, and Random Forest outperformed alternatives such as XGBoost on the F2-score. This metric weights recall more heavily than precision, so it penalizes false "on-time" predictions (underestimated delays) more than false "delayed" predictions. That trade-off reflects operational reality: a false delay prediction is manageable, but missing an actual delay can cascade into costly downstream effects.
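The F2-score is the F-beta score with β = 2, computed as F_β = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall with "delayed" as the positive class. The minimal Python sketch below uses made-up confusion counts to show why it suits this problem: the same number of errors scores lower when they are missed delays than when they are false alarms.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score with "delayed" as the positive class.

    tp: delays correctly predicted; fp: false "delayed" alarms;
    fn: missed delays (false "on-time" predictions).
    beta > 1 weights recall above precision.
    """
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts: 20 errors in each case, but of different kinds.
print(round(f_beta(tp=80, fp=20, fn=0), 3))  # 0.952 (20 false alarms)
print(round(f_beta(tp=80, fp=0, fn=20), 3))  # 0.833 (20 missed delays)
```

With label arrays instead of counts, scikit-learn's `sklearn.metrics.fbeta_score(y_true, y_pred, beta=2)` computes the same quantity.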
Predicted Probabilities of Departure Delay
The key deliverable was an interactive dashboard displaying delay predictions per route (see the figure above). The model achieved an overall F2-score of 70%, with accuracy above 90% on some routes. These differences between routes prompted further investigation, which led to three key insights:
  • Operational factors dominated predictions: SHapley Additive exPlanations (SHAP) analysis revealed that the model relied on operational patterns rather than weather effects alone, even though it was trained exclusively on weather features. For example, air pressure and density altitude readings often acted as proxies for specific airports, suggesting that the model was learning regional operational characteristics rather than the impact of weather conditions themselves.
  • Simpler routes led to better predictions: Routes with fewer operational complexities yielded more accurate delay forecasts, suggesting that airline- and route-specific models could improve prediction reliability for Network Operations Centers.
  • Low visibility had a disproportionate impact: SHAP analysis showed that meteorological visibility was the most influential weather-related factor, likely due to the regulatory Air Traffic Control (ATC) separation requirements that apply in low instrument flight rules (LIFR) conditions, even in the absence of severe weather events like snowstorms or icing.
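To see how visibility alone can put an airport into LIFR, the sketch below classifies an observation using the widely used US flight-category thresholds: LIFR for a ceiling below 500 ft and/or visibility below 1 statute mile, IFR for 500 to below 1,000 ft and/or 1 to below 3 miles, MVFR for 1,000 to 3,000 ft and/or 3 to 5 miles, and VFR otherwise. It is an illustration only; the post does not say the team derived such a category, and the function name is hypothetical.

```python
def flight_category(ceiling_ft, visibility_sm):
    """Return the flight category for one weather observation (hypothetical helper).

    ceiling_ft: height of the lowest broken/overcast layer in feet above
        ground level, or None if there is no ceiling.
    visibility_sm: prevailing visibility in statute miles.
    Whichever condition is worse decides the category.
    """
    ceiling = float("inf") if ceiling_ft is None else ceiling_ft
    if ceiling < 500 or visibility_sm < 1:
        return "LIFR"
    if ceiling < 1000 or visibility_sm < 3:
        return "IFR"
    if ceiling <= 3000 or visibility_sm <= 5:
        return "MVFR"
    return "VFR"


# Fog with no cloud ceiling: visibility alone sets the category.
print(flight_category(None, 0.5))  # LIFR
print(flight_category(4000, 10))   # VFR
```

A fog bank with no cloud ceiling but half a mile of visibility already falls into LIFR, which is consistent with visibility mattering even without severe weather.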

These findings indicate that an operations-level prediction suite for weather-related delays is feasible. The next steps for further refinement and deployment include:
  • Operator- and Route-Specific Models: Tailoring models for individual airline operations to improve accuracy.
  • Weather Forecast vs. Report Comparisons: Enhancing predictive capabilities by integrating forecast accuracy into the model.
  • Quantified Delay Predictions: Moving beyond binary classifications to provide precise delay durations.
  • Fully Global Dataset: Expanding the dataset to cover worldwide routes for greater generalizability.

Martina, Ralf, and René are proud of their progress in just four weeks and look forward to seeing the Free Flight Lab advance their work. They extend their gratitude to Mr. Kristjan Rognvaldsson for his industry expertise and the Constructor Academy team for their support.
 
 

From Viral Customers to Valuable Insights

Students: Roberto Gonzalez, Ammar Alghouli, Christian Schmid-Schönbein

Best Secret, a prominent name in luxury fashion retail, undertook a project to predict revenue across customer cohorts (e.g., customers registered in a specific year) and markets, with the aim of improving business performance through data-driven insights.

Using time-series analysis, the team aimed to forecast the company's revenue over the next 18 months, providing insights into customer cohort and market performance. By focusing on specific customer segments and analysing market behaviour, the team designed its forecasts to support better decision-making for the company's warehouses.
The data structure
The team's data analysis revealed that the number of customers is the most critical factor in revenue generation. Customers fall into two types: "viral_customers" (invited by existing members) and "customers" (acquired through company campaigns). The team's forecasting methodology followed a three-step approach:
  1. First, predict the number of viral customers for the next 18 months.
  2. Second, use this projection as a foundation to predict the total number of "customers".
  3. Finally, with both predictions in hand, generate a detailed revenue forecast for the 18-month period.
Predicted revenue
To conclude, the team's forecasting model achieved a Mean Absolute Percentage Error (MAPE) of 6% on the training data. If this accuracy carries over to future months, the model can give the company a reliable tool for strategic decision-making, for example in optimizing inventory management to prevent both stockouts and overstock situations in warehouses.
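MAPE averages the absolute forecast error relative to the actual value: MAPE = (100% / n) · Σ |y_t - ŷ_t| / |y_t|. A minimal Python sketch with illustrative numbers (not the project's data):

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Illustrative monthly revenue (not the project's data): each month is 6% off.
print(round(mape([100, 200, 400], [94, 212, 376]), 2))  # 6.0
```

Because each error is divided by the actual value, MAPE is undefined for months with zero revenue, which is why the sketch rejects them.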
 
 

Conclusion

As we conclude this successful session with Data Science Batch #28, we extend our sincere thanks to the companies that provided invaluable projects for our students. Your partnership has enriched their learning experience and facilitated the development of innovative solutions to real-world challenges.

To the students who joined us in November and dedicated themselves fully to completing the course and their final projects, we commend your exceptional efforts. Your commitment, skill, and passion for data science have been truly impressive. We wish you all the best in your future endeavors, and we are confident that you will continue to innovate and make a significant impact in your chosen fields.

For those inspired by these achievements and interested in pursuing their own data science journey, we are pleased to announce our upcoming program. Visit the Constructor Academy website to learn more and discover how you can join the next cohort of data science innovators.

Interested in reading more about Constructor Academy and tech-related topics? Then check out our other blog posts.
