| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.75 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
Subscription-based educational technology (EdTech) companies face significant revenue loss
due to customer churn, making retention vital. Retaining engaged learners is generally more
cost-effective than acquiring new ones. This study develops a predictive churn model using
anonymised student data from Company X, an EdTech platform for children’s English learning,
covering the period from April 2024 to February 2025. The real name of the company has been
intentionally left out to ensure confidentiality. Following a CRISP-DM process, student activity,
engagement, and financial features were engineered over multiple time windows, capturing
evolving student behaviours. Four machine learning (ML) algorithms, namely logistic regression (LR),
random forest (RF), neural networks (NN), and XGBoost (XGB), were trained and compared,
using resampling (random undersampling, RUS, combined with SMOTE) to address class imbalance. The optimised
XGB model achieved the best performance, with approximately 0.84 accuracy, 0.63 F1 score,
0.85 recall, and an area under the curve (AUC) of 0.85 on test data, effectively identifying
likely churners. SHapley Additive exPlanations (SHAP) analysis revealed that
engagement metrics, particularly the number of paid classes (both current and mean over 8
weeks), total learning time, engagement with gamified features, and customer tenure, were the
most influential predictors, confirming that highly engaged students are less likely to churn.
This interpretable model provides actionable insights for retention strategies by predicting
individual churn risk and highlighting key engagement drivers. Practically, even a 1% monthly
reduction in churn could translate to multi-million-dollar annual savings for subscription
EdTech providers. Overall, this research extends churn prediction into the EdTech domain,
demonstrates the value of long-term engagement features, and applies explainable AI to
enhance model transparency, thereby supporting its practical adoption.
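
The abstract quotes recall and F1 for the optimised XGB model but not precision. As a rough cross-check, precision can be recovered by rearranging the F1 definition, F1 = 2PR / (P + R). The sketch below does this in Python; the function name is hypothetical, and because the inputs are the rounded figures from the abstract, the result is only approximate.

```python
def precision_from_f1_and_recall(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, giving P = F1 * R / (2R - F1)."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denominator


# Rounded values quoted in the abstract for the optimised XGB model.
reported_f1 = 0.63
reported_recall = 0.85

implied_precision = precision_from_f1_and_recall(reported_f1, reported_recall)
print(f"Implied precision: {implied_precision:.2f}")  # prints "Implied precision: 0.50"
```

Assuming the reported metrics refer to the churn class, an implied precision of about 0.50 suggests that roughly one in two flagged students would actually churn. That trade-off fits a model tuned for high recall.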
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Driven Marketing, specialization in Digital Marketing and Analytics
Keywords
Customer Churn Prediction; Machine Learning; Educational Technology; Subscription Business Model; SDG 4 - Quality education; SDG 8 - Decent work and economic growth; SDG 9 - Industry, innovation and infrastructure; SDG 11 - Sustainable cities and communities; SDG 12 - Responsible production and consumption
