| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.19 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
As computational power has grown, so has the use of IT systems and tools across industries, and with
it the amount of data generated and stored. Education is one of these fields. The opportunity
to apply analytical and data-related techniques to the data generated and stored by computer-based
educational systems is greater than ever. Performance prediction is one of the most popular
uses of the data generated by educational systems.
In this line of thought, the main objective of this paper is to build a predictive model capable of
classifying a student's grade based on their Moodle system activity and several sociodemographic
variables taken from the Netpa system. All the data used belongs to students who attended the first
semester of 2019 at Nova Information Management School.
To achieve this objective, the SEMMA methodology was followed. Python was used, with
particular emphasis on the scikit-learn, pandas and seaborn packages. Raw Moodle logs were
processed and transformed into variables representing the number of times a student navigated
to a specific page in the platform. This information was then joined with the Netpa variables to build
a dataset. Exploratory data analysis was performed, and several model configurations were tested. The
configurations differ in outlier treatment, sampling technique, feature scaler,
feature engineering and type of algorithm: Logistic Regression, K-Neighbours Classifier, Random
Forest Classifier and Multi-Layer Perceptron.
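
The log-to-feature step described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than pandas, and every student identifier, page name and Netpa field in it is a hypothetical placeholder, not data or code from the project. It counts page visits per student and joins those counts with a small table of sociodemographic records.

```python
from collections import Counter, defaultdict

# Hypothetical Moodle log rows: (student_id, page visited). Invented for illustration.
moodle_log = [
    ("s1", "quiz"), ("s1", "forum"), ("s1", "quiz"),
    ("s2", "forum"), ("s3", "resource"), ("s3", "quiz"),
]

# Hypothetical Netpa-style sociodemographic records, keyed by student.
netpa = {
    "s1": {"age": 23, "working_student": False},
    "s2": {"age": 31, "working_student": True},
    "s3": {"age": 24, "working_student": False},
}

def build_dataset(log, demographics):
    """Return one row per student: demographics plus one visit count per page."""
    counts = defaultdict(Counter)
    for student, page in log:
        counts[student][page] += 1

    pages = sorted({page for _, page in log})
    rows = []
    for student, info in sorted(demographics.items()):
        row = {"student_id": student, **info}
        for page in pages:
            # A Counter returns 0 for pages the student never visited.
            row[f"n_{page}"] = counts[student][page]
        rows.append(row)
    return rows

dataset = build_dataset(moodle_log, netpa)
for row in dataset:
    print(row)
```

With pandas, the same counting step is commonly written with `pandas.crosstab` or a `groupby` followed by a merge on the student identifier.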
Using a K-Neighbours Classifier and the SMOTE sampling technique, an F1-Score of 0.624 and a ROC
AUC of 0.828 were obtained.
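
For readers unfamiliar with SMOTE, its core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified standard-library variant written for illustration only, not the project's implementation: the function name `smote_like`, its parameters and the sample points are all assumptions, and the base sample is drawn at random instead of iterating over every minority sample.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical minority-class points in a two-feature space.
minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0)]
new_points = smote_like(minority_points, n_new=5, k=2)
print(new_points)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to compute scores such as F1 or ROC AUC.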
Description
Project Work presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Keywords
Performance Prediction; Educational Data Mining; Learning Analytics; Machine Learning
