Using large language models to help train machine learning SDG classifiers
This paper proposes using synthetic training data generated by large language models
to improve machine learning classifiers for the Sustainable Development Goals (SDGs). It shows that
supplementing existing training data with synthetic data produced by ChatGPT improves the performance
of the SDGClassy classifier.
Synthetic data is especially useful for building SDG classifiers because properly labeled data is scarce
and the SDGs are complex and interconnected. It thus enables more effective machine learning applications
in this context.
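
The supplementing step described above can be pictured with a minimal sketch. It is not the paper's pipeline: the function name `augment`, the example sentences, and the duplicate filter are all assumptions made for illustration, and the synthetic texts are hard-coded here rather than requested from ChatGPT. Labels are SDG numbers (for example, 6 for clean water and sanitation).

```python
from collections import Counter

def augment(real, synthetic):
    """Append synthetic (text, label) pairs to the real ones, skipping duplicate texts.

    Hypothetical helper: each returned item also carries its provenance so the
    effect of the synthetic share can be checked later.
    """
    seen = {text.strip().lower() for text, _ in real}
    combined = [(text, label, "real") for text, label in real]
    for text, label in synthetic:
        key = text.strip().lower()
        if key in seen:
            continue  # already present, adds no new training signal
        seen.add(key)
        combined.append((text, label, "synthetic"))
    return combined

# Invented examples for illustration only; not taken from the paper's data.
real = [
    ("Expanding access to clean drinking water in rural areas", 6),
    ("Reducing child mortality through vaccination programs", 3),
]
synthetic = [
    ("Solar microgrids bring affordable electricity to remote villages", 7),
    ("Expanding access to clean drinking water in rural areas", 6),  # duplicate of a real example
    ("Protecting coral reefs from overfishing", 14),
]

train = augment(real, synthetic)
per_label = Counter(label for _, label, _ in train)
print(len(train), dict(per_label))
```

Keeping the provenance tag alongside each example makes it possible to compare a classifier trained on the real data alone with one trained on the combined set, which is the kind of comparison the abstract reports.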