Using large language models to help train machine learning SDG classifiers

Working Paper Date: November 2023
Category: Economic Analysis and Policy
Author: Marcelo T. LaFleur
Document Symbol: ST/ESA/2023/DWP/180
JEL Classification: O0 General Economic Development; O20 General Development Policy and Planning; C88 Other Computer Software
Keywords: Sustainable Development Goals; Machine learning; Generative AI models; ChatGPT; SDG classification; Topic models
Citation: LaFleur, Marcelo (2023). Using large language models to help train machine learning SDG classifiers. UN DESA Working Paper, No. 180. New York: United Nations Department of Economic and Social Affairs. November.

This paper proposes using synthetic training data generated by large language models
to improve machine learning classifiers for the Sustainable Development Goals (SDGs).
It shows that supplementing existing training data with synthetic data produced by
ChatGPT improves the performance of the SDGClassy classifier. Synthetic data is
especially useful for building SDG classifiers because properly labeled data is scarce
and the SDGs are complex and interconnected. It thus enables more effective machine
learning applications in this context.
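
The augmentation idea can be illustrated with a toy sketch. This is not the SDGClassy classifier or the paper's pipeline: it assumes a small naive Bayes text classifier written from scratch, and the "synthetic" examples are hand-written placeholders standing in for text that a large language model might generate. All function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable label, using add-one smoothing.

    Words never seen in training are ignored. Ties go to the label
    that sorts first.
    """
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Scarce, human-labeled examples (illustrative only).
labeled = [
    ("access to clean drinking water and sanitation", "SDG 6"),
    ("renewable energy and electricity access", "SDG 7"),
]

# Placeholder synthetic examples, written by hand for this sketch;
# in practice these would come from a large language model.
synthetic = [
    ("wastewater treatment and safe water supply for rural households", "SDG 6"),
    ("solar and wind power expand affordable clean energy", "SDG 7"),
]

augmented = labeled + synthetic

baseline_model = train(labeled)
augmented_model = train(augmented)

query = "solar power plants"
baseline_prediction = predict(baseline_model, query)
augmented_prediction = predict(augmented_model, query)
```

In this toy setup the baseline model has never seen any word in the query and falls back to a tie between labels, while the augmented model has vocabulary that links the query to SDG 7. The example only shows the mechanics of merging the two data sources. It says nothing about how large the improvement is in practice.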