Using large language models to help train machine learning SDG classifiers

Working Paper Date: November 2023

Category: Economic Analysis and Policy

Author: Marcelo T. LaFleur

Document Symbol: ST/ESA/2023/DWP/180

JEL Classification: O0 General Economic Development; O20 General Development Policy and Planning; C88 Other Computer Software

Keywords: Sustainable Development Goals; Machine learning; Generative AI models; ChatGPT; SDG classification; Topic models

Working Paper File: DESA Working Paper 180.pdf (589.77 KB)

Citation: LaFleur, Marcelo T. (2023). Using large language models to help train machine learning SDG classifiers. UN DESA Working Paper, No. 180. New York: United Nations Department of Economic and Social Affairs. November.

This paper proposes using synthetic training data generated by large language models
to improve machine learning SDG classifiers. It shows that supplementing existing training data with
synthetic data produced by ChatGPT improves the performance of the SDGClassy classifier.
Synthetic data is especially useful for building SDG classifiers because properly labeled data is scarce
and the SDGs are complex and interconnected. It thus enables more effective machine learning
applications in this context.
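
The idea of supplementing a small labeled set with synthetic examples can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny naive Bayes classifier stands in for a real model and is not the SDGClassy implementation, the function names (`tokenize`, `train_naive_bayes`, `predict`) are hypothetical, and the "synthetic" texts are hand-written placeholders rather than actual ChatGPT output.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,;:!?()").lower() for w in text.split())
    return [t for t in tokens if t]


def train_naive_bayes(examples):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    tokens = [t for t in tokenize(text) if t in model["vocab"]]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Small set of existing labeled examples (illustrative placeholders).
labeled = [
    ("Expanding access to primary schools and teacher training", "SDG 4"),
    ("Reducing greenhouse gas emissions to limit global warming", "SDG 13"),
]

# Placeholder "synthetic" examples; in the paper's setting these would be
# generated by a large language model and then labeled with the target SDG.
synthetic = [
    ("Scholarships help girls complete secondary education", "SDG 4"),
    ("Adaptation plans protect coastal communities from climate hazards", "SDG 13"),
]

augmented = labeled + synthetic

base_model = train_naive_bayes(labeled)
augmented_model = train_naive_bayes(augmented)

query = "Climate adaptation for coastal cities"
base_prediction = predict(base_model, query)
augmented_prediction = predict(augmented_model, query)
```

In this toy setup, none of the query's words appear in the original labeled set, so the base model can only fall back on the class priors. The augmented model has seen "climate", "adaptation", and "coastal" in a synthetic SDG 13 example and can use them. A real evaluation would compare the two models on a held-out test set rather than on a single query.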