Dataset/Codebook
The Major Components of INSIGHT+
Technical Notes:
ETL: Extraction, Transformation, and Loading. A common process for combining data from multiple sources into a single, central repository (database).
Fine-tuned LLM: A language model tailored through additional training on a focused dataset to improve its performance and accuracy in a particular domain or task. Our model, trained on proprietary government occupation data, generates metrics for evaluating the AI vulnerability and augmentability of public sector jobs.
MongoDB: A scalable, flexible NoSQL database that stores data in JSON-like documents, enabling efficient storage and querying of complex data types and structures.
API: Application Programming Interface. A set of rules and protocols that enables different software applications to communicate with each other. Our API allows government professionals, researchers, and software developers to download and integrate the data easily.
BLS: The Bureau of Labor Statistics. Here it refers to the Bureau's Occupational Employment and Wage Statistics (OEWS) dataset.
Chen: Chen's Agency Ideological Score provides ideological scores for 74 U.S. federal agencies, using campaign contributions to assess agency ideology (refer to the codebook for details).
Selin: Selin's Agency Independence Score reflects structural independence across 321 federal agencies, analyzing appointment limits and policy review restrictions (refer to the codebook for details).
RCL: Richardson, Clinton, and Lewis's Agency Ideological Score. The authors surveyed federal executives, expanding on Chen's work to gauge agency ideologies (refer to the codebook for details).
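To make the ETL and document-store entries above concrete, here is a minimal sketch of the pattern in Python. It is not the INSIGHT+ pipeline: the occupation codes, column names, wage values, and agency scores are all hypothetical placeholders, and a plain dictionary stands in for a database collection so the example runs without any external service.

```python
import csv
import io

# Hypothetical inputs: the real sources, codes, and column names are not
# described in this glossary, so everything here is illustrative only.
WAGES_CSV = """occ_code,title,mean_wage
00-0001, Example Occupation One ,"100,000"
00-0002,Example Occupation Two,"50,500"
"""

AGENCY_SCORES = [
    {"agency": "Agency A", "occ_code": "00-0001", "score": 0.25},
    {"agency": "Agency B", "occ_code": "00-0001", "score": -0.5},
]


def extract(csv_text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(wage_rows, score_rows):
    """Transform: clean values and merge both sources into JSON-like documents."""
    scores_by_code = {}
    for row in score_rows:
        scores_by_code.setdefault(row["occ_code"], []).append(
            {"agency": row["agency"], "score": row["score"]}
        )
    documents = []
    for row in wage_rows:
        documents.append({
            "_id": row["occ_code"],
            "title": row["title"].strip(),
            # Wages arrive as strings with thousands separators.
            "mean_wage": int(row["mean_wage"].replace(",", "")),
            # Nested list, the kind of structure a document store holds directly.
            "agency_scores": scores_by_code.get(row["occ_code"], []),
        })
    return documents


def load(documents, store):
    """Load: write documents into the central repository (a dict stand-in)."""
    for doc in documents:
        store[doc["_id"]] = doc
    return len(documents)


repository = {}
loaded = load(transform(extract(WAGES_CSV), AGENCY_SCORES), repository)
```

In a real deployment the `load` step would write to a database rather than a dictionary; the nested `agency_scores` list shows why a document model suits records that combine several sources.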
CLEAR Initiative Data Hub
Publicly Available Data
For more information on these data and for any general use or publication, please contact wresh@usc.edu.
US Senate-Confirmed Presidential Appointee Vacancies (1989-2020)
US Agency GPRA Goals 2000-2012, Coded; by Heejin Cho, Yongjin Ahn, and William Resh