Dataset/Codebook

Access to the Raw Dataset

Please contact wresh@usc.edu or keunyoul@usc.edu.

Access to the Codebook

Codebook (.xlsx)

The Major Components of INSIGHT+

CDP Flowchart.drawio

Technical Notes:

ETL: Extraction, Transformation, and Loading, a common process for combining data from multiple sources into a single central repository (database).
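As an illustration of the three ETL stages, here is a minimal Python sketch. It is not the INSIGHT+ pipeline: the two inline CSV sources, the column names (`occ_code`, `title`, `mean_wage`), the `occupations` table, and all values are made up for the example, and SQLite stands in for the central repository.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: occupation titles (made-up values, untrimmed text).
TITLES_CSV = "occ_code,title\nA1, Analyst \nB2,Clerk\n"
# Hypothetical source 2: wages stored as strings with thousands separators.
WAGES_CSV = 'occ_code,mean_wage\nA1,"50,000"\nB2,"30,000"\n'


def extract(text):
    """Extract: read raw rows from one CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(titles, wages):
    """Transform: clean the values and join the two sources on occ_code."""
    wage_by_code = {
        row["occ_code"]: int(row["mean_wage"].replace(",", "")) for row in wages
    }
    return [
        (row["occ_code"], row["title"].strip(), wage_by_code.get(row["occ_code"]))
        for row in titles
    ]


def load(rows, conn):
    """Load: write the cleaned rows into one central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS occupations "
        "(occ_code TEXT PRIMARY KEY, title TEXT, mean_wage INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO occupations VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load in order; return the loaded rows."""
    rows = transform(extract(TITLES_CSV), extract(WAGES_CSV))
    load(rows, conn)
    return rows
```

Calling `run_etl(sqlite3.connect(":memory:"))` fills an in-memory database, which keeps the sketch free of external files or servers.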

Fine-tuned LLM: a language model tailored through additional training on a focused dataset to improve its performance and accuracy in a particular domain or task. Our model, trained on proprietary government occupation data, generates metrics for evaluating the AI vulnerability and augmentability of public sector jobs.

MongoDB: a scalable, flexible NoSQL database that stores data in JSON-like documents, enabling efficient storage and querying of complex data types and structures.
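To make "JSON-like documents" concrete, the sketch below shows one nested record and a small filter that follows dotted paths, the notation MongoDB uses to reach embedded fields. It uses plain Python dictionaries rather than a MongoDB server, and the field names and values are hypothetical; they are not taken from the INSIGHT+ schema.

```python
import json

# Hypothetical record; field names and values are illustrative only.
occupation_doc = {
    "occ_code": "A1",
    "title": "Analyst",
    "scores": {"vulnerability": 0.4, "augmentability": 0.7},
    "agencies": ["Agency X", "Agency Y"],
}


def get_path(doc, dotted):
    """Follow a dotted path such as 'scores.vulnerability' into nested dicts."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(doc, query):
    """Return True when every dotted path in `query` equals the given value."""
    return all(get_path(doc, path) == value for path, value in query.items())


# The same record serialized as JSON text, as it might be exchanged or stored.
as_json = json.dumps(occupation_doc, indent=2)
```

Because each document carries its own nested structure, records with different sets of fields can sit in the same collection without a fixed table layout.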

API: Application Programming Interface, a set of rules and protocols that enables different software applications to communicate with each other. Our API allows government professionals, researchers, and software developers to download and integrate the data easily.

BLS: The Bureau of Labor Statistics’s Occupational Employment and Wage Statistics (OEWS) dataset.

Chen: Chen's Agency Ideological Score, which assigns ideological scores to 74 U.S. federal agencies, using campaign contributions to assess bureaucratic agency ideologies (refer to the codebook for details).

Selin: Selin's Agency Independence Score, which measures the structural independence of 321 federal agencies based on appointment limits and policy review restrictions (refer to the codebook for details).

RCL: Richardson, Clinton, and Lewis's Agency Ideological Score. The authors surveyed federal executives, expanding on Chen's work, to gauge agency ideologies (refer to the codebook for details).

CLEAR Initiative Data Hub

Publicly Available Data

For more information on these data and for any general use or publication, please contact wresh@usc.edu.