Dandelion
Detecting and erasing bias from multimodal foundation models
In the Data Curation for Trustworthy AI project (report can be found here), we define bias as a model performing unequally along the dimension of a selected characteristic in the data where the user expects equal performance. Dandelion first aims to measure the amount of bias present in any given multimodal foundation model, with respect to a selected characteristic that the user provides dynamically at runtime. Dandelion will then identify the dimension and location of that bias within the model itself, and attempt to erase the concept of the selected characteristic from the model.
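As a rough illustration of the definition above, bias along a selected characteristic can be summarized as the gap in performance between groups. The sketch below is not Dandelion's actual metric: the function name `performance_gap`, the use of accuracy as the performance measure, and the max-minus-min gap are all assumptions made for the example.

```python
from collections import defaultdict


def performance_gap(correct, groups):
    """Return per-group accuracy and the largest gap between groups.

    correct: iterable of 0/1 (or bool) flags, one per example,
             marking whether the model's output was right.
    groups:  iterable of group labels for the selected characteristic,
             aligned with `correct`.
    Both the accuracy measure and the max-min gap are illustrative choices.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for ok, group in zip(correct, groups):
        totals[group] += 1
        hits[group] += int(ok)

    accuracy = {group: hits[group] / totals[group] for group in totals}
    # A gap of 0 means equal performance across every group.
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap


if __name__ == "__main__":
    # Hypothetical toy data: group "a" is served better than group "b".
    flags = [1, 1, 1, 0, 1, 0, 0, 0]
    labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
    per_group, gap = performance_gap(flags, labels)
    print(per_group, gap)
```

Because the characteristic is supplied as plain labels, the same function can be reused for any characteristic chosen at runtime, provided each example can be labeled with it.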
We are currently exploring activation probing and steering. We are comparing different methods for retrieving steering vectors to assess what kind of data, and how much of it, is needed to obtain an effective steering vector. We are also comparing these activation-based methods against weight-space concept-erasure methods.
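To make the idea of a steering vector concrete, here is a minimal sketch of one common retrieval approach, the difference of mean activations between examples that do and do not express a concept. It is not necessarily one of the methods under comparison in this project. The function names are hypothetical, and the activations are assumed to have already been extracted from a single layer as arrays of shape (examples, hidden size).

```python
import numpy as np


def mean_difference_vector(pos_acts, neg_acts):
    """Unit-norm direction from the difference of mean activations.

    pos_acts / neg_acts: arrays of shape (n_examples, hidden_size) taken
    from one layer, for inputs with and without the selected characteristic.
    """
    pos = np.asarray(pos_acts, dtype=float)
    neg = np.asarray(neg_acts, dtype=float)
    direction = pos.mean(axis=0) - neg.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("the two activation sets have identical means")
    return direction / norm


def steer(acts, direction, alpha):
    """Shift activations along the direction by strength alpha."""
    return np.asarray(acts, dtype=float) + alpha * direction


def project_out(acts, direction):
    """Remove each activation's component along a unit-norm direction."""
    acts = np.asarray(acts, dtype=float)
    return acts - np.outer(acts @ direction, direction)


if __name__ == "__main__":
    # Hypothetical 2-D activations where the concept lives on the first axis.
    with_concept = [[2.0, 0.0], [4.0, 0.0]]
    without_concept = [[0.0, 0.0], [0.0, 0.0]]
    v = mean_difference_vector(with_concept, without_concept)
    print(v)
    print(steer([[1.0, 1.0]], v, alpha=2.0))
    print(project_out([[3.0, 5.0]], v))
```

Note that `project_out` is a simple linear projection applied to activations; it is shown only for contrast and is not one of the weight-space concept-erasure methods mentioned above, which modify the model's parameters instead.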
This project is in progress. I am the PI, leading a small team. It is an internally funded research project: I pitched it and won a year-long grant.