SMU DataArts - Cultural Data Profile


Evaluative AI: Rooting Out Bias in AI to Support Robust Program Evaluation

  • Posted Dec 14, 2023

7-minute read

While much of its global impact is still to be understood, has world perception shifted with the recent release of generative artificial intelligence (AI) tools? From image creation with Midjourney and Stable Diffusion to text generation with ChatGPT, industries across every sector are exploring how these technologies can increase productivity and, arguably, unleash creativity in daily operations. Need a first draft of a fundraising letter? Go to Need an image with specific content and can’t find a stock image? Go to

Within the nonprofit field generally and nonprofit arts field specifically, consulting firms, CRM platforms, and service organizations alike have been working to integrate these technologies into their offerings and educate the field on their use. A few recent examples:

While all of this work and education is crucial for our sector, are generative or predictive AI tools (such as risk score assignment and donor identification) the only areas of technological opportunity for philanthropy and the arts?


The power of AI can go even further for our sector. At SMU DataArts, we are exploring the use of AI for program evaluation, asking “What can we learn about our decision-making processes from these technologies?” By focusing on evaluation, we can start to probe potential biases in how we make decisions. Every evaluation then also helps us improve our AI models to make even better predictions going forward. So, what does evaluative AI look like?

© irissca/Adobe Stock, woman opening OpenAI's ChatGPT on laptop (2023).

Using AI to Evaluate the Fairness of Political Districts

In 2017, researchers at Carnegie Mellon University devised a method that enabled competing political parties to “fairly” draw U.S. congressional districts in a state. Much like children taking turns slicing up a cake, with both sides aiming to get the best pieces, this back-and-forth game of dividing a state into representative pieces based on population keeps both parties in check. The process yields very “fair” districts. To test the validity of this “game”, the researchers ran a trillion computational simulations to study the maps.

The results of this work are similar to how we use predictive and generative AI applications today: create a process or model that generalizes over our data, and voila, we can predict and generate new things! But here’s the twist: the researchers (and courts!) flipped the game around. Instead of slicing for “fair” results, they wondered, could one apply the same method in reverse to check whether existing congressional district boundaries were “fairly” drawn? The answer is YES!
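To make the idea of running a method "in reverse" concrete, here is a minimal sketch in Python. It is not the CMU researchers' method: it is a toy comparison of one observed district plan against many randomly generated plans, and it ignores geography, contiguity, and every legal constraint on real maps. The vote counts, the plan, and all function names are hypothetical and exist only for illustration.

```python
import random

def seats_won(plan, votes):
    """Count districts in which party A holds a majority of the two-party vote."""
    seats = 0
    for district in plan:
        a_votes = sum(votes[p][0] for p in district)
        all_votes = sum(votes[p][0] + votes[p][1] for p in district)
        if 2 * a_votes > all_votes:
            seats += 1
    return seats

def random_plan(n_precincts, n_districts, rng):
    """Shuffle precincts and cut them into equal-size districts (no geography)."""
    ids = list(range(n_precincts))
    rng.shuffle(ids)
    size = n_precincts // n_districts
    return [ids[i * size:(i + 1) * size] for i in range(n_districts)]

def share_at_least_as_favorable(observed_plan, votes, n_sims=2000, seed=0):
    """Return party A's seats under the observed plan and the share of
    simulated plans that give party A at least that many seats."""
    rng = random.Random(seed)
    observed = seats_won(observed_plan, votes)
    n_districts = len(observed_plan)
    hits = 0
    for _ in range(n_sims):
        plan = random_plan(len(votes), n_districts, rng)
        if seats_won(plan, votes) >= observed:
            hits += 1
    return observed, hits / n_sims

# Hypothetical data: 12 precincts of 100 voters each, as (party A, party B) votes.
# Party A receives 45 percent of the total vote.
votes = [(70, 30)] * 6 + [(20, 80)] * 6

# Hypothetical observed plan: four districts of three precincts each.
observed_plan = [[0, 1, 6], [2, 3, 7], [4, 5, 8], [9, 10, 11]]

observed_seats, share = share_at_least_as_favorable(observed_plan, votes)
print(f"Party A wins {observed_seats} of {len(observed_plan)} seats under the observed plan")
print(f"Share of simulated plans at least as favorable to party A: {share:.3f}")
```

If only a small share of the simulated plans is as favorable to one party as the observed plan, the observed plan looks unusual and deserves a closer look. A real analysis would generate alternatives that respect geography and the law, which is where the heavy computation comes in.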

In 2018, the Pennsylvania Supreme Court relied on the CMU researcher’s expert testimony to determine that congressional districts in the commonwealth had been crafted with partisan bias. Evaluative models proved consequential in protecting equitable political representation across the commonwealth.

Why Do We Care So Much About the Data Used to “Train” AI Models?

Predictive and generative AI have the potential to perpetuate existing biases, which are enshrined within the data they use. In December 2022, a digital art collective created a never-ending live-stream animation called Nothing, Forever that used an AI model to generate content similar to the show Seinfeld. The underlying text data used in training the model resulted in the creation of offensive and harmful jokes, and the project was suspended.

While Nothing, Forever did generate harmful content, its harm is perhaps not on the same scale as that of AI used for predictive policing. In 2016, the media company ProPublica released an investigation into the use of an AI system for predicting recidivism called the Correctional Offender Management Profiling for Alternative Sanctions, or COMPAS. The COMPAS AI system used historic arrest data to assign a risk score to individuals to predict whether they would re-offend. The investigation found:

[The] assessment tool correctly predicts recidivism 61 percent of the time. But blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend. It makes the opposite mistake among whites: They are much more likely than blacks to be labeled lower risk but go on to commit other crimes.
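The finding above separates two kinds of mistakes: labeling someone higher risk who does not re-offend (a false positive) and labeling someone lower risk who does (a false negative). The following Python sketch shows how a tool can have the same overall accuracy for two groups while making opposite kinds of mistakes. The group names, the counts, and the function name are hypothetical, chosen for illustration only; they are not ProPublica's data or code.

```python
from collections import Counter

def error_rates_by_group(records):
    """Return accuracy, false positive rate, and false negative rate per group.

    records: iterable of (group, labeled_high_risk, reoffended) tuples.
    """
    counts = Counter()
    for group, labeled_high_risk, reoffended in records:
        counts[(group, labeled_high_risk, reoffended)] += 1

    rates = {}
    for group in sorted({g for g, _, _ in counts}):
        tp = counts[(group, True, True)]    # labeled high risk, re-offended
        fp = counts[(group, True, False)]   # labeled high risk, did not re-offend
        fn = counts[(group, False, True)]   # labeled low risk, re-offended
        tn = counts[(group, False, False)]  # labeled low risk, did not re-offend
        rates[group] = {
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
            "false_positive_rate": fp / (fp + tn) if fp + tn else None,
            "false_negative_rate": fn / (fn + tp) if fn + tp else None,
        }
    return rates

# Hypothetical records: 20 people per group, 10 of whom re-offended.
records = (
    [("group_1", True, False)] * 4 + [("group_1", False, False)] * 6
    + [("group_1", False, True)] * 2 + [("group_1", True, True)] * 8
    + [("group_2", True, False)] * 2 + [("group_2", False, False)] * 8
    + [("group_2", False, True)] * 4 + [("group_2", True, True)] * 6
)

rates = error_rates_by_group(records)
for group, r in rates.items():
    print(group, r)
```

In this made-up data both groups see 70 percent accuracy, yet group_1 is wrongly labeled higher risk twice as often as group_2, and group_2 is wrongly labeled lower risk twice as often as group_1. A single accuracy figure hides that difference, which is why an evaluation needs to break errors out by group.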

These flawed risk scores cause real harm, and “feeding this data into predictive tools allows the past to shape the future.” Should the unjust enforcement of Jim Crow laws in the past contribute to determining a person’s recidivism risk score today? No, it should not. So, within our evaluative framework, tools like COMPAS can be deeply flawed at predicting the future, yet they can perhaps shed light on injustices such as historical over-policing and systemic racism.

Instead of accepting that AI applications will perpetuate the biases in our society (and in our data), what if we used AI to understand and change those biases?

Connecting All the Dots: AI in Arts and Philanthropy

Systemic racism and historical inequities are very much present in the arts and philanthropic sectors as well. And the application of risk scores isn’t limited to policing; entities applying for funding from many federal agencies are assigned risk scores when applying for grants. GrantSolutions, an entity within the US Dept. of Health and Human Services, uses historic data to determine how risky grant applicants might be. If you’ve applied for funding from the Small Business Administration, the Dept. of Housing and Urban Development, or many other federal agencies, you might have a predicted risk score associated with your application. Do you know how “risky” your organization is? Unlike COMPAS, we don’t have a public evaluation of the GrantSolutions system and its impact on grant application success. This type of evaluation is needed.


At SMU DataArts, we are learning from these efforts, which have shaped our goal: to apply AI and data evaluation frameworks to grant application processes, with the intention of rooting out bias and unfairness and making grantmaking practices more equitable.

We’re applying AI methods to the grant application evaluation process to identify successes and biases in this human-driven process, so that grantmakers can use the findings to improve their practices going forward. Earlier this year, we hosted a webinar with the Greater Pittsburgh Arts Council (GPAC) to explore our mutual interest in evaluating one of their grant programs, which provided funding from the National Endowment for the Arts to Pittsburgh-area artists and organizations. With a strong focus on equity in their grantmaking processes, GPAC is a great partner in exploring this research as a pilot that could benefit philanthropic and governmental organizations of all types.

In August 2022, we participated in a conversation with the National Assembly of State Arts Agencies and their peer group of grants officers from states across the US. The discussion was insightful and provided context on how government grantmakers are thinking about the use of AI in their work today. Topics and questions that came up included:

  • Can we use AI to be more efficient in our work?
  • Can we use AI to make our efforts more accessible?
  • How can evaluative AI be used to understand the broader arts and culture environment?
  • And many more insightful topics


Since predictive AI tools are trained using existing datasets, grantmakers exploring AI applications will need to be wary of enshrining their existing biases into such tools. Luckily, evaluative AI can also be a tool in understanding and reducing the bias in our human decision-making and generation even as the sector begins to explore using AI for these functions.

In the coming months, we will release research on our application of evaluative AI to grantmaking, providing a guide for grantmakers who wish to harness the power of AI to improve their operations and the equitable distribution of funds to their communities.

From relatively simple AI models such as decision trees to the complexity of large language models and neural networks, we strive to provide insights and best practices to support the nonprofit, philanthropic, and arts & culture sectors. We are committed to harnessing and studying the power of evaluative artificial intelligence to help shape the future of equitable grantmaking.

Daniel Fonner is the Associate Director for Research at SMU DataArts where he manages the organization’s applied research agenda for projects ranging from program evaluation and workforce demographics to using artificial intelligence to better understand arts and culture audiences.

Grantmaking and the AI Bill of Rights: Safeguarding the Nonprofit Sector

In 2022, the White House Office of Science and Technology Policy (OSTP) released the Blueprint for an AI Bill of Rights, which “identifie[s] five principles that should guide the design, use, and deployment of automated systems to protect the American public in the age of artificial intelligence.”


