Sponsored by the University of Maryland Center for Machine Learning and Capital One, the purpose of the Fairness in AI seminar series is to explore the topic of fairness in AI, ML and theoretical computer science by having researchers present their cutting-edge work on this family of problems. The talks will take place virtually on Mondays from 11 a.m. to noon. Recordings will be posted here for later viewing.
Talk announcements and Zoom links will be distributed through the Center for Machine Learning mailing list, which you can sign up for here. If you are interested in giving a talk, please contact Leonidas Tsepenekas.
With the increasing role played by AI and algorithms in our lives, it is well-recognized now that fairness should be a key aspect of such AI systems, to avoid automated decision-making that may (inadvertently) be biased. After surveying some of the approaches considered in this general area, we will discuss some of our work on fairness–particularly in unsupervised learning. The talk will only assume a general exposure to computer science and to AI.
Demand for resources that are collectively controlled or regulated by society, like social services or organs for transplantation, typically far outstrips supply. How should these scarce resources be allocated? Any approach to this question requires insights from computer science, economics, and beyond; we must define objectives (foregrounding equity and distributive justice in addition to efficiency), predict outcomes (taking causal considerations into account), and optimize allocations, while carefully considering agent preferences and incentives. Motivated by the real-world problem of provision of services to homeless households, I will discuss our approach to thinking through how algorithmic approaches and computational thinking can help.
As machine learning algorithms have been widely deployed across applications, many concerns have been raised over the fairness of their predictions, especially in high stakes settings (such as facial recognition and medical imaging). To respond to these concerns, the community has proposed and formalized various notions of fairness as well as methods for rectifying unfair behavior. While fairness constraints have been studied extensively for classical models, the effectiveness of methods for imposing fairness on deep neural networks is unclear. In this paper, we observe that these large models overfit to fairness objectives, and produce a range of unintended and undesirable consequences. We conduct our experiments on both facial recognition and automated medical diagnosis datasets using state-of-the-art architectures.
Metric clustering is fundamental in areas ranging from Combinatorial Optimization and Data Mining, to Machine Learning and Operations Research. However, in a variety of situations we may have additional requirements or knowledge, distinct from the underlying metric, regarding which pairs of points should be clustered together. To capture and analyze such scenarios, we introduce a novel family of stochastic pairwise constraints, which we incorporate into several essential clustering objectives (radius/median/means). Moreover, we demonstrate that these constraints can succinctly model an intriguing collection of applications, including among others Individual Fairness. Our main result consists of a general framework that yields approximation algorithms with provable guarantees for important clustering objectives, while at the same time producing solutions that respect the stochastic pairwise constraints.
In clustering problems, a central decision-maker is given a complete metric graph over vertices and must provide a clustering of vertices that minimizes some objective function. In fair clustering problems, vertices are endowed with a color (e.g., membership in a group), and the features of a valid clustering might also include the representation of colors in that clustering. Prior work in fair clustering assumes complete knowledge of group membership. In this paper, we generalize prior work by assuming imperfect knowledge of group membership through probabilistic assignments. We present clustering algorithms in this more general setting with approximation ratio guarantees. We also address the problem of "metric membership," where different groups have a notion of order and distance. Experiments are conducted using our proposed algorithms as well as baselines to validate our approach, and also surface nuanced concerns when group membership is not known deterministically.
As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a specific objective. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
The Open Images Dataset contains approximately nine million images and is a widely accepted dataset for computer vision research. As is common practice for large datasets, the annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image. In this research, we present a new set of annotations on a subset of the Open Images dataset called the "MIAP (More Inclusive Annotations for People)" subset, containing bounding boxes and attributes for all of the people visible in those images. The attributes and labeling methodology for the "MIAP" subset were designed to enable research into model fairness. In addition, we analyze the original annotation methodology for the person class and its subclasses, discussing the resulting patterns in order to inform future annotation efforts. By considering both the original and exhaustive annotation sets, researchers can also now study how systematic patterns in training annotations affect modeling.
Current AI frameworks using deep learning (DL) have met or exceeded human capabilities for tasks such as classifying images, recognizing objects (ImageNet), or facial expressions. Medical AI is also approaching clinicians’ performance for diagnostics tasks. This success masks real issues in AI assurance that will impede the deployment of such algorithms in real-life production systems. Two of the most critical concerns affecting AI assurance are privacy and bias. Privacy leakages can invalidate HIPAA or HITECH compliance, interdicting the use of DL diagnostic models in healthcare. Additionally, DL systems’ performance strongly depends on the availability of large, diverse, and representative annotated training datasets, which are often unbalanced with regard to factors such as gender, ethnicity and/or disease type, resulting in diagnostic bias. In this talk, we will introduce our recent progress in developing algorithms to address bias as well as approaches to assess possible risks in existing algorithms for privacy/membership attacks, and propose ways to effectively defend against such privacy attacks.
Clustering problems are well-studied unsupervised learning problems that are widely used in practice. Recently, there has been a trend in defining fairness in clustering and devising fair clustering algorithms. In this talk we focus on the k-center objective: given a set of points in the metric space and a parameter k, cover the points using k balls with minimum radius. We discuss two definitions of fair k-center: individual fairness (introduced by Jung et al. FORC '20) and the lottery model (introduced by Harris et. al. NeurIPS '18), and show how they are connected to a previously studied problem called the priority k-center problem (Plesnik '87). In this problem, each point v comes with a radius r_v and demands a center to be opened within this radius. This has a 2-approximation due to Plesnki, that is, an algorithm that outputs a solution where each point v has a center at distance at most 2r_v. Our main contribution is approximating priority k-center with outliers: where the algorithm is allowed to discard a certain number of points before clustering. We even take a step further and consider more general constraints on the set of centers that can be opened. For example, matroid constraints (open centers are independent in a given matroid) or knapsack constraints (each center has a cost and the solution has to fit in a given budget). This framework reproduces Harris et. al.’s approximation for the chance k-coverage problem and gives the first approximation for matroid and knapsack versions of the problem. This is a joint work with Tanvi Bajpai, Deeparnab Chakrabarty, and Chandra Chekuri.