fairlearn.reductions package
This module contains algorithms implementing the reductions approach to disparity mitigation.
In this approach, disparity constraints are enforced through Lagrange multipliers, which translate into a reweighting and relabelling of the input data. This reduces the problem back to standard machine learning training.
- class fairlearn.reductions.AbsoluteLoss(min_val, max_val)
Bases: object
Class to evaluate absolute loss.
- class fairlearn.reductions.BoundedGroupLoss(loss, *, upper_bound=None)
Bases: fairlearn.reductions.ConditionalLossMoment
Moment for constraining the worst-case loss by a group.
For more information refer to the user guide.
- class fairlearn.reductions.ClassificationMoment
Bases: fairlearn.reductions.Moment
Moment that can be expressed as weighted classification error.
- class fairlearn.reductions.DemographicParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)
Bases: fairlearn.reductions.UtilityParity
Implementation of demographic parity as a moment.
A classifier \(h(X)\) satisfies demographic parity if
\[P[h(X) = 1 | A = a] = P[h(X) = 1] \; \forall a\]
This implementation of UtilityParity defines a single event, all. Consequently, the prob_event pandas.Series will only have a single entry, which will be equal to 1. Similarly, the index property will have twice as many entries (corresponding to the Lagrange multipliers for positive and negative constraints) as there are unique values for the sensitive feature. The signed_weights() method will compute the costs according to Example 3 of Agarwal et al. (2018).
- short_name = 'DemographicParity'
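The demographic parity condition above can be checked by hand on a small sample. The following is a minimal sketch in plain Python, not the library's implementation; the helper names `selection_rates` and `demographic_parity_gaps` and the toy data are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def selection_rates(predictions, sensitive):
    """Return (overall_rate, {group: rate}) for binary predictions in {0, 1}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, sensitive):
        totals[group] += 1
        positives[group] += pred
    overall = sum(predictions) / len(predictions)
    by_group = {g: positives[g] / totals[g] for g in totals}
    return overall, by_group


def demographic_parity_gaps(predictions, sensitive):
    """Signed gap P[h(X)=1 | A=a] - P[h(X)=1] for every group a."""
    overall, by_group = selection_rates(predictions, sensitive)
    return {g: rate - overall for g, rate in by_group.items()}


# Hypothetical toy data: eight samples, two groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gaps = demographic_parity_gaps(preds, groups)
```

A classifier satisfies demographic parity exactly when every gap is zero. Each group contributes one positive and one negative constraint, which is why the index has twice as many entries as there are groups.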
- class fairlearn.reductions.EqualizedOdds(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)
Bases: fairlearn.reductions.UtilityParity
Implementation of equalized odds as a moment.
Adds conditioning on label compared to demographic parity, i.e.
\[P[h(X) = 1 | A = a, Y = y] = P[h(X) = 1 | Y = y] \; \forall a, y\]
This implementation of UtilityParity defines events corresponding to the unique values of the Y array.
The prob_event pandas.Series will record the fraction of the samples corresponding to each unique value in the Y array.
The index MultiIndex will have a number of entries equal to the number of unique values for the sensitive feature, multiplied by the number of unique values of the Y array, multiplied by two (for the Lagrange multipliers for positive and negative constraints).
With these definitions, the signed_weights() method will calculate the costs according to Example 4 of Agarwal et al. (2018).
- short_name = 'EqualizedOdds'
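To make the extra conditioning on Y concrete, the sketch below computes the equalized odds gaps and the resulting number of constraints in plain Python. It is an illustration under assumed toy data, not the library's code; `equalized_odds_gaps` and `constraint_count` are hypothetical helper names.

```python
def equalized_odds_gaps(predictions, labels, sensitive):
    """Signed gap P[h=1 | A=a, Y=y] - P[h=1 | Y=y] for each (y, a) pair."""
    gaps = {}
    for y in sorted(set(labels)):
        in_event = [i for i, label in enumerate(labels) if label == y]
        event_rate = sum(predictions[i] for i in in_event) / len(in_event)
        for a in sorted(set(sensitive)):
            in_cell = [i for i in in_event if sensitive[i] == a]
            if in_cell:  # skip empty (event, group) cells
                cell_rate = sum(predictions[i] for i in in_cell) / len(in_cell)
                gaps[(y, a)] = cell_rate - event_rate
    return gaps


def constraint_count(labels, sensitive):
    """Groups x label values x 2 (positive and negative multipliers)."""
    return len(set(labels)) * len(set(sensitive)) * 2


# Hypothetical toy data.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
groups = ["a", "a", "b", "b", "a", "a", "b", "b"]
eo_gaps = equalized_odds_gaps(preds, labels, groups)
n_constraints = constraint_count(labels, groups)
```

With two label values and two groups, the count is 2 x 2 x 2 = 8, matching the description of the index size above.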
- class fairlearn.reductions.ErrorRate
Bases: fairlearn.reductions.ClassificationMoment
Misclassification error.
- short_name = 'Err'
- class fairlearn.reductions.ErrorRateParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)
Bases: fairlearn.reductions.UtilityParity
Implementation of error rate parity as a moment.
A classifier \(h(X)\) satisfies error rate parity if
\[P[h(X) \ne Y | A = a] = P[h(X) \ne Y] \; \forall a\]
This implementation of UtilityParity defines a single event, all. Consequently, the prob_event pandas.Series will only have a single entry, which will be equal to 1.
The index property will have twice as many entries (corresponding to the Lagrange multipliers for positive and negative constraints) as there are unique values for the sensitive feature.
The signed_weights() method will compute the costs according to Example 3 of Agarwal et al. (2018). However, in this scenario, g = abs(h(x)-y), rather than g = h(x).
- short_name = 'ErrorRateParity'
- class fairlearn.reductions.ExponentiatedGradient(estimator, constraints, eps=0.01, max_iter=50, nu=None, eta0=2.0, run_linprog_step=True, sample_weight_name='sample_weight')
Bases: sklearn.base.BaseEstimator, sklearn.base.MetaEstimatorMixin
An Estimator which implements the exponentiated gradient approach to reductions.
The exponentiated gradient algorithm is described in detail by Agarwal et al. (2018).
- Parameters
estimator (estimator) – An estimator implementing methods fit(X, y, sample_weight) and predict(X), where X is the matrix of features, y is the vector of labels (binary classification) or continuous values (regression), and sample_weight is a vector of weights. In binary classification, labels y and predictions returned by predict(X) are either 0 or 1. In regression, values y and predictions are continuous.
constraints (fairlearn.reductions.Moment) – The disparity constraints expressed as moments
eps (float) – Allowed fairness constraint violation; the solution is guaranteed to have the error within 2*best_gap of the best error under constraint eps; the constraint violation is at most 2*(eps+best_gap)
max_iter (int) – Maximum number of iterations
nu (float) – Convergence threshold for the duality gap; the default (None) corresponds to a conservative automatic setting based on the statistical uncertainty in measuring classification error
eta0 (float) – Initial setting of the learning rate
run_linprog_step (bool) – If True, each step of exponentiated gradient is followed by the saddle point optimization over the convex hull of classifiers returned so far; default True
sample_weight_name (str) – Name of the argument to estimator.fit() which supplies the sample weights (defaults to sample_weight)
- fit(X, y, **kwargs)
Return a fair classifier under specified fairness constraints.
- Parameters
X (numpy.ndarray or pandas.DataFrame) – Feature data
y (numpy.ndarray, pandas.DataFrame, pandas.Series, or list) – Label vector
- predict(X, random_state=None)
Provide predictions for the given input data.
Predictions are randomized, i.e., repeatedly calling predict with the same feature data may yield different output. This non-deterministic behavior is intended and stems from the nature of the exponentiated gradient algorithm.
Notes
A fitted ExponentiatedGradient has an attribute predictors_, an array of predictors, and an attribute weights_, an array of non-negative floats of the same length. The prediction on each data point in X is obtained by first picking a random predictor according to the probabilities in weights_ and then applying it. Different predictors can be chosen on different data points.
- Parameters
X (numpy.ndarray or pandas.DataFrame) – Feature data
random_state (int or RandomState instance, default=None) – Controls random numbers used for randomized predictions. Pass an int for reproducible output across multiple function calls.
- Returns
The prediction. If X represents the data for a single example the result will be a scalar. Otherwise the result will be a vector
- Return type
Scalar or vector
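The randomized prediction described in the Notes can be sketched as follows. This is a standard-library illustration of the idea only (pick a predictor per data point according to the weights, then apply it); it is not the library's code. The function `randomized_predict`, its `seed` argument, and the two toy predictors are hypothetical stand-ins for predictors_, weights_, and random_state.

```python
import random


def randomized_predict(predictors, weights, X, seed=None):
    """For each row of X, draw one predictor with probability given by
    `weights`, then return that predictor's output for the row."""
    rng = random.Random(seed)  # a fixed seed gives reproducible draws
    outputs = []
    for row in X:
        chosen = rng.choices(predictors, weights=weights, k=1)[0]
        outputs.append(chosen(row))
    return outputs


def always_zero(row):
    """Toy predictor that rejects every sample."""
    return 0


def threshold(row):
    """Toy predictor that accepts when the first feature exceeds 0.5."""
    return int(row[0] > 0.5)


X = [[0.2], [0.9], [0.7], [0.1]]
first = randomized_predict([always_zero, threshold], [0.3, 0.7], X, seed=0)
second = randomized_predict([always_zero, threshold], [0.3, 0.7], X, seed=0)
deterministic = randomized_predict([always_zero, threshold], [0.0, 1.0], X)
```

Because the draw is made independently for each data point, two rows with identical features can receive different predictions unless the seed is fixed or all of the weight sits on a single predictor.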
- class fairlearn.reductions.FalsePositiveRateParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)
Bases: fairlearn.reductions.UtilityParity
Implementation of false positive rate parity as a moment.
Adds conditioning on label Y=0 compared to demographic parity, i.e.,
\[P[h(X) = 1 | A = a, Y = 0] = P[h(X) = 1 | Y = 0] \; \forall a\]
This implementation of UtilityParity defines the event corresponding to Y=0.
The prob_event pandas.DataFrame will record the fraction of the samples corresponding to Y = 0 in the Y array.
The index MultiIndex will have a number of entries equal to the number of unique values of the sensitive feature, multiplied by the number of unique non-NaN values of the constructed event array, whose entries are either NaN or label=0 (so only one unique non-NaN value), multiplied by two (for the Lagrange multipliers for positive and negative constraints).
With these definitions, the signed_weights() method will calculate the costs for Y=0 as they are calculated in Example 4 of Agarwal et al. (2018) <https://arxiv.org/abs/1803.02453>, but will use weights equal to zero for Y=1.
- short_name = 'FalsePositiveRateParity'
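The single-event construction can be illustrated with a short sketch. It assumes that `None` stands in for the NaN entries of the event array and uses hypothetical helper names (`event_column`, `parity_gaps_on_event`); it shows the idea of restricting the comparison to samples inside the event, not the library's internals.

```python
def event_column(labels, target=0):
    """Event array: 'label=<target>' inside the event, None (NaN stand-in) outside."""
    return [f"label={target}" if y == target else None for y in labels]


def parity_gaps_on_event(predictions, labels, sensitive, target=0):
    """Return (prob_event, {group: gap}) using only samples inside the event."""
    events = event_column(labels, target)
    in_event = [i for i, e in enumerate(events) if e is not None]
    prob_event = len(in_event) / len(labels)
    event_rate = sum(predictions[i] for i in in_event) / len(in_event)
    gaps = {}
    for a in sorted(set(sensitive)):
        cell = [i for i in in_event if sensitive[i] == a]
        if cell:  # samples outside the event never contribute
            gaps[a] = sum(predictions[i] for i in cell) / len(cell) - event_rate
    return prob_event, gaps


# Hypothetical toy data.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
groups = ["a", "a", "b", "b", "a", "a", "b", "b"]
fpr_prob_event, fpr_gaps = parity_gaps_on_event(preds, labels, groups, target=0)
tpr_prob_event, tpr_gaps = parity_gaps_on_event(preds, labels, groups, target=1)
```

Passing `target=1` gives the analogous comparison for true positive rate parity, where samples with Y=0 are the ones left out.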
- class fairlearn.reductions.GridSearch(estimator, constraints, selection_rule='tradeoff_optimization', constraint_weight=0.5, grid_size=10, grid_limit=2.0, grid_offset=None, grid=None, sample_weight_name='sample_weight')
Bases: sklearn.base.BaseEstimator, sklearn.base.MetaEstimatorMixin
Estimator to perform a grid search given a blackbox estimator algorithm.
The approach used is taken from section 3.4 of Agarwal et al. (2018).
- Parameters
estimator (estimator) – An estimator implementing methods fit(X, y, sample_weight) and predict(X), where X is the matrix of features, y is the vector of labels (binary classification) or continuous values (regression), and sample_weight is a vector of weights. In binary classification, labels y and predictions returned by predict(X) are either 0 or 1. In regression, values y and predictions are continuous.
constraints (fairlearn.reductions.Moment) – The disparity constraints expressed as moments
selection_rule (str) – Specifies the procedure for selecting the best model found by the grid search. At the present time, the only valid value is “tradeoff_optimization”, which minimizes a weighted sum of the error rate and constraint violation.
constraint_weight (float) – When the selection_rule is “tradeoff_optimization”, this specifies the relative weight put on the constraint violation when selecting the best model. The weight placed on the error rate will be 1-constraint_weight
grid_size (int) – The number of Lagrange multipliers to generate in the grid
grid_limit (float) – The largest Lagrange multiplier to generate. The grid will contain values distributed between -grid_limit and grid_limit by default
grid_offset (pandas.DataFrame) – Shifts the grid of Lagrange multipliers by that value. It is ‘0’ by default
grid – Instead of supplying a size and limit for the grid, users may specify the exact set of Lagrange multipliers they desire using this argument.
sample_weight_name (str) – Name of the argument to estimator.fit() which supplies the sample weights (defaults to sample_weight)
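The “tradeoff_optimization” rule described for selection_rule and constraint_weight can be sketched as a weighted sum minimized over candidate models. The snippet below is an illustration under assumed inputs, not the library's code: `select_by_tradeoff` and the candidate tuples (name, error rate, constraint violation) are hypothetical.

```python
def select_by_tradeoff(candidates, constraint_weight=0.5):
    """Pick the candidate minimizing
    (1 - constraint_weight) * error_rate + constraint_weight * violation."""

    def score(candidate):
        _, error_rate, violation = candidate
        return (1 - constraint_weight) * error_rate + constraint_weight * violation

    return min(candidates, key=score)


# Hypothetical results for three models trained at different grid points.
candidates = [
    ("accurate", 0.10, 0.30),
    ("balanced", 0.15, 0.10),
    ("fair", 0.30, 0.02),
]
best_default = select_by_tradeoff(candidates)                       # weight 0.5
best_error_only = select_by_tradeoff(candidates, constraint_weight=0.0)
best_fair_only = select_by_tradeoff(candidates, constraint_weight=1.0)
```

Raising constraint_weight shifts the choice toward models with smaller constraint violation, at the cost of a higher error rate.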
- fit(X, y, **kwargs)
Run the grid search.
This will result in multiple copies of the estimator being made, and the fit(X) method of each one called.
- Parameters
X (numpy.ndarray or pandas.DataFrame) – The feature matrix
y (numpy.ndarray, pandas.DataFrame, pandas.Series, or list) – The label vector
sensitive_features (numpy.ndarray, pandas.DataFrame, pandas.Series, or list (for now)) – A (currently) required keyword argument listing the feature used by the constraints object
- predict(X)
Provide a prediction using the best model found by the grid search.
This dispatches X to the predict(X) method of the selected estimator, and hence the return type is dependent on that method.
- Parameters
X (numpy.ndarray or pandas.DataFrame) – Feature data
- predict_proba(X)
Provide the result of predict_proba from the best model found by the grid search.
The underlying estimator must support predict_proba(X) for this to work. The return type is determined by this method.
- Parameters
X (numpy.ndarray or pandas.DataFrame) – Feature data
- class fairlearn.reductions.LossMoment(loss)
Bases: fairlearn.reductions.Moment
Moment that can be expressed as weighted loss.
- class fairlearn.reductions.Moment
Bases: object
Generic moment.
Our implementations of the reductions approach to fairness described in Agarwal et al. (2018) make use of Moment objects to describe the disparity constraints imposed on the solution. This is an abstract class for all such objects.
- gamma(predictor)
Calculate the degree to which constraints are currently violated by the predictor.
- load_data(X, y, **kwargs)
Load a set of data for use by this object.
The keyword arguments can contain a sensitive_features array.
- Parameters
X (array) – The feature data
y (array) – The true label data
- property total_samples
Return the number of samples in the data.
- class fairlearn.reductions.SquareLoss(min_val, max_val)
Bases: object
Class to evaluate the square loss.
- class fairlearn.reductions.TruePositiveRateParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)
Bases: fairlearn.reductions.UtilityParity
Implementation of true positive rate parity as a moment.
Adds conditioning on label Y=1 compared to demographic parity, i.e.,
\[P[h(X) = 1 | A = a, Y = 1] = P[h(X) = 1 | Y = 1] \; \forall a\]
This implementation of UtilityParity defines the event corresponding to Y=1.
The prob_event pandas.DataFrame will record the fraction of the samples corresponding to Y = 1 in the Y array.
The index MultiIndex will have a number of entries equal to the number of unique values of the sensitive feature, multiplied by the number of unique non-NaN values of the constructed event array, whose entries are either NaN or label=1 (so only one unique non-NaN value), multiplied by two (for the Lagrange multipliers for positive and negative constraints).
With these definitions, the signed_weights() method will calculate the costs for Y=1 as they are calculated in Example 4 of Agarwal et al. (2018) <https://arxiv.org/abs/1803.02453>, but will use weights equal to zero for Y=0.
- short_name = 'TruePositiveRateParity'
- class fairlearn.reductions.UtilityParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)
Bases: fairlearn.reductions.ClassificationMoment
A generic moment for parity in utilities (or costs) under classification.
This serves as the base class for DemographicParity, EqualizedOdds, and others. All subclasses can be used as difference-based constraints or ratio-based constraints. Refer to the user guide for more information and example usage.
Constraints compare the group-level mean utility for each group with the overall mean utility (unless further events are specified, e.g., in equalized odds). A difference-based constraint is violated when the difference between a group and the overall population with regard to a utility exceeds difference_bound. For ratio-based constraints, the ratio between the group-level and overall mean utility needs to be bounded between ratio_bound and its inverse (plus an additional additive ratio_bound_slack).
The index field is a pandas.MultiIndex corresponding to the constraint IDs. It is an index of various DataFrame and Series objects that are either required as arguments or returned by several of the methods of the UtilityParity class. It is the Cartesian product of:
The unique events defining the particular moment object
The unique values of the sensitive feature
The characters + and -, corresponding to the Lagrange multipliers for positive and negative violations of the constraint
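The Cartesian product that makes up the constraint IDs can be enumerated with the standard library. This sketch only illustrates how many entries arise and what they look like; the helper `constraint_ids`, the tuple ordering (event, group, sign), and the event names are assumptions for illustration and may differ from the level order of the actual pandas.MultiIndex.

```python
from itertools import product


def constraint_ids(events, groups, signs=("+", "-")):
    """Enumerate every (event, group, sign) combination."""
    return list(product(events, groups, signs))


groups = ["a", "b"]

# A single event "all", as described for demographic parity.
dp_ids = constraint_ids(["all"], groups)

# One event per label value, as described for equalized odds.
eo_ids = constraint_ids(["label=0", "label=1"], groups)

# One event for Y=0 only, as described for false positive rate parity.
fpr_ids = constraint_ids(["label=0"], groups)
```

The lengths (4, 8, and 4 here) match the entry counts given in the descriptions of the individual moments.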
- Parameters
difference_bound (float) – The constraints’ difference bound for constraints that are expressed as differences, also referred to as \(\epsilon\) in documentation. If ratio_bound is used then difference_bound needs to be None. If neither ratio_bound nor difference_bound is set then a default difference bound of 0.01 is used for backwards compatibility. Default None.
ratio_bound (float) – The constraints’ ratio bound for constraints that are expressed as ratios. The specified value needs to be in (0,1]. If difference_bound is used then ratio_bound needs to be None. Default None.
ratio_bound_slack (float) – The constraints’ ratio bound slack for constraints that are expressed as ratios, also referred to as \(\epsilon\) in documentation. ratio_bound_slack is ignored if ratio_bound is not specified. Default 0.0
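The interplay of the three parameters can be sketched as a single check. This is an illustration, not the library's code: the helper `violated_groups` is hypothetical, and the linearized form of the ratio constraint used here (r * group - overall <= slack and r * overall - group <= slack) is an assumption about how "bounded between ratio_bound and its inverse, plus additive slack" can be expressed. A difference bound is treated as the special case r = 1.

```python
def violated_groups(group_means, overall_mean, *, difference_bound=None,
                    ratio_bound=None, ratio_bound_slack=0.0):
    """Return {group: worst signed violation} for groups exceeding the bound."""
    if difference_bound is not None and ratio_bound is not None:
        raise ValueError("Specify only one of difference_bound and ratio_bound.")
    if ratio_bound is None:
        ratio = 1.0
        # Default difference bound of 0.01 when neither bound is given.
        eps = 0.01 if difference_bound is None else difference_bound
    else:
        if not 0 < ratio_bound <= 1:
            raise ValueError("ratio_bound must be in (0, 1].")
        ratio = ratio_bound
        eps = ratio_bound_slack
    violations = {}
    for group, mean in group_means.items():
        positive = ratio * mean - overall_mean   # "+" constraint
        negative = ratio * overall_mean - mean   # "-" constraint
        worst = max(positive, negative)
        if worst > eps:
            violations[group] = worst
    return violations


# Hypothetical group-level and overall mean utilities.
means = {"a": 0.75, "b": 0.25}
loose = violated_groups(means, 0.5, difference_bound=0.3)
tight = violated_groups(means, 0.5, difference_bound=0.1)
ratio_strict = violated_groups(means, 0.5, ratio_bound=0.8)
ratio_loose = violated_groups(means, 0.5, ratio_bound=0.5)
```

A ratio_bound closer to 1 is stricter, while a smaller ratio_bound or a larger ratio_bound_slack tolerates a wider spread between groups.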
- bound()
Return bound vector.
- Returns
a vector of bound values corresponding to all constraints
- Return type
- gamma(predictor)
Calculate the degree to which constraints are currently violated by the predictor.
- load_data(X, y, event=None, utilities=None, **kwargs)
Load the specified data into this object.
This adds a column event to the tags field.
The utilities argument is a 2-d array that corresponds to g(X,A,Y,h(X)) as described in Agarwal et al. (2018) <https://arxiv.org/abs/1803.02453>. It defaults to h(X), i.e. [0, 1] for each X_i. The first column is G^0 and the second is G^1. Binary classification with labels 0/1 is assumed.
\[utilities = [g(X,A,Y,h(X)=0), g(X,A,Y,h(X)=1)]\]
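The two-column layout of utilities can be shown with nested lists. This is a minimal sketch with hypothetical helper names; it builds the default utilities (g = h(X)) and, as a second example, the utilities for g = abs(h(x)-y) mentioned under ErrorRateParity, then looks up the utility realized by a given prediction.

```python
def default_utilities(n_samples):
    """g = h(X): column 0 is the utility when h(X)=0, column 1 when h(X)=1."""
    return [[0.0, 1.0] for _ in range(n_samples)]


def error_utilities(labels):
    """g = abs(h(x) - y), evaluated at h(X)=0 and at h(X)=1 for each label."""
    return [[abs(0 - y), abs(1 - y)] for y in labels]


def realized_utility(utilities, predictions):
    """Select, per sample, the column matching the actual prediction."""
    return [row[pred] for row, pred in zip(utilities, predictions)]


labels = [0, 1, 1]
preds = [1, 0, 1]
default_values = realized_utility(default_utilities(3), preds)
error_values = realized_utility(error_utilities(labels), preds)
```

With the default utilities the realized value is just the prediction itself, while with the error utilities it is 1 exactly where the prediction disagrees with the label.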
- project_lambda(lambda_vec)
Return the projected lambda values.
That is, return a lambda that is guaranteed to lead to the same or higher value of the Lagrangian compared with lambda_vec for all possible choices of the classifier, h.
- signed_weights(lambda_vec)
Compute the signed weights.
Uses the equations for \(C_i^0\) and \(C_i^1\) as defined in Section 3.2 of Agarwal et al. (2018) in the ‘best response of the Q-player’ subsection to compute the signed weights to be applied to the data by the next call to the underlying estimator.
- Parameters
lambda_vec (pandas.Series) – The vector of Lagrange multipliers indexed by index
- class fairlearn.reductions.ZeroOneLoss
Bases: fairlearn.reductions.AbsoluteLoss
Class to evaluate a zero-one loss.