Fairness

class xaiographs.Fairness(destination_path: str = './xaioweb_files', verbose: int = 0)

The Fairness class offers functionalities to explain how fair or unfair the classifications made by a (Deep) Machine Learning model are with respect to a set of features considered sensitive (gender, ethnic group, religion, age, etc.).

Read more in the Fairness User Guide

Parameters

destination_path (str, default='./xaioweb_files') – Path where output XAIoWeb files will be stored.
verbose (int, default=0) – Verbosity level.

Hint

Any value greater than 0 means verbosity is on.

Attributes:

`confusion_matrix`	Confusion matrix.
`correlation_matrix`	Correlation matrix (Pearson correlation) between features.
`fairness_categories_score`	DataFrame with the categories that are assigned to the Fairness criteria based on their score.
`fairness_global_info`	DataFrame with "Global Scores" (weighted aggregation of the scores of each fairness criterion) for each sensitive feature, each assigned a category {A+, A, B, C, D, E}.
`fairness_info`	DataFrame with all the information on the fairness criteria.
`highest_correlation_features`	DataFrame with pairs of features whose Pearson correlation is above a threshold (0.9).
`independence_info`	DataFrame with all the information on the Independence criterion.
`separation_info`	DataFrame with all the information on the Separation criterion.
`sufficiency_info`	DataFrame with all the information on the Sufficiency criterion.
`target_values`	List of the different target values in the dataset.

Methods:

`fairness_metrics`(df, sensitive_col, ...)	Calculate scores for the criteria of `independence_score()`, `separation_score()` and `sufficiency_score()`.
`fit`(df, sensitive_cols, target_col, predict_col)	Main function that performs all the calculations of the Fairness class.
`get_fairness_category`(score)	Assigns a category to a Fairness criterion given its score.
`independence_score`(df, sensitive_col, ...)	Calculate the independence criterion's score.
`separation_score`(df, sensitive_col, ...)	Calculate the separation criterion's score.
`sufficiency_score`(df, sensitive_col, ...)	Calculate the sufficiency criterion's score.

property confusion_matrix

Confusion matrix.

Returns: confusion_matrix – Structure representing the Confusion Matrix.
Return type: pandas.DataFrame

Caution

If the fit() method of the Fairness class has not been executed, it returns a warning message.
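For reference, a confusion matrix simply counts each (real target, prediction) pair. The plain-Python sketch below does this with collections.Counter on two lists. The function name build_confusion_matrix and the orientation (rows are real target values, columns are predictions) are assumptions and may differ from the layout returned by this property.

```python
from collections import Counter


def build_confusion_matrix(y_true, y_pred):
    """Count (real, predicted) label pairs.

    Returns the sorted label list and a nested list where row i is the real
    label labels[i] and column j is the predicted label labels[j].
    """
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    # Counter returns 0 for pairs that never occur.
    matrix = [[counts[(real, pred)] for pred in labels] for real in labels]
    return labels, matrix


labels, cm = build_confusion_matrix(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"])
print(labels)  # ['no', 'yes']
print(cm)      # [[1, 0], [1, 2]]
```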

property correlation_matrix

Correlation matrix (Pearson correlation) between features.

Returns: correlation_matrix – Structure representing the correlation matrix.
Return type: pandas.DataFrame

Caution

If the fit() method of the Fairness class has not been executed, it returns a warning message.

property fairness_categories_score

DataFrame with the categories assigned to the Fairness criteria based on their score. The categories and their score ranges are the following:

Category	Score range
A+	0.0 <= score <= 0.02
A	0.02 < score <= 0.05
B	0.05 < score <= 0.08
C	0.08 < score <= 0.15
D	0.15 < score <= 0.25
E	0.25 < score <= 1.0

Returns: fairness_categories_score – Structure containing the categories based on scores.
Return type: pandas.DataFrame

property fairness_global_info

DataFrame with the “Global Scores” (weighted aggregation of the scores of each fairness criterion) for each sensitive feature, each assigned a category {A+, A, B, C, D, E}. The DataFrame contains the following columns:

Returns: fairness_global_info – Structure with “Global Scores”.
Return type: pandas.DataFrame

property fairness_info

DataFrame with all the information on the fairness criteria. For each sensitive feature, each value of the sensitive feature and each value of the target, it returns, for each fairness criterion, its score, its category and its weight (the percentage of rows that have that sensitive value and that target value). The DataFrame contains the following columns:

Column	Description
sensitive_feature	Sensitive feature name.
sensitive_value	Value of the sensitive feature.
is_binary_sensitive_feature	Indicates whether the sensitive feature is binary.
target_label	Value of the prediction (`y_predict`).
independence_score	Independence criterion score.
independence_category	Category {A+, A, B, C, D, E} assigned to the Independence criterion score.
independence_score_weight	Percentage (sensitive_value & predict_label)/all_rows_dataset.
separation_score	Separation criterion score.
separation_category	Category {A+, A, B, C, D, E} assigned to the Separation criterion score.
separation_score_weight	Percentage (sensitive_value & predict_label)/all_rows_dataset.
sufficiency_score	Sufficiency criterion score.
sufficiency_category	Category {A+, A, B, C, D, E} assigned to the Sufficiency criterion score.
sufficiency_score_weight	Percentage (sensitive_value & target_label)/all_rows_dataset.

Returns: fairness_info – Structure with all the information on the fairness criteria.
Return type: pandas.DataFrame
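A minimal plain-Python sketch of how the *_score_weight values defined in the table above can be computed, using a list of row dicts instead of a DataFrame. The second helper shows one possible way of combining per-value scores into a single global score. The exact aggregation behind fairness_global_info is not documented on this page, so the weighted mean is an assumption, as are the helper names score_weights and weighted_global_score.

```python
from collections import Counter


def score_weights(rows, sensitive_col, label_col):
    """Fraction of all rows for each (sensitive value, label) pair.

    label_col is the prediction column for the independence and separation
    weights, and the real target column for the sufficiency weight.
    """
    total_rows = len(rows)
    counts = Counter((row[sensitive_col], row[label_col]) for row in rows)
    return {pair: count / total_rows for pair, count in counts.items()}


def weighted_global_score(scores, weights):
    """Weighted mean of per-pair scores (assumed aggregation rule)."""
    weight_sum = sum(weights[pair] for pair in scores)
    return sum(scores[pair] * weights[pair] for pair in scores) / weight_sum


rows = [
    {"gender": "F", "y_pred": "yes"},
    {"gender": "F", "y_pred": "no"},
    {"gender": "M", "y_pred": "yes"},
    {"gender": "M", "y_pred": "yes"},
]
weights = score_weights(rows, "gender", "y_pred")
# weights == {("F", "yes"): 0.25, ("F", "no"): 0.25, ("M", "yes"): 0.5}

# Illustrative per-pair scores, chosen only to exercise the function.
scores = {("F", "yes"): 0.1, ("F", "no"): 0.1, ("M", "yes"): 0.4}
global_score = weighted_global_score(scores, weights)
```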

fairness_metrics(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → Tuple[float, float, float]

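Calculate the scores of the criteria computed by independence_score(), separation_score() and sufficiency_score(). With ‘A’ the sensitive feature, ‘Y’ the prediction and ‘T’ the real target, these criteria are calculated as:

\[independence\ score = | P(Y=y∣A=a) - P(Y=y∣A=b) |\]

\[separation\ score = | P(Y=y∣T=t,A=a) - P(Y=y∣T=t,A=b) |\]

\[sufficiency\ score = | P(T=t∣Y=y,A=a) - P(T=t∣Y=y,A=b) |\]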

Parameters

df (pandas.DataFrame) – Structure with the dataset to process. The dataset must contain N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).
predict_col (str) – Name of the DataFrame (df) column that contains the prediction for each element.
target_label (str) – Value of the target (label) for which the score is calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

fairness_metrics – Fairness score metrics (independence score, separation score, sufficiency score).

Return type

Tuple[float, float, float]

See also

For the detailed calculation of each metric, refer to independence_score(), separation_score() and sufficiency_score() of the Fairness class.

fit(df: DataFrame, sensitive_cols: List[str], target_col: str, predict_col: str) → None

Main function that performs all the calculations of the Fairness class. The results are accessible through the properties of the class.

Parameters

df (pandas.DataFrame) – Structure with the dataset to process. The dataset must contain N feature columns, a real target column and a prediction column.
sensitive_cols (List[str]) – List of the sensitive features (df column names) on which the Fairness criteria are evaluated.
target_col (str) – Name of the DataFrame column that contains the target (ground truth or y_real).
predict_col (str) – Name of the DataFrame column that contains the prediction (y_predict) for each element.

static get_fairness_category(score: float) → str

Assigns a category to a Fairness criterion given its score. The relationship between score and category is shown in the following table:

Category	Score range
A+	0.0 <= score <= 0.02
A	0.02 < score <= 0.05
B	0.05 < score <= 0.08
C	0.08 < score <= 0.15
D	0.15 < score <= 0.25
E	0.25 < score <= 1.0

Parameters: score (float) – Value of the score of the Fairness criterion.
Returns: category – Category assigned to the score.
Return type: str
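The mapping in the table above amounts to walking through the inclusive upper bounds in increasing order and returning the first category whose bound is not exceeded. A plain-Python sketch follows; raising ValueError for scores outside [0, 1] is an assumption of this sketch, since the behavior of get_fairness_category() for such inputs is not documented here.

```python
# (inclusive upper bound, category) pairs taken from the score table.
CATEGORY_BOUNDS = (
    (0.02, "A+"),
    (0.05, "A"),
    (0.08, "B"),
    (0.15, "C"),
    (0.25, "D"),
    (1.0, "E"),
)


def get_fairness_category(score: float) -> str:
    """Return the first category whose inclusive upper bound is >= score."""
    if not 0.0 <= score <= 1.0:
        # Out-of-range handling is an assumption of this sketch.
        raise ValueError(f"score must lie in [0, 1], got {score}")
    for upper, category in CATEGORY_BOUNDS:
        if score <= upper:
            return category
    raise AssertionError("unreachable: the last bound is 1.0")


print(get_fairness_category(0.02), get_fairness_category(0.1))  # A+ C
```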

property highest_correlation_features

DataFrame with pairs of features whose Pearson correlation is above a threshold (0.9). If one of these features is a sensitive feature, it is marked with a flag. If no features are highly correlated, an empty DataFrame is returned.

Returns: highest_correlation_features – Structure containing the most highly correlated features.
Return type: pandas.DataFrame

Caution

If the fit() method of the Fairness class has not been executed, it returns a warning message.
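The pair selection described for this property can be sketched in plain Python: compute the Pearson correlation of every pair of feature columns, keep the pairs above 0.9, and flag a pair when either feature is sensitive. The names pearson, highly_correlated_pairs, feature_1, feature_2 and is_sensitive are hypothetical, columns are passed as a dict of lists rather than a DataFrame, and comparing the absolute correlation (so that strong negative correlations are also reported) is an assumption about the library's behavior.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    A constant sequence has zero standard deviation and raises ZeroDivisionError.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def highly_correlated_pairs(columns, sensitive_cols, threshold=0.9):
    """Feature pairs whose |Pearson r| exceeds threshold, flagged if one is sensitive."""
    pairs = []
    for f1, f2 in combinations(columns, 2):
        r = pearson(columns[f1], columns[f2])
        if abs(r) > threshold:  # comparing |r| is an assumption of this sketch
            pairs.append({
                "feature_1": f1,
                "feature_2": f2,
                "correlation": r,
                "is_sensitive": f1 in sensitive_cols or f2 in sensitive_cols,
            })
    return pairs


columns = {"age": [20, 30, 40, 50], "seniority": [1, 2, 3, 4], "score": [3, 1, 4, 1]}
pairs = highly_correlated_pairs(columns, sensitive_cols=["age"])
# Only ("age", "seniority") passes the threshold; it is flagged because "age" is sensitive.
```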

property independence_info

DataFrame with all the information on the Independence criterion. For each sensitive feature, each value of the sensitive feature and each value of the target, it returns the Independence criterion's score and category. The DataFrame contains the following columns:

Column	Description
sensitive_feature	Sensitive feature name.
sensitive_value	Value of the sensitive feature.
target_label	Value of the prediction (`y_predict`).
independence_score	Independence criterion score.
independence_category	Category {A+, A, B, C, D, E} assigned to the Independence criterion score.

Returns: independence_info – Structure with all the information on the Independence criterion.
Return type: pandas.DataFrame

independence_score(df: DataFrame, sensitive_col: str, predict_col: str, target_label: str, sensitive_value: str) → float

Calculate the independence criterion’s score. We say that the random variables (Y, A) satisfy independence if the sensitive feature ‘A’ is statistically independent of the prediction ‘Y’. We define the score as the absolute difference between the probabilities:

\[independence\ score = | P(Y=y∣A=a) - P(Y=y∣A=b) |\]

Parameters

df (pandas.DataFrame) – Structure containing the dataset to process. The dataset must contain N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
predict_col (str) – Name of the DataFrame (df) column that contains the prediction for each element.
target_label (str) – Value of the target (label) for which the score is calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

independence_score – independence score value.

Return type

float

Raises

ZeroDivisionError – Raised when the conditioning event of one of the conditional probabilities matches no rows (zero probability), which leads to a division by zero.
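To make the independence formula concrete, here is a plain-Python sketch that evaluates it over a list of row dicts instead of a pandas DataFrame. It is not the xaiographs implementation: reading ‘b’ as “every value other than a” (one-vs-rest, which for a binary feature is simply the other value) is an assumption, and the labels are integers for brevity.

```python
def independence_score(rows, sensitive_col, predict_col, target_label, sensitive_value):
    """|P(Y=y | A=a) - P(Y=y | A=b)| with y = target_label and a = sensitive_value.

    Assumption: b is every value other than a (one-vs-rest).
    An empty group makes a denominator zero and raises ZeroDivisionError.
    """
    group_a = [r for r in rows if r[sensitive_col] == sensitive_value]
    group_b = [r for r in rows if r[sensitive_col] != sensitive_value]
    p_a = sum(r[predict_col] == target_label for r in group_a) / len(group_a)
    p_b = sum(r[predict_col] == target_label for r in group_b) / len(group_b)
    return abs(p_a - p_b)


rows = [dict(zip(("gender", "y_true", "y_pred"), values)) for values in [
    ("F", 1, 1), ("F", 0, 1), ("F", 0, 0), ("F", 1, 0),
    ("M", 1, 1), ("M", 0, 1), ("M", 1, 1), ("M", 0, 0),
]]
print(independence_score(rows, "gender", "y_pred", 1, "F"))  # 0.25
```

Here 50% of the ‘F’ rows and 75% of the ‘M’ rows are predicted as 1, so the score is 0.25, which the category table places in D.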

property separation_info

DataFrame with all the information on the Separation criterion. For each sensitive feature, each value of the sensitive feature and each value of the target, it returns the Separation criterion's score and category. The DataFrame contains the following columns:

Column	Description
sensitive_feature	Sensitive feature name.
sensitive_value	Value of the sensitive feature.
target_label	Value of the prediction (`y_predict`).
separation_score	Separation criterion score.
separation_category	Category {A+, A, B, C, D, E} assigned to the Separation criterion score.

Returns: separation_info – Structure with all the information on the Separation criterion.
Return type: pandas.DataFrame

separation_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → float

Calculate the separation criterion’s score. We say that the random variables (Y, A, T) satisfy separation if the sensitive feature ‘A’ is statistically independent of the prediction ‘Y’ given the target value ‘T’. We define the score as the absolute difference between the probabilities:

\[separation\ score = | P(Y=y∣T=t,A=a) - P(Y=y∣T=t,A=b) |\]

Parameters

df (pandas.DataFrame) – Structure with the dataset to process. The dataset must contain N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).
predict_col (str) – Name of the DataFrame (df) column that contains the prediction for each element.
target_label (str) – Value of the target (label) for which the score is calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

separation_score – Separation score value.

Return type

float

Raises

ZeroDivisionError – Raised when the conditioning event of one of the conditional probabilities matches no rows (zero probability), which leads to a division by zero.
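A plain-Python sketch of the separation formula over a list of row dicts, not the xaiographs implementation. It assumes that ‘b’ means every value other than a, and that the single target_label plays the role of both y and t; with label 1 the score is then the gap in true positive rates between the two groups.

```python
def separation_score(rows, sensitive_col, target_col, predict_col, target_label, sensitive_value):
    """|P(Y=y | T=t, A=a) - P(Y=y | T=t, A=b)| with y = t = target_label.

    Assumptions: a is sensitive_value, b is every other value (one-vs-rest),
    and the same label plays the role of y and t. A group with no rows whose
    real target equals the label raises ZeroDivisionError.
    """
    def prediction_rate(group):
        # Condition on T = t, then measure how often Y = y.
        given_t = [r for r in group if r[target_col] == target_label]
        return sum(r[predict_col] == target_label for r in given_t) / len(given_t)

    group_a = [r for r in rows if r[sensitive_col] == sensitive_value]
    group_b = [r for r in rows if r[sensitive_col] != sensitive_value]
    return abs(prediction_rate(group_a) - prediction_rate(group_b))


rows = [dict(zip(("gender", "y_true", "y_pred"), values)) for values in [
    ("F", 1, 1), ("F", 0, 1), ("F", 0, 0), ("F", 1, 0),
    ("M", 1, 1), ("M", 0, 1), ("M", 1, 1), ("M", 0, 0),
]]
print(separation_score(rows, "gender", "y_true", "y_pred", 1, "F"))  # 0.5
```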

property sufficiency_info

DataFrame with all the information on the Sufficiency criterion. For each sensitive feature, each value of the sensitive feature and each value of the target, it returns the Sufficiency criterion's score and category. The DataFrame contains the following columns:

Column	Description
sensitive_feature	Sensitive feature name.
sensitive_value	Value of the sensitive feature.
target_label	Value of the prediction (`y_predict`).
sufficiency_score	Sufficiency criterion score.
sufficiency_category	Category {A+, A, B, C, D, E} assigned to the Sufficiency criterion score.

Returns: sufficiency_info – Structure with all the information on the Sufficiency criterion.
Return type: pandas.DataFrame

sufficiency_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → float

Calculate the sufficiency criterion’s score. We say that the random variables (Y, A, T) satisfy sufficiency if the sensitive feature ‘A’ is statistically independent of the target value ‘T’ given the prediction ‘Y’. We define the score as the absolute difference between the probabilities:

\[sufficiency\ score = | P(T=t∣Y=y,A=a) - P(T=t∣Y=y,A=b) |\]

Parameters

df (pandas.DataFrame) – Structure with the dataset to process. The dataset must contain N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).
predict_col (str) – Name of the DataFrame (df) column that contains the prediction for each element.
target_label (str) – Value of the target (label) for which the score is calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

sufficiency_score – Sufficiency score value.

Return type

float

Raises

ZeroDivisionError – Raised when the conditioning event of one of the conditional probabilities matches no rows (zero probability), which leads to a division by zero.
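A plain-Python sketch of the sufficiency formula over a list of row dicts, not the xaiographs implementation. As with separation, it assumes that ‘b’ means every value other than a and that target_label plays the role of both y and t; with label 1 the score is then the gap in precision (positive predictive value) between the two groups.

```python
def sufficiency_score(rows, sensitive_col, target_col, predict_col, target_label, sensitive_value):
    """|P(T=t | Y=y, A=a) - P(T=t | Y=y, A=b)| with y = t = target_label.

    Assumptions: a is sensitive_value, b is every other value (one-vs-rest),
    and the same label plays the role of y and t. A group with no rows whose
    prediction equals the label raises ZeroDivisionError.
    """
    def target_rate(group):
        # Condition on Y = y, then measure how often T = t.
        given_y = [r for r in group if r[predict_col] == target_label]
        return sum(r[target_col] == target_label for r in given_y) / len(given_y)

    group_a = [r for r in rows if r[sensitive_col] == sensitive_value]
    group_b = [r for r in rows if r[sensitive_col] != sensitive_value]
    return abs(target_rate(group_a) - target_rate(group_b))


rows = [dict(zip(("gender", "y_true", "y_pred"), values)) for values in [
    ("F", 1, 1), ("F", 0, 1), ("F", 0, 0), ("F", 1, 0),
    ("M", 1, 1), ("M", 0, 1), ("M", 1, 1), ("M", 0, 0),
]]
print(sufficiency_score(rows, "gender", "y_true", "y_pred", 1, "F"))  # |1/2 - 2/3|, roughly 0.167
```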

property target_values

List of the different target values in the dataset.

Returns: target_values – List containing the target values.
Return type: List[str]

Caution

If the fit() method of the Fairness class has not been executed, it returns a warning message.
