Fairness
- class xaiographs.Fairness(destination_path: str = './xaioweb_files', verbose: int = 0)
The Fairness class offers functionality to explain how fair or unfair the classifications made by a (Deep) Machine Learning model are with respect to a set of features that we consider sensitive (gender, ethnic group, religion, age, etc.).
Read more in the Fairness User Guide
- Parameters:
destination_path (str, default='./xaioweb_files') – Path where output XAIoWeb files will be stored.
verbose (int, default=0) – Verbosity level.
Hint
Any value greater than 0 means verbosity is on.
Attributes:
confusion_matrix – Confusion matrix.
correlation_matrix – Correlation matrix (Pearson correlation) between features.
fairness_categories_score – DataFrame with the categories that are assigned to the Fairness criteria based on their score.
fairness_global_info – DataFrame with "Global Scores" (weighted aggregation of the scores of each fairness criterion) for each sensitive feature, each with its assigned category {A+, A, B, C, D, E}.
fairness_info – DataFrame with all information of fairness criteria.
highest_correlation_features – DataFrame with pairs of features that have a Pearson correlation value above a threshold (0.9).
independence_info – DataFrame with all information of Independence criterion.
separation_info – DataFrame with all information of Separation criterion.
sufficiency_info – DataFrame with all information of Sufficiency criterion.
target_values – List with the different target values of dataset.
Methods:
fairness_metrics(df, sensitive_col, ...) – Calculate scores for the criteria of independence_score(), separation_score() and sufficiency_score().
fit(df, sensitive_cols, target_col, predict_col) – Main function that performs all the calculations of the Fairness class.
get_fairness_category(score) – Assigns a category to a fairness criterion given its score.
independence_score(df, sensitive_col, ...) – Calculate the independence criterion's score.
separation_score(df, sensitive_col, ...) – Calculate the separation criterion's score.
sufficiency_score(df, sensitive_col, ...) – Calculate the sufficiency criterion's score.
- property confusion_matrix
Confusion matrix.
- Returns:
confusion_matrix – Structure representing the Confusion Matrix.
- Return type:
pandas.DataFrame
- property correlation_matrix
Correlation matrix (pearson correlation) between features.
- Returns:
correlation_matrix – Structure representing the correlation matrix.
- Return type:
pandas.DataFrame
- property fairness_categories_score
DataFrame with the categories that are assigned to the Fairness criteria based on their score. The categories and ranges of scores are the following:
Category    Score range
A+          0.0 <= score <= 0.02
A           0.02 < score <= 0.05
B           0.05 < score <= 0.08
C           0.08 < score <= 0.15
D           0.15 < score <= 0.25
E           0.25 < score <= 1.0
- Returns:
fairness_categories_score – Structure containing the categories based on scores.
- Return type:
pandas.DataFrame
- property fairness_global_info
DataFrame with “Global Scores” (weighted aggregation of the scores of each fairness criterion) for each sensitive feature, together with the category {A+, A, B, C, D, E} assigned to each global score. The DataFrame contains the following columns:
- Returns:
fairness_global_info – Structure with “Global Scores”.
- Return type:
pandas.DataFrame
- property fairness_info
DataFrame with all information of fairness criteria. For each sensitive feature, for each value of the sensitive feature and for each value of the target, it returns (for each fairness criterion) its Score, its Category and its Weight (the fraction of dataset rows that have both that sensitive value and that target value). The DataFrame contains the following columns:
Column                        Description
sensitive_feature             Sensitive feature name.
sensitive_value               Value of the sensitive feature.
is_binary_sensitive_feature   Indicates whether or not the sensitive feature is binary.
target_label                  Value of the prediction (y_predict).
independence_score            Independence criterion score.
independence_category         Category {A+, A, B, C, D, E} assigned to the value of the Independence criterion score.
independence_score_weight     Percentage (sensitive_value & predict_label)/all_rows_dataset.
separation_score              Separation criterion score.
separation_category           Category {A+, A, B, C, D, E} assigned to the value of the Separation criterion score.
separation_score_weight       Percentage (sensitive_value & predict_label)/all_rows_dataset.
sufficiency_score             Sufficiency criterion score.
sufficiency_category          Category {A+, A, B, C, D, E} assigned to the value of the Sufficiency criterion score.
sufficiency_score_weight      Percentage (sensitive_value & target_label)/all_rows_dataset.
- Returns:
fairness_info – Structure, with all information of fairness criteria.
- Return type:
pandas.DataFrame
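The per-row scores and weights listed above are what a "Global Score" aggregates. As a rough sketch only, assuming the aggregation is a weighted mean (the library's exact formula is not given here), it could look like the following. The function name `weighted_global_score` and the sample numbers are hypothetical.

```python
from typing import Iterable, Tuple


def weighted_global_score(scores_and_weights: Iterable[Tuple[float, float]]) -> float:
    """Weighted mean of (score, weight) pairs.

    ASSUMPTION: a weighted mean is used for illustration; it is not
    confirmed to be the aggregation performed by the library.
    """
    pairs = list(scores_and_weights)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(score * weight for score, weight in pairs) / total_weight


# Hypothetical (score, weight) pairs for one sensitive feature and one criterion.
rows = [(0.10, 0.30), (0.10, 0.20), (0.04, 0.25), (0.04, 0.25)]
global_score = weighted_global_score(rows)
print(round(global_score, 4))
```

Rows that cover a larger share of the dataset pull the aggregate towards their score, which is the point of weighting by row percentage.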
- fairness_metrics(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → Tuple[float, float, float]
Calculate the scores for the three fairness criteria; see independence_score(), separation_score() and sufficiency_score(). With ‘A’ the sensitive feature, ‘Y’ the prediction and ‘T’ the real target, the criteria are calculated as follows:
\[independence\ score = | P(Y=y \mid A=a) - P(Y=y \mid A=b) |\]
\[separation\ score = | P(Y=y \mid T=t, A=a) - P(Y=y \mid T=t, A=b) |\]
\[sufficiency\ score = | P(T=t \mid Y=y, A=a) - P(T=t \mid Y=y, A=b) |\]
- Parameters:
df (pandas.DataFrame) – Structure with the dataset to process. The dataset must have: N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).
predict_col (str) – Name of the DataFrame (df) column that contains the prediction of each element.
target_label (str) – Target value (class label) for which the scores are calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.
- Returns:
fairness_metrics – Fairness score metrics (independence score, separation score, sufficiency score).
- Return type:
Tuple[float, float, float]
See also
For the detailed calculation of each metric, please refer to independence_score(), separation_score() and sufficiency_score() from the Fairness class.
- fit(df: DataFrame, sensitive_cols: List[str], target_col: str, predict_col: str) → None
Main function that performs all the calculations of the Fairness class. The calculated results are accessible via the property functions of the class.
- Parameters:
df (pandas.DataFrame) – Structure with dataset to process. The dataset must have: N feature columns, a real target column and prediction column.
sensitive_cols (List[str]) – List with the sensitive features (df column names) to evaluate the Fairness criteria.
target_col (str) – Column of the DataFrame that contains the target (ground truth or y_real).
predict_col (str) – Column of the DataFrame that contains the prediction (y_predict) of each element.
- static get_fairness_category(score: float) → str
Assigns a category to a fairness criterion given its score. The relationship between score and category is shown in the following table:
Category    Score range
A+          0.0 <= score <= 0.02
A           0.02 < score <= 0.05
B           0.05 < score <= 0.08
C           0.08 < score <= 0.15
D           0.15 < score <= 0.25
E           0.25 < score <= 1.0
- Parameters:
score (float) – Value of the score of the Fairness criterion.
- Returns:
category – Category assigned to the score.
- Return type:
str
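The score ranges in the table above amount to a lookup over ascending, inclusive upper bounds. The sketch below is a minimal illustration, not the library's source: the function name `fairness_category` is hypothetical, and raising `ValueError` for scores outside [0, 1] is an assumption, since the behaviour for out-of-range scores is not documented here.

```python
def fairness_category(score: float) -> str:
    """Map a fairness score in [0, 1] to a category, following the documented table."""
    # ASSUMPTION: out-of-range scores are rejected; the library may behave differently.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # (inclusive upper bound, category), checked in ascending order.
    thresholds = [
        (0.02, "A+"),
        (0.05, "A"),
        (0.08, "B"),
        (0.15, "C"),
        (0.25, "D"),
        (1.0, "E"),
    ]
    for upper_bound, category in thresholds:
        if score <= upper_bound:
            return category
    raise AssertionError("unreachable: score <= 1.0 always matches the last bound")


for value in (0.0, 0.03, 0.1, 0.9):
    print(value, fairness_category(value))
```

Lower scores are better: a score near 0 means the compared probabilities are almost equal across groups.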
- property highest_correlation_features
DataFrame with pairs of features that have a Pearson correlation value above a threshold (0.9). If one of the features in a pair is a sensitive feature, the pair is marked with a flag. If there are no highly correlated features, an empty DataFrame is returned.
- Returns:
highest_correlation_features – Structure containing the most highly correlated features.
- Return type:
pandas.DataFrame
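The pair selection described for this property can be illustrated in plain Python. This is only a sketch under stated assumptions: the names `pearson` and `highly_correlated` and the sample columns are hypothetical, all features are assumed to be numeric already, and the comparison follows the wording literally (correlation strictly above 0.9, without taking the absolute value).

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)  # a constant column would divide by zero


def highly_correlated(
    columns: Dict[str, List[float]], sensitive: Set[str], threshold: float = 0.9
) -> List[Tuple[str, str, float, bool]]:
    """Return (feature_1, feature_2, correlation, involves_sensitive_feature) tuples."""
    pairs = []
    for (name_1, values_1), (name_2, values_2) in combinations(columns.items(), 2):
        r = pearson(values_1, values_2)
        if r > threshold:
            pairs.append((name_1, name_2, r, name_1 in sensitive or name_2 in sensitive))
    return pairs


# Hypothetical data: "seniority" is a near-perfect proxy for the sensitive "age".
data = {
    "age": [20.0, 30.0, 40.0, 50.0],
    "seniority": [1.0, 2.0, 3.0, 4.0],
    "height": [170.0, 160.0, 180.0, 165.0],
}
pairs = highly_correlated(data, sensitive={"age"})
print(pairs)
```

A flagged pair signals a possible proxy: dropping the sensitive column alone does not remove its influence if a strongly correlated feature remains in the model.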
- property independence_info
DataFrame with all information of Independence criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, it returns (for the Independence criterion) its Score and its Category. The DataFrame contains the following columns:
Column                  Description
sensitive_feature       Sensitive feature name.
sensitive_value         Value of the sensitive feature.
target_label            Value of the prediction (y_predict).
independence_score      Independence criterion score.
independence_category   Category {A+, A, B, C, D, E} assigned to the value of the Independence criterion score.
- Returns:
independence_info – Structure with all information of Independence criterion.
- Return type:
pandas.DataFrame
- independence_score(df: DataFrame, sensitive_col: str, predict_col: str, target_label: str, sensitive_value: str) → float
Calculate the independence criterion’s score. We say that the random variables (Y, A) satisfy independence if the sensitive feature ‘A’ is statistically independent of the prediction ‘Y’. We define the score as the absolute difference of the probabilities:
\[independence\ score = | P(Y=y \mid A=a) - P(Y=y \mid A=b) |\]
- Parameters:
df (pandas.DataFrame) – Structure containing the dataset to process. The dataset must have: N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
predict_col (str) – Name of the DataFrame (df) column that contains the prediction of each element.
target_label (str) – Target value (class label) for which the score is calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.
- Returns:
independence_score – independence score value.
- Return type:
float
- Raises:
ZeroDivisionError – Raised when one of the conditional probabilities equals zero, which leads to a division by zero.
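The independence score can be sketched by counting rows. This is a minimal illustration rather than the library's implementation: the function name `independence_sketch`, the column names and the sample rows are hypothetical, and group ‘b’ is assumed to be every row whose sensitive value differs from `sensitive_value`. In this sketch a `ZeroDivisionError` arises only when one of the two groups is empty.

```python
from typing import Dict, List


def independence_sketch(
    rows: List[Dict[str, str]],
    sensitive_col: str,
    predict_col: str,
    target_label: str,
    sensitive_value: str,
) -> float:
    """| P(Y=y | A=a) - P(Y=y | A=b) | estimated from row counts."""
    # ASSUMPTION: group b is "all rows with a sensitive value other than a".
    group_a = [r for r in rows if r[sensitive_col] == sensitive_value]
    group_b = [r for r in rows if r[sensitive_col] != sensitive_value]

    def positive_rate(group: List[Dict[str, str]]) -> float:
        # Fraction of the group predicted as target_label; empty group divides by zero.
        return sum(r[predict_col] == target_label for r in group) / len(group)

    return abs(positive_rate(group_a) - positive_rate(group_b))


# Hypothetical predictions: 2 of 4 "M" rows and 1 of 4 "F" rows are predicted "YES".
rows = (
    [{"gender": "M", "y_predict": "YES"}] * 2
    + [{"gender": "M", "y_predict": "NO"}] * 2
    + [{"gender": "F", "y_predict": "YES"}] * 1
    + [{"gender": "F", "y_predict": "NO"}] * 3
)
score = independence_sketch(rows, "gender", "y_predict", "YES", "M")
print(score)
```

Here the two rates are 0.5 and 0.25, so the score is their absolute difference.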
- property separation_info
DataFrame with all information of Separation criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, it returns (for the Separation criterion) its Score and its Category. The DataFrame contains the following columns:
Column                Description
sensitive_feature     Sensitive feature name.
sensitive_value       Value of the sensitive feature.
target_label          Value of the prediction (y_predict).
separation_score      Separation criterion score.
separation_category   Category {A+, A, B, C, D, E} assigned to the value of the Separation criterion score.
- Returns:
separation_info – Structure with all information of Separation criterion.
- Return type:
pandas.DataFrame
- separation_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → float
Calculate the separation criterion’s score. We say the random variables (Y, A, T) satisfy separation if the sensitive feature ‘A’ is statistically independent of the prediction ‘Y’ given the target value ‘T’. We define the score as the absolute difference of the probabilities:
\[separation\ score = | P(Y=y \mid T=t, A=a) - P(Y=y \mid T=t, A=b) |\]
- Parameters:
df (pandas.DataFrame) – Structure with the dataset to process. The dataset must have: N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).
predict_col (str) – Name of the DataFrame (df) column that contains the prediction of each element.
target_label (str) – Target value (class label) for which the score is calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.
- Returns:
separation_score – Separation score value.
- Return type:
float
- Raises:
ZeroDivisionError – Raised when one of the conditional probabilities equals zero, which leads to a division by zero.
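A counting-based sketch of the separation score follows. It is illustrative only: the name `separation_sketch` and the sample rows are hypothetical, group ‘b’ is assumed to be all rows with a different sensitive value, and the single `target_label` is assumed to play the role of both y and t in the formula.

```python
from typing import List, Tuple

# Each row is (sensitive value, real target, prediction); hypothetical layout.
Row = Tuple[str, str, str]


def separation_sketch(rows: List[Row], target_label: str, sensitive_value: str) -> float:
    """| P(Y=y | T=t, A=a) - P(Y=y | T=t, A=b) | estimated from row counts."""

    def rate(in_group_a: bool) -> float:
        # Condition on the group and on the real target being target_label (T=t).
        group = [
            r for r in rows
            if (r[0] == sensitive_value) == in_group_a and r[1] == target_label
        ]
        # Fraction of those rows predicted as target_label; empty group divides by zero.
        return sum(r[2] == target_label for r in group) / len(group)

    return abs(rate(True) - rate(False))


# Among real "YES" rows: "M" is predicted "YES" in 2 of 4, "F" in 3 of 4.
rows = (
    [("M", "YES", "YES")] * 2 + [("M", "YES", "NO")] * 2 + [("M", "NO", "NO")]
    + [("F", "YES", "YES")] * 3 + [("F", "YES", "NO")] + [("F", "NO", "NO")]
)
score = separation_sketch(rows, "YES", "M")
print(score)
```

With `target_label` set to the positive class, this compares true positive rates between the two groups.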
- property sufficiency_info
DataFrame with all information of Sufficiency criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, it returns (for the Sufficiency criterion) its Score and its Category. The DataFrame contains the following columns:
Column                 Description
sensitive_feature      Sensitive feature name.
sensitive_value        Value of the sensitive feature.
target_label           Value of the prediction (y_predict).
sufficiency_score      Sufficiency criterion score.
sufficiency_category   Category {A+, A, B, C, D, E} assigned to the value of the Sufficiency criterion score.
- Returns:
sufficiency_info – Structure with all information of Sufficiency criterion.
- Return type:
pandas.DataFrame
- sufficiency_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → float
Calculate the sufficiency criterion’s score. We say the random variables (Y, A, T) satisfy sufficiency if the sensitive feature ‘A’ is statistically independent of the target value ‘T’ given the prediction ‘Y’. We define the score as the absolute difference of the probabilities:
\[sufficiency\ score = | P(T=t \mid Y=y, A=a) - P(T=t \mid Y=y, A=b) |\]
- Parameters:
df (pandas.DataFrame) – Structure with the dataset to process. The dataset must have: N feature columns, a real target column and a prediction column.
sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.
target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).
predict_col (str) – Name of the DataFrame (df) column that contains the prediction of each element.
target_label (str) – Target value (class label) for which the score is calculated.
sensitive_value (str) – Value of the sensitive feature for the score calculation.
- Returns:
sufficiency_score – Sufficiency score value.
- Return type:
float
- Raises:
ZeroDivisionError – Raised when one of the conditional probabilities equals zero, which leads to a division by zero.
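The sufficiency score swaps the roles of prediction and target in the conditioning. The sketch below is again only an illustration under assumptions: `sufficiency_sketch` and the sample rows are hypothetical, group ‘b’ is taken to be all rows with a different sensitive value, and `target_label` stands for both y and t.

```python
from typing import List, Tuple

# Each row is (sensitive value, real target, prediction); hypothetical layout.
Row = Tuple[str, str, str]


def sufficiency_sketch(rows: List[Row], target_label: str, sensitive_value: str) -> float:
    """| P(T=t | Y=y, A=a) - P(T=t | Y=y, A=b) | estimated from row counts."""

    def rate(in_group_a: bool) -> float:
        # Condition on the group and on the prediction being target_label (Y=y).
        group = [
            r for r in rows
            if (r[0] == sensitive_value) == in_group_a and r[2] == target_label
        ]
        # Fraction of those rows whose real target is target_label; empty group divides by zero.
        return sum(r[1] == target_label for r in group) / len(group)

    return abs(rate(True) - rate(False))


# Among predicted "YES" rows: "M" is really "YES" in 3 of 4, "F" in 1 of 2.
rows = (
    [("M", "YES", "YES")] * 3 + [("M", "NO", "YES")] + [("M", "NO", "NO")]
    + [("F", "YES", "YES")] + [("F", "NO", "YES")] + [("F", "NO", "NO")]
)
score = sufficiency_sketch(rows, "YES", "M")
print(score)
```

With `target_label` set to the positive class, this compares the precision of the predictions between the two groups.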
- property target_values
List with the different target values of dataset.
- Returns:
target_values – List containing the target values.
- Return type:
List[str]