
Fairness

class xaiographs.Fairness(destination_path: str = './xaioweb_files', verbose: int = 0)

The Fairness class offers functionality to explain how fair or unfair the classifications made by a (Deep) Machine Learning model are with respect to a set of features that we consider sensitive (gender, ethnic group, religion, age, etc.).

Read more in the Fairness User Guide

Parameters:
  • destination_path (str, default='./xaioweb_files') – Path where output XAIoWeb files will be stored.

  • verbose (int, default=0) –

    Verbosity level.

    Hint

    Any value greater than 0 means verbosity is on.

Attributes:

confusion_matrix

Confusion matrix.

correlation_matrix

Correlation matrix (Pearson correlation) between features.

fairness_categories_score

DataFrame with the categories that are assigned to the Fairness criteria based on their score.

fairness_global_info

DataFrame with the "Global Score" (weighted aggregation of the scores of each fairness criterion) of each sensitive feature, together with the category {A+, A, B, C, D, E} assigned to it.

fairness_info

DataFrame with all information of fairness criteria.

highest_correlation_features

DataFrame with pairs of features that have a Pearson correlation value above a threshold (0.9).

independence_info

DataFrame with all information of Independence criterion.

separation_info

DataFrame with all information of Separation criterion.

sufficiency_info

DataFrame with all information of Sufficiency criterion.

target_values

List with the different target values of dataset.

Methods:

fairness_metrics(df, sensitive_col, ...)

Calculate scores for the criteria of independence_score(), separation_score() and sufficiency_score().

fit(df, sensitive_cols, target_col, predict_col)

Main function that performs all the calculations of the Fairness class.

get_fairness_category(score)

It assigns a Category to any Fairness criterion given a Score.

independence_score(df, sensitive_col, ...)

Calculate independence criterion's score.

separation_score(df, sensitive_col, ...)

Calculate separation criterion's score.

sufficiency_score(df, sensitive_col, ...)

Calculate sufficiency criterion's score.

property confusion_matrix

Confusion matrix.

Returns:

confusion_matrix – Structure representing the Confusion Matrix.

Return type:

pandas.DataFrame

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.

property correlation_matrix

Correlation matrix (Pearson correlation) between features.

Returns:

correlation_matrix – Structure representing the correlation matrix.

Return type:

pandas.DataFrame

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.

property fairness_categories_score

DataFrame with the categories that are assigned to the Fairness criteria based on their score. The categories and ranges of scores are the following:

Category   Score range
A+         0.0 <= score <= 0.02
A          0.02 < score <= 0.05
B          0.05 < score <= 0.08
C          0.08 < score <= 0.15
D          0.15 < score <= 0.25
E          0.25 < score <= 1.0

Returns:

fairness_categories_score – Structure containing the categories based on scores.

Return type:

pandas.DataFrame

property fairness_global_info

DataFrame with the “Global Score” (weighted aggregation of the scores of each fairness criterion) of each sensitive feature, together with the category {A+, A, B, C, D, E} assigned to it. The DataFrame contains the following columns:

Returns:

fairness_global_info – Structure with “Global Scores”.

Return type:

pandas.DataFrame
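
The exact aggregation behind the “Global Scores” is not spelled out here. As a minimal sketch of one plausible reading, the snippet below computes a weighted average of (score, weight) pairs such as the ones exposed by fairness_info. The helper name weighted_global_score and the sample values are hypothetical, and the library may aggregate differently.

```python
from typing import List, Tuple


def weighted_global_score(scores_and_weights: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: weighted average of (score, weight) pairs.

    Assumption: the global score is a plain weighted average; the weights
    do not need to sum to 1 because the result is normalised.
    """
    total_weight = sum(weight for _, weight in scores_and_weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(score * weight for score, weight in scores_and_weights)
    return weighted_sum / total_weight


# Made-up (score, weight) pairs for one sensitive feature and one criterion.
pairs = [(0.10, 0.75), (0.30, 0.25)]
global_score = weighted_global_score(pairs)
print(round(global_score, 4))  # 0.15
```

The resulting value could then be mapped to a category {A+, A, B, C, D, E} with get_fairness_category().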

property fairness_info

DataFrame with all information of fairness criteria. For each sensitive feature, for each value of the sensitive feature and for each value of the target, it contains (for each fairness criterion) the score, the category and the weight (percentage of dataset rows that have both that sensitive value and that target value). The DataFrame contains the following columns:

Column                       Description
sensitive_feature            Sensitive feature name.
sensitive_value              Value of sensitive feature.
is_binary_sensitive_feature  Indicates whether or not the sensitive feature is binary.
target_label                 Value of prediction (y_predict).
independence_score           Independence criterion score.
independence_category        Category {A+, A, B, C, D, E} assigned to the value of Independence criterion score.
independence_score_weight    Percentage (sensitive_value & predict_label)/all_rows_dataset.
separation_score             Separation criterion score.
separation_category          Category {A+, A, B, C, D, E} assigned to the value of Separation criterion score.
separation_score_weight      Percentage (sensitive_value & predict_label)/all_rows_dataset.
sufficiency_score            Sufficiency criterion score.
sufficiency_category         Category {A+, A, B, C, D, E} assigned to the value of Sufficiency criterion score.
sufficiency_score_weight     Percentage (sensitive_value & target_label)/all_rows_dataset.

Returns:

fairness_info – Structure, with all information of fairness criteria.

Return type:

pandas.DataFrame

fairness_metrics(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → Tuple[float, float, float]

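Calculate the scores for the independence, separation and sufficiency criteria (see independence_score(), separation_score() and sufficiency_score()). With ‘A’ denoting the sensitive feature, ‘Y’ the prediction and ‘T’ the real target, the criteria are calculated as:

\[independence\ score = | P(Y=y \mid A=a) - P(Y=y \mid A=b) |\]
\[separation\ score = | P(Y=y \mid T=t, A=a) - P(Y=y \mid T=t, A=b) |\]
\[sufficiency\ score = | P(T=t \mid Y=y, A=a) - P(T=t \mid Y=y, A=b) |\]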

Parameters:
  • df (pandas.DataFrame) – Structure with dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Target value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns:

fairness_metrics – Fairness score metrics (independence score, separation score, sufficiency score).

Return type:

Tuple[float, float, float]

See also

For detailed calculations of each metrics, please refer to independence_score(), separation_score() and sufficiency_score() from the Fairness class.

fit(df: DataFrame, sensitive_cols: List[str], target_col: str, predict_col: str) → None

Main function that performs all the calculations of the Fairness class. The calculated results are accessible via the property functions of the class.

Parameters:
  • df (pandas.DataFrame) – Structure with dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_cols (List[str]) – List with the sensitive features (df column names) to evaluate the Fairness criteria.

  • target_col (str) – Column of DataFrame that contains target (ground truth or y_real).

  • predict_col (str) – Column of DataFrame that contains predictions (y_predict) of each element.

static get_fairness_category(score: float) → str

It assigns a Category to any Fairness criterion given a Score. The relationship between Score and Category is shown in the following table:

Category   Score range
A+         0.0 <= score <= 0.02
A          0.02 < score <= 0.05
B          0.05 < score <= 0.08
C          0.08 < score <= 0.15
D          0.15 < score <= 0.25
E          0.25 < score <= 1.0

Parameters:

score (float) – Value of the score of the Fairness criterion.

Returns:

category – Category assigned to the score.

Return type:

str
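
The score ranges listed for get_fairness_category() can be sketched as a simple lookup over inclusive upper bounds. This is an illustration of the table, not the library's implementation: the helper name score_to_category is hypothetical, and raising ValueError for scores outside [0, 1] is an assumption.

```python
from typing import List, Tuple

# Inclusive upper bound of each category, in ascending order (from the table).
CATEGORY_BOUNDS: List[Tuple[str, float]] = [
    ("A+", 0.02),
    ("A", 0.05),
    ("B", 0.08),
    ("C", 0.15),
    ("D", 0.25),
    ("E", 1.0),
]


def score_to_category(score: float) -> str:
    """Hypothetical helper: return the category whose range contains `score`."""
    # Assumption: scores outside [0, 1] are rejected; the library may behave differently.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for category, upper_bound in CATEGORY_BOUNDS:
        if score <= upper_bound:
            return category
    raise AssertionError("unreachable: 1.0 is the last upper bound")


print(score_to_category(0.02))  # A+
print(score_to_category(0.2))   # D
```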

property highest_correlation_features

DataFrame with pairs of features that have a Pearson correlation value above a threshold (0.9). If one of these features is a sensitive feature, it will be marked with a flag. If there are no highly correlated features, an empty DataFrame will be returned.

Returns:

highest_correlation_features – Structure containing the most highly correlated features.

Return type:

pandas.DataFrame

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.
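
As an illustration of the thresholding described for highest_correlation_features, the following dependency-free sketch computes the Pearson correlation of every pair of numeric columns and keeps the pairs above 0.9, flagging those that involve a sensitive feature. The function names, the tuple layout and the sample data are hypothetical. Comparing the raw (not absolute) correlation against the threshold is an assumption.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated(columns: Dict[str, List[float]], sensitive: List[str],
                      threshold: float = 0.9) -> List[Tuple[str, str, float, bool]]:
    """Return (feature_1, feature_2, correlation, involves_sensitive) tuples."""
    result = []
    for name_1, name_2 in combinations(columns, 2):
        corr = pearson(columns[name_1], columns[name_2])
        if corr > threshold:  # assumption: raw value, not absolute value
            flag = name_1 in sensitive or name_2 in sensitive
            result.append((name_1, name_2, corr, flag))
    return result


# Made-up numeric features; "seniority" is perfectly linear in "age".
columns = {
    "age": [20.0, 30.0, 40.0, 50.0],
    "seniority": [1.0, 2.0, 3.0, 4.0],
    "income": [3.0, 1.0, 4.0, 2.0],
}
pairs = highly_correlated(columns, sensitive=["age"])
print(pairs)
```

With this data only the (age, seniority) pair exceeds the threshold. When no pair does, the list is empty, mirroring the empty DataFrame case.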

property independence_info

DataFrame with all information of Independence criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, returns (for Independence criterion) its Score and its Category. The DataFrame contains the following columns:

Column                 Description
sensitive_feature      Sensitive feature name.
sensitive_value        Value of sensitive feature.
target_label           Value of prediction (y_predict).
independence_score     Independence criterion score.
independence_category  Category {A+, A, B, C, D, E} assigned to the value of Independence criterion score.

Returns:

independence_info – Structure with all information of Independence criterion.

Return type:

pandas.DataFrame

independence_score(df: DataFrame, sensitive_col: str, predict_col: str, target_label: str, sensitive_value: str) → float

Calculate the independence criterion’s score. We say that the random variables (Y, A) satisfy independence if the sensitive feature ‘A’ is statistically independent of the prediction ‘Y’. We define the score as the difference (in absolute value) of the probabilities:

\[independence\ score = | P(Y=y \mid A=a) - P(Y=y \mid A=b) |\]

Parameters:
  • df (pandas.DataFrame) – Structure containing the dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Prediction value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns:

independence_score – Independence score value.

Return type:

float

Raises:

ZeroDivisionError – One of the conditional probabilities equals zero, which leads to a division by zero.
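
The independence formula can be worked through on a toy dataset. The sketch below uses plain Python rows instead of a DataFrame so that it runs without dependencies. The function name independence_sketch and the data are hypothetical, and treating ‘b’ as "every value of the sensitive feature other than a" is an assumption, not confirmed library behaviour.

```python
from typing import Dict, List

Row = Dict[str, str]


def independence_sketch(rows: List[Row], sensitive_col: str, predict_col: str,
                        target_label: str, sensitive_value: str) -> float:
    """|P(Y=y | A=a) - P(Y=y | A=b)|, with b = all values other than a (assumption)."""
    group_a = [r for r in rows if r[sensitive_col] == sensitive_value]
    group_b = [r for r in rows if r[sensitive_col] != sensitive_value]
    # In this sketch an empty group raises ZeroDivisionError.
    p_a = sum(r[predict_col] == target_label for r in group_a) / len(group_a)
    p_b = sum(r[predict_col] == target_label for r in group_b) / len(group_b)
    return abs(p_a - p_b)


# Made-up data: 1 of 4 "F" rows and 3 of 4 "M" rows are predicted "YES".
rows = [
    {"gender": g, "y_predict": y}
    for g, y in [("F", "YES"), ("F", "NO"), ("F", "NO"), ("F", "NO"),
                 ("M", "YES"), ("M", "YES"), ("M", "YES"), ("M", "NO")]
]
score = independence_sketch(rows, "gender", "y_predict", "YES", "F")
print(score)  # 0.5
```

Here P(Y=YES | A=F) = 0.25 and P(Y=YES | A=M) = 0.75, so the score is 0.5, which falls in category E.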

property separation_info

DataFrame with all information of Separation criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, returns (for Separation criterion) its Score and its Category. The DataFrame contains the following columns:

Column               Description
sensitive_feature    Sensitive feature name.
sensitive_value      Value of sensitive feature.
target_label         Value of prediction (y_predict).
separation_score     Separation criterion score.
separation_category  Category {A+, A, B, C, D, E} assigned to the value of Separation criterion score.

Returns:

separation_info – Structure with all information of Separation criterion.

Return type:

pandas.DataFrame

separation_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → float

Calculate separation criterion’s score. We say the random variables (Y, A, T) satisfy separation if the sensitive characteristics ‘A’ are statistically independent of the prediction ‘Y’ given the target value ‘T’. We define the score as the difference (in absolute value) of the probabilities:

\[separation\ score = | P(Y=y \mid T=t, A=a) - P(Y=y \mid T=t, A=b) |\]

Parameters:
  • df (pandas.DataFrame) – Structure with dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Target value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns:

separation_score – Separation score value.

Return type:

float

Raises:

ZeroDivisionError – One of the conditional probabilities equals zero, which leads to a division by zero.
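
The separation formula differs from independence only in that both probabilities are conditioned on the real target. A minimal sketch on toy data follows. The function name separation_sketch and the data are hypothetical. Using the same target_label for both y and t, and taking ‘b’ as every sensitive value other than a, are assumptions.

```python
from typing import Dict, List

Row = Dict[str, str]


def separation_sketch(rows: List[Row], sensitive_col: str, target_col: str,
                      predict_col: str, target_label: str, sensitive_value: str) -> float:
    """|P(Y=y | T=t, A=a) - P(Y=y | T=t, A=b)| with y = t = target_label (assumption)."""
    # Condition on T=t: keep only rows whose real target is the label of interest.
    conditioned = [r for r in rows if r[target_col] == target_label]
    group_a = [r for r in conditioned if r[sensitive_col] == sensitive_value]
    group_b = [r for r in conditioned if r[sensitive_col] != sensitive_value]
    # In this sketch an empty group raises ZeroDivisionError.
    p_a = sum(r[predict_col] == target_label for r in group_a) / len(group_a)
    p_b = sum(r[predict_col] == target_label for r in group_b) / len(group_b)
    return abs(p_a - p_b)


# Made-up (sensitive value, real target, prediction) triples.
rows = [
    {"gender": g, "y_true": t, "y_predict": y}
    for g, t, y in [("F", "YES", "YES"), ("F", "YES", "NO"), ("F", "NO", "NO"), ("F", "NO", "NO"),
                    ("M", "YES", "YES"), ("M", "YES", "YES"), ("M", "NO", "YES"), ("M", "NO", "NO")]
]
score = separation_sketch(rows, "gender", "y_true", "y_predict", "YES", "F")
print(score)  # 0.5
```

Among rows with T=YES, half of the "F" rows and all of the "M" rows are predicted YES, giving |0.5 - 1.0| = 0.5.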

property sufficiency_info

DataFrame with all information of Sufficiency criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, returns (for Sufficiency criterion) its Score and its Category. The DataFrame contains the following columns:

Column                Description
sensitive_feature     Sensitive feature name.
sensitive_value       Value of sensitive feature.
target_label          Value of prediction (y_predict).
sufficiency_score     Sufficiency criterion score.
sufficiency_category  Category {A+, A, B, C, D, E} assigned to the value of Sufficiency criterion score.

Returns:

sufficiency_info – Structure with all information of Sufficiency criterion.

Return type:

pandas.DataFrame

sufficiency_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) → float

Calculate the sufficiency criterion’s score. We say the random variables (Y, A, T) satisfy sufficiency if the sensitive characteristics ‘A’ are statistically independent of the target value ‘T’ given the prediction ‘Y’. We define the score as the difference (in absolute value) of the probabilities:

\[sufficiency\ score = | P(T=t \mid Y=y, A=a) - P(T=t \mid Y=y, A=b) |\]

Parameters:
  • df (pandas.DataFrame) – Structure with the dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • target_col (str) – Name of the DataFrame (df) column that contains the target (ground truth or y_real).

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Target value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns:

sufficiency_score – Sufficiency score value.

Return type:

float

Raises:

ZeroDivisionError – One of the conditional probabilities equals zero, which leads to a division by zero.
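
Sufficiency swaps the roles of prediction and target: the rows are first restricted to a given prediction, and the probability of the real target is compared across groups. A minimal sketch on toy data follows. The function name sufficiency_sketch and the data are hypothetical. Using the same target_label for both t and y, and taking ‘b’ as every sensitive value other than a, are assumptions.

```python
from typing import Dict, List

Row = Dict[str, str]


def sufficiency_sketch(rows: List[Row], sensitive_col: str, target_col: str,
                       predict_col: str, target_label: str, sensitive_value: str) -> float:
    """|P(T=t | Y=y, A=a) - P(T=t | Y=y, A=b)| with t = y = target_label (assumption)."""
    # Condition on Y=y: keep only rows predicted as the label of interest.
    conditioned = [r for r in rows if r[predict_col] == target_label]
    group_a = [r for r in conditioned if r[sensitive_col] == sensitive_value]
    group_b = [r for r in conditioned if r[sensitive_col] != sensitive_value]
    # In this sketch an empty group raises ZeroDivisionError.
    p_a = sum(r[target_col] == target_label for r in group_a) / len(group_a)
    p_b = sum(r[target_col] == target_label for r in group_b) / len(group_b)
    return abs(p_a - p_b)


# Made-up (sensitive value, real target, prediction) triples.
rows = [
    {"gender": g, "y_true": t, "y_predict": y}
    for g, t, y in [("F", "YES", "YES"), ("F", "YES", "YES"), ("F", "NO", "NO"),
                    ("M", "YES", "YES"), ("M", "NO", "YES"), ("M", "NO", "YES"), ("M", "NO", "YES")]
]
score = sufficiency_sketch(rows, "gender", "y_true", "y_predict", "YES", "F")
print(score)  # 0.75
```

Among rows predicted YES, both "F" rows are truly YES (probability 1.0) but only one of four "M" rows is (probability 0.25), giving a score of 0.75.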

property target_values

List with the different target values of dataset.

Returns:

target_values – List containing the target values.

Return type:

List[str]

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.
