
Fairness

class xaiographs.Fairness(destination_path: str = './xaioweb_files', verbose: int = 0)

The Fairness class offers functionality to explain how fair or unfair the classifications made by a (Deep) Machine Learning model are with respect to a set of features that we consider sensitive (gender, ethnic group, religion, age, etc.).

Read more in the Fairness User Guide

Parameters
  • destination_path (str, default='./xaioweb_files') – Path where output XAIoWeb files will be stored.

  • verbose (int, default=0) –

    Verbosity level.

    Hint

    Any value greater than 0 means verbosity is on.

Attributes:

confusion_matrix

Confusion matrix.

correlation_matrix

Correlation matrix (pearson correlation) between features.

fairness_categories_score

DataFrame with the categories that are assigned to the Fairness criteria based on their score.

fairness_global_info

DataFrame with "Global Scores" (weighted aggregation of the scores of each fairness criterion) for each sensitive feature, together with the category {A+, A, B, C, D, E} assigned to each score.

fairness_info

DataFrame with all information of fairness criteria.

highest_correlation_features

DataFrame with pairs of features that have a pearson correlation value above a threshold (0.9).

independence_info

DataFrame with all information of Independence criterion.

separation_info

DataFrame with all information of Separation criterion.

sufficiency_info

DataFrame with all information of Sufficiency criterion.

target_values

List with the different target values of dataset.

Methods:

fairness_metrics(df, sensitive_col, ...)

Calculate the scores for the independence, separation and sufficiency criteria (see independence_score(), separation_score() and sufficiency_score()).

fit(df, sensitive_cols, target_col, predict_col)

Main function that performs all the calculations of the Fairness class.

get_fairness_category(score)

It assigns a Category to any Fairness criterion given a Score.

independence_score(df, sensitive_col, ...)

Calculate independence criterion's score.

separation_score(df, sensitive_col, ...)

Calculate separation criterion's score.

sufficiency_score(df, sensitive_col, ...)

Calculate sufficiency criterion's score.

property confusion_matrix

Confusion matrix.

Returns

confusion_matrix – Structure representing the Confusion Matrix.

Return type

pandas.DataFrame

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.

property correlation_matrix

Correlation matrix (pearson correlation) between features.

Returns

correlation_matrix – Structure representing the correlation matrix.

Return type

pandas.DataFrame

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.

property fairness_categories_score

DataFrame with the categories that are assigned to the Fairness criteria based on their score. The categories and ranges of scores are the following:

========  ======================
Category  Range Score
========  ======================
A+        0.0 <= score <= 0.02
A         0.02 < score <= 0.05
B         0.05 < score <= 0.08
C         0.08 < score <= 0.15
D         0.15 < score <= 0.25
E         0.25 < score <= 1.0
========  ======================

Returns

fairness_categories_score – Structure containing the categories based on scores.

Return type

pandas.DataFrame

property fairness_global_info

DataFrame with "Global Scores" (weighted aggregation of the scores of each fairness criterion) for each sensitive feature, together with the category {A+, A, B, C, D, E} assigned to each score. The DataFrame contains the following columns:

Returns

fairness_global_info – Structure with "Global Scores".

Return type

pandas.DataFrame
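
The description above only says that the global score is a "weighted aggregation" and does not give the formula. As a rough illustration, and purely as an assumption, the sketch below uses a weighted mean of (score, weight) pairs. The function name `weighted_global_score` and the input layout are hypothetical and are not part of xaiographs.

```python
from typing import List, Tuple


def weighted_global_score(scores_and_weights: List[Tuple[float, float]]) -> float:
    """Weighted mean of scores (an assumed reading of 'weighted aggregation')."""
    total_weight = sum(weight for _, weight in scores_and_weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(score * weight for score, weight in scores_and_weights)
    return weighted_sum / total_weight


# Hypothetical (score, weight) pairs for one sensitive feature and one criterion.
pairs = [(0.5, 0.25), (0.0, 0.75)]
print(weighted_global_score(pairs))  # 0.125
```

The result could then be mapped to a category with the ranges listed under fairness_categories_score.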

property fairness_info

DataFrame with all information of fairness criteria. For each sensitive feature, each value of that feature and each value of the target, it returns, for every fairness criterion, its Score, its Category and its Weight (the percentage of dataset rows that have both that sensitive value and that target value). The DataFrame contains the following columns:

===========================  =====================================================================================
Column                       Description
===========================  =====================================================================================
sensitive_feature            Sensitive feature name.
sensitive_value              Value of sensitive feature.
is_binary_sensitive_feature  Indicates whether or not the sensitive feature is binary.
target_label                 Value of prediction (y_predict).
independence_score           Independence criterion score.
independence_category        Category {A+, A, B, C, D, E} assigned to the value of Independence criterion score.
independence_score_weight    Percentage (sensitive_value & predict_label)/all_rows_dataset.
separation_score             Separation criterion score.
separation_category          Category {A+, A, B, C, D, E} assigned to the value of Separation criterion score.
separation_score_weight      Percentage (sensitive_value & predict_label)/all_rows_dataset.
sufficiency_score            Sufficiency criterion score.
sufficiency_category         Category {A+, A, B, C, D, E} assigned to the value of Sufficiency criterion score.
sufficiency_score_weight     Percentage (sensitive_value & target_label)/all_rows_dataset.
===========================  =====================================================================================

Returns

fairness_info – Structure with all information of fairness criteria.

Return type

pandas.DataFrame

fairness_metrics(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) -> Tuple[float, float, float]

Calculate the scores for the independence, separation and sufficiency criteria (see independence_score(), separation_score() and sufficiency_score()). With 'A' as the sensitive feature, 'Y' as the prediction and 'T' as the real target, the criteria are calculated as described in the entries for those methods.
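
For reference, the three scores as defined in the independence_score(), separation_score() and sufficiency_score() entries of this page are collected below. Here a is the evaluated sensitive value and b the value it is compared against; this page does not define b any further.

```latex
\[
\begin{aligned}
\text{independence score} &= \left| P(Y=y \mid A=a) - P(Y=y \mid A=b) \right| \\
\text{separation score}   &= \left| P(Y=y \mid T=t, A=a) - P(Y=y \mid T=t, A=b) \right| \\
\text{sufficiency score}  &= \left| P(T=t \mid Y=y, A=a) - P(T=t \mid Y=y, A=b) \right|
\end{aligned}
\]
```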

Parameters
  • df (pd.DataFrame) – Structure with dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • target_col (str) – Name of the DataFrame (df) column that contains target (ground truth or y_real).

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Target value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

fairness_metrics – Fairness score metrics (independence score, separation score, sufficiency score).

Return type

Tuple[float, float, float]

See also

For detailed calculations of each metric, please refer to independence_score(), separation_score() and sufficiency_score() from the Fairness class.

fit(df: DataFrame, sensitive_cols: List[str], target_col: str, predict_col: str) -> None

Main function that performs all the calculations of the Fairness class. The calculated results are accessible via the property functions of the class.

Parameters
  • df (pandas.DataFrame) – Structure with dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_cols (List[str]) – List with the sensitive features (df column names) to evaluate the Fairness criteria.

  • target_col (str) – Column of DataFrame that contains target (ground truth or y_real).

  • predict_col (str) – Column of DataFrame that contains predictions (y_predict) of each element.

static get_fairness_category(score: float) -> str

It assigns a Category to any Fairness criterion given a Score. The relationship between Score and Category is shown in the following table:

========  ======================
Category  Range Score
========  ======================
A+        0.0 <= score <= 0.02
A         0.02 < score <= 0.05
B         0.05 < score <= 0.08
C         0.08 < score <= 0.15
D         0.15 < score <= 0.25
E         0.25 < score <= 1.0
========  ======================

Parameters

score (float) – Value of the score of the Fairness criterion.

Returns

category – Category assigned to the score.

Return type

str
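
The score-to-category table can be sketched as a small standalone function. This is an illustrative reimplementation of the ranges listed above, not the library's own code. The name `categorize_score` and the ValueError for out-of-range scores are assumptions.

```python
def categorize_score(score: float) -> str:
    """Map a fairness score in [0, 1] to a category using the documented ranges."""
    if not 0.0 <= score <= 1.0:
        # Assumption: this page does not say how out-of-range scores are handled.
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Upper bounds are inclusive, matching the "<=" on the right of each range.
    upper_bounds = [
        (0.02, "A+"),
        (0.05, "A"),
        (0.08, "B"),
        (0.15, "C"),
        (0.25, "D"),
        (1.0, "E"),
    ]
    for upper, category in upper_bounds:
        if score <= upper:
            return category
    raise AssertionError("unreachable: score was validated to be <= 1.0")


print(categorize_score(0.02))  # A+
print(categorize_score(0.07))  # B
print(categorize_score(0.3))   # E
```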

property highest_correlation_features

DataFrame with pairs of features that have a pearson correlation value above a threshold (0.9). If one of these features is a sensitive feature, it will be marked with a flag. If there are no highly correlated features, an empty DataFrame is returned.

Returns

highest_correlation_features – Structure containing the most highly correlated features.

Return type

pandas.DataFrame

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.
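
To make the 0.9 threshold concrete, here is a minimal sketch that finds feature pairs whose Pearson correlation exceeds it and flags pairs involving a sensitive feature. It assumes numeric columns held in plain Python lists rather than a DataFrame. The function names and the output keys (`feature_1`, `feature_2`, `correlation`, `is_sensitive`) are hypothetical and do not describe the library's actual output.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(
    columns: Dict[str, List[float]],
    sensitive_cols: Set[str],
    threshold: float = 0.9,
) -> List[dict]:
    """Return feature pairs with correlation above the threshold."""
    pairs = []
    for (name_1, values_1), (name_2, values_2) in combinations(columns.items(), 2):
        r = pearson(values_1, values_2)
        if r > threshold:
            pairs.append({
                "feature_1": name_1,
                "feature_2": name_2,
                "correlation": r,
                # Flag the pair when either feature is sensitive.
                "is_sensitive": name_1 in sensitive_cols or name_2 in sensitive_cols,
            })
    return pairs


# Made-up data: "seniority" closely tracks "age", "height" does not.
columns = {
    "age": [20, 30, 40, 50, 60],
    "seniority": [1, 2, 3, 4, 6],
    "height": [170, 160, 180, 165, 175],
}
pairs = highly_correlated_pairs(columns, sensitive_cols={"age"})
print(pairs)
```

A non-sensitive feature that correlates strongly with a sensitive one can act as a proxy for it, which is why such pairs are worth flagging.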

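property independence_info

DataFrame with all information of Independence criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, returns (for Independence criterion) its Score and its Category. The DataFrame contains the following columns:

=====================  =====================================================================================
Column                 Description
=====================  =====================================================================================
sensitive_feature      Sensitive feature name.
sensitive_value        Value of sensitive feature.
target_label           Value of prediction (y_predict).
independence_score     Independence criterion score.
independence_category  Category {A+, A, B, C, D, E} assigned to the value of Independence criterion score.
=====================  =====================================================================================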

Returns

independence_info – Structure with all information of Independence criterion.

Return type

pandas.DataFrame

independence_score(df: DataFrame, sensitive_col: str, predict_col: str, target_label: str, sensitive_value: str) -> float

Calculate the independence criterion's score. We say that the random variables (Y, A) satisfy independence if the sensitive feature 'A' is statistically independent of the prediction 'Y'. We define the score as the difference (in absolute value) of the probabilities:

\[\text{independence score} = \left| P(Y=y \mid A=a) - P(Y=y \mid A=b) \right|\]

Parameters
  • df (pandas.DataFrame) – Structure containing the dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Prediction value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

independence_score – independence score value.

Return type

float

Raises

ZeroDivisionError – Raised when one of the conditional probabilities equals zero, which leads to a division by zero.
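
The independence formula can be illustrated with a dependency-free sketch. Plain lists of dicts stand in for the DataFrame, and the function name `independence_sketch` is hypothetical. The page does not define the comparison value b, so the sketch assumes b means "every row whose sensitive value is not a". This is not the library's implementation.

```python
from typing import Dict, List


def independence_sketch(
    rows: List[Dict[str, str]],
    sensitive_col: str,
    predict_col: str,
    target_label: str,
    sensitive_value: str,
) -> float:
    """|P(Y=y | A=a) - P(Y=y | A=b)|, with b assumed to mean 'A != a'."""
    group_a = [r for r in rows if r[sensitive_col] == sensitive_value]
    group_b = [r for r in rows if r[sensitive_col] != sensitive_value]
    # In this sketch an empty group raises ZeroDivisionError.
    p_a = sum(r[predict_col] == target_label for r in group_a) / len(group_a)
    p_b = sum(r[predict_col] == target_label for r in group_b) / len(group_b)
    return abs(p_a - p_b)


# Made-up data: 1 of 4 "F" rows and 3 of 4 "M" rows are predicted "YES".
rows = [
    {"gender": g, "y_predict": p}
    for g, p in [
        ("F", "YES"), ("F", "NO"), ("F", "NO"), ("F", "NO"),
        ("M", "YES"), ("M", "YES"), ("M", "YES"), ("M", "NO"),
    ]
]
score = independence_sketch(rows, "gender", "y_predict", "YES", "F")
print(score)  # 0.5
```

A score of 0.5 falls in the range 0.25 < score <= 1.0 and would therefore be assigned category E.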

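property separation_info

DataFrame with all information of Separation criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, returns (for Separation criterion) its Score and its Category. The DataFrame contains the following columns:

===================  =====================================================================================
Column               Description
===================  =====================================================================================
sensitive_feature    Sensitive feature name.
sensitive_value      Value of sensitive feature.
target_label         Value of prediction (y_predict).
separation_score     Separation criterion score.
separation_category  Category {A+, A, B, C, D, E} assigned to the value of Separation criterion score.
===================  =====================================================================================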

Returns

separation_info – Structure with all information of Separation criterion.

Return type

pandas.DataFrame

separation_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) -> float

Calculate the separation criterion's score. We say the random variables (Y, A, T) satisfy separation if the sensitive characteristics 'A' are statistically independent of the prediction 'Y' given the target value 'T'. We define the score as the difference (in absolute value) of the probabilities:

\[\text{separation score} = \left| P(Y=y \mid T=t, A=a) - P(Y=y \mid T=t, A=b) \right|\]

Parameters
  • df (pandas.DataFrame) – Structure with dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • target_col (str) – Name of the DataFrame (df) column that contains target (ground truth or y_real).

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Target value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

separation_score – Separation score value.

Return type

float

Raises

ZeroDivisionError – Raised when one of the conditional probabilities equals zero, which leads to a division by zero.
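
A dependency-free sketch of the separation formula follows. It rests on two assumptions that this page does not confirm: the single target_label argument is used for both y and t, and b means "every row whose sensitive value is not a". Lists of dicts stand in for the DataFrame and `separation_sketch` is a hypothetical name.

```python
from typing import Dict, List


def separation_sketch(
    rows: List[Dict[str, str]],
    sensitive_col: str,
    target_col: str,
    predict_col: str,
    target_label: str,
    sensitive_value: str,
) -> float:
    """|P(Y=y | T=t, A=a) - P(Y=y | T=t, A=b)| with y = t = target_label."""
    # Condition on the real target first: keep rows where T equals the label.
    with_target = [r for r in rows if r[target_col] == target_label]
    group_a = [r for r in with_target if r[sensitive_col] == sensitive_value]
    group_b = [r for r in with_target if r[sensitive_col] != sensitive_value]
    # In this sketch an empty group raises ZeroDivisionError.
    p_a = sum(r[predict_col] == target_label for r in group_a) / len(group_a)
    p_b = sum(r[predict_col] == target_label for r in group_b) / len(group_b)
    return abs(p_a - p_b)


# Made-up data as (gender, y_true, y_predict) triples.
rows = [
    {"gender": g, "y_true": t, "y_predict": p}
    for g, t, p in [
        ("F", "YES", "YES"), ("F", "YES", "NO"), ("F", "NO", "NO"), ("F", "NO", "NO"),
        ("M", "YES", "YES"), ("M", "YES", "YES"), ("M", "NO", "YES"), ("M", "NO", "NO"),
    ]
]
# Among real "YES" rows, 1 of 2 "F" rows and 2 of 2 "M" rows are predicted "YES".
score = separation_sketch(rows, "gender", "y_true", "y_predict", "YES", "F")
print(score)  # 0.5
```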

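property sufficiency_info

DataFrame with all information of Sufficiency criterion. For each sensitive feature, for each value of the sensitive feature and for each value of the target, returns (for Sufficiency criterion) its Score and its Category. The DataFrame contains the following columns:

====================  =====================================================================================
Column                Description
====================  =====================================================================================
sensitive_feature     Sensitive feature name.
sensitive_value       Value of sensitive feature.
target_label          Value of prediction (y_predict).
sufficiency_score     Sufficiency criterion score.
sufficiency_category  Category {A+, A, B, C, D, E} assigned to the value of Sufficiency criterion score.
====================  =====================================================================================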

Returns

sufficiency_info – Structure with all information of Sufficiency criterion.

Return type

pandas.DataFrame

sufficiency_score(df: DataFrame, sensitive_col: str, target_col: str, predict_col: str, target_label: str, sensitive_value: str) -> float

Calculate the sufficiency criterion's score. We say the random variables (Y, A, T) satisfy sufficiency if the sensitive characteristics 'A' are statistically independent of the target value 'T' given the prediction 'Y'. We define the score as the difference (in absolute value) of the probabilities:

\[\text{sufficiency score} = \left| P(T=t \mid Y=y, A=a) - P(T=t \mid Y=y, A=b) \right|\]

Parameters
  • df (pandas.DataFrame) – Structure with the dataset to process. The dataset must have: N feature columns, a real target column and prediction column.

  • sensitive_col (str) – Name of the DataFrame (df) column with the sensitive feature.

  • target_col (str) – Name of the DataFrame (df) column that contains target (ground truth or y_real).

  • predict_col (str) – Name of the column of the DataFrame (df) that contains predictions of each element.

  • target_label (str) – Target value (class label) for which the score is calculated.

  • sensitive_value (str) – Value of the sensitive feature for the score calculation.

Returns

sufficiency_score – Sufficiency score value.

Return type

float

Raises

ZeroDivisionError – Raised when one of the conditional probabilities equals zero, which leads to a division by zero.
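
A dependency-free sketch of the sufficiency formula follows. It assumes, without confirmation from this page, that the single target_label argument is used for both t and y, and that b means "every row whose sensitive value is not a". Lists of dicts stand in for the DataFrame and `sufficiency_sketch` is a hypothetical name.

```python
from typing import Dict, List


def sufficiency_sketch(
    rows: List[Dict[str, str]],
    sensitive_col: str,
    target_col: str,
    predict_col: str,
    target_label: str,
    sensitive_value: str,
) -> float:
    """|P(T=t | Y=y, A=a) - P(T=t | Y=y, A=b)| with t = y = target_label."""
    # Condition on the prediction first: keep rows where Y equals the label.
    with_prediction = [r for r in rows if r[predict_col] == target_label]
    group_a = [r for r in with_prediction if r[sensitive_col] == sensitive_value]
    group_b = [r for r in with_prediction if r[sensitive_col] != sensitive_value]
    # In this sketch an empty group raises ZeroDivisionError.
    p_a = sum(r[target_col] == target_label for r in group_a) / len(group_a)
    p_b = sum(r[target_col] == target_label for r in group_b) / len(group_b)
    return abs(p_a - p_b)


# Made-up data as (gender, y_true, y_predict) triples.
rows = [
    {"gender": g, "y_true": t, "y_predict": p}
    for g, t, p in [
        ("F", "YES", "YES"), ("F", "NO", "NO"), ("F", "NO", "NO"), ("F", "YES", "NO"),
        ("M", "YES", "YES"), ("M", "YES", "YES"), ("M", "NO", "YES"), ("M", "NO", "YES"),
    ]
]
# Among predicted "YES" rows, 1 of 1 "F" rows and 2 of 4 "M" rows are really "YES".
score = sufficiency_sketch(rows, "gender", "y_true", "y_predict", "YES", "F")
print(score)  # 0.5
```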

property target_values

List with the different target values of dataset.

Returns

target_values – List containing the target values.

Return type

List[str]

Caution

If the method fit() from the Fairness class has not been executed, it will return a warning message.
