Titanic Example¶

In this example, we use the Titanic Dataset, which describes the survival status of individual Titanic passengers.

This dataset can be obtained using the load_titanic() function:

>>> from xaiographs.datasets import load_titanic
>>> df_dataset = load_titanic()
>>> df_dataset.head(3)
    id  gender title      age  family_size  is_alone embarked  class  ticket_price  survived
0    0  female   Mrs  29.0000            0         1        S      1      211.3375         1
1    1    male    Mr   0.9167            3         0        S      1      151.5500         1
2    2  female   Mrs   2.0000            3         0        S      1      151.5500         0

Note

Documentation of the characteristics of Titanic Dataset

Titanic Explainer¶

To determine the explainability of this dataset, we have discretized the continuous features (age, family_size and ticket_price) and produced as many columns representing the probability of the target as there are targets in the dataset.In this example, we will have two columns with probabilities of 0 and 1: SURVIVED and NO_SURVIVED. The function load_titanic_discretized() in XAIoGraphs already provides this altered dataset.

>>> from xaiographs.datasets import load_titanic_discretized
>>> df_dataset, features_cols, target_cols, _, _ = load_titanic_discretized()
>>> df_dataset[features_cols + target_cols].head(3)
   id  gender title          age family_size  is_alone embarked  class ticket_price  SURVIVED NO_SURVIVED 
0   0  female   Mrs  18_30_years           1         1        S      1         High         1           0 
1   1    male    Mr    <12_years         3-5         0        S      1         High         1           0 
2   2  female   Mrs    <12_years         3-5         0        S      1         High         0           1

Hint

This example demonstrates the true target (y_true). If you wanted to explain a Deep|Machine Learning model’s predictions, you would have to create the SURVIVED and NO_SURVIVED columns with the model prediction results (y_predict) rather than reality.

To obtain different explainability results, we must create an object of class Explainer and analyze it through the dataset and the explainability engine (LIDE). Later, by using the fit() method and handing it a list containing the names of the feature columns (feature_cols) and another list containing the names of the target columns (target_cols), it will execute all of the computations required to provide the explainability:

>>> explainer = Explainer(importance_engine='LIDE', verbose=1)
>>> explainer.fit(df=df_dataset, feature_cols=features_cols, target_cols=target_cols)

We retrieve the explainability results by accessing the various characteristics of the Explainer class. Let us now look at how we would calculate the relevance of the features:

>>> explainer.global_explainability
        feature  importance  rank
      gender    0.124936     1
       title    0.122790     2
       class    0.089931     3
ticket_price    0.062145     4
         age    0.059930     5
    embarked    0.059490     6
 family_size    0.042980     7
    is_alone    0.031692     8

The explainability information supplied by the rest of the class’s attributes may be found in the documentation for the Explainer class.

Titanic Reason Why¶

To compose a natural language phrase that describes Reason Why an element was categorized in a specific class, you’ll need (using the Why class):

Importance value of each Feature-Value for each dataset element
Template for Natural Language Sentences
Global and local semantics that assign a description to each Feature-Value

Let’s look at how to build the Reason Why for the following element:

   id  gender title          age family_size  is_alone embarked  class ticket_price  SURVIVED NO_SURVIVED 
0   0  female   Mrs  18_30_years           1         1        S      1         High         1           0

After running Explainer, we obtain the importance values of each element (in this case, only the first element):

>>> explainer.local_feature_value_explainability[explainer.local_feature_value_explainability.id==0].sort_values('rank')   
   id      feature_value  importance  rank
 0      gender_female    0.191029     1
 0          title_Mrs    0.189320     2
 0            class_1    0.147101     3
 0  ticket_price_High    0.101550     4
 0    age_18_30_years    0.008612     5
 0      family_size_1    0.004920     6
 0         is_alone_1    0.004920     6
 0         embarked_S   -0.027895     8

We define a sentence template that has three elements:
1. $target: It will be replaced by the class name to which the element belongs. (SURVIVED or NO_SURVIVED)
2. $temp_values_explain: It will be replaced by a semantics that describes each Feature-Value.
3. $temp_target_values_explain: It will be replaced with a semantics that explains each Feature-Value in relation to the target.

For $temp_values_explain, this case has been classified as $target, considering that $temp_target_values_explain.

Two semantics that associate a brief description with each Feature-Value must be defined. One of those semantics will be used to complete the template’s $temp_target_values_explain sentence component, while the other semantics will be used to complete the template’s $temp_values_explain sentence part.

Depending on the target, the first semantics assigns a descriptive word to each Feature-Value:

target	feature_value	reason
NO_SURVIVED	gender_male	many men have died
NO_SURVIVED	gender_female	to be a woman
NO_SURVIVED	is_alone_1	they traveled alone
…	…	…
NO_SURVIVED	age_18_30_years	they were young
NO_SURVIVED	class_1	few traveled in 1st class
NO_SURVIVED	ticket_price_High	they paid a lot for the ticket
NO_SURVIVED	embarked_S	many boarded at Southampton
…	…	…
SURVIVED	gender_female	many women survived
SURVIVED	is_alone_1	they traveled alone
SURVIVED	age_18_30_years	they were young
SURVIVED	class_1	many traveled in 1st class
SURVIVED	ticket_price_High	they paid a lot for the ticket
SURVIVED	embarked_S	few boarded at Southampton

Second, assign a descriptive phrase to each Feature-Value:

feature_value	reason
gender_male	to be a man
gender_female	to be a woman
is_alone_1	travel alone
…	..
age_18_30_years	be young
class_1	travel in 1st class
ticket_price_High	pay too much for the ticket
embarked_S	embark in a lower class town

Warning

Only a sampling of the Feature-Value that interest us for the example has been shown, not the entire semantics.

For each problem, these semantics must be defined. XAIoGraphs includes the templates and semantics for this dataset via load_titanic_why() function.

df_values_semantics, df_target_values_semantics = load_titanic_why(language='en')

We build the natural language sentences of the Reason Why using the class Why and calling the method fit() after executing the Explainer and with the templates and semantics defined (in padas.DataFrame):

why = Why(language=LANG,
          explainer=explainer,
          why_values_semantics=df_values_semantics,
          why_target_values_semantics=df_target_values_semantics,
          verbose=1)
why.fit()

We use the why_explanation property to get the Reason Why sentences for each of the items.

>>> why.why_explanation[why.why_explanation.id==0]
   id  reason
0   0  For to be a woman and travel in 1st class, this case has been classified as survived, considering that many women survived and many traveled in 1st class.

In this example, the Reason Why sentence is created in natural language using the template and semantics.

Hint

The function build_semantic_templates() of class Why returns two .csv files (values_semantics.csv and target_values_semantics.csv) containing the Feature-Value pairs required to complete the semantics.

Titanic Fairness¶

To obtain the Fairness Scores and Categories, a Dataset with the sensitive features discretized and two columns containing the goal (y_true) and prediction (y_predict) is required. The function load_titanic() in XAIoGraphs already provides this altered dataset:

>>> from xaiographs.datasets import load_titanic_discretized
>>> df_dataset, features_cols, _, y_true, y_predict = load_titanic_discretized()
>>> df_dataset[features_cols + [y_true] + [y_predict]].head(3)
   gender title          age family_size  is_alone embarked  class ticket_price       y_true    y_predict
0  female   Mrs  18_30_years           1         1        S      1         High     SURVIVED     SURVIVED
1    male    Mr    <12_years         3-5         0        S      1         High     SURVIVED     SURVIVED
2  female   Mrs    <12_years         3-5         0        S      1         High  NO_SURVIVED  NO_SURVIVED

Calling the method fit() and passing dataset (Pandas Dataframe), a list of sensitive features to be evaluated, and the names of the columns with the target and prediction as parameters, it will calculate the Independence, Separation, and Sufficiency fairness criteria.

>>> from xaiographs import Fairness
>>> f = Fairness()
>>> f.fit(df=df, 
...       sensitive_cols=['gender'], 
...       target_col='y_true', 
...       predict_col='y_predict')

XAIoWeb Titanic¶

After running the .fit() methods of each of the classes (one, two, or all three), a sequence of JSON files are generated in the xaioweb_files folder to visualized in XAIoWeb interface.

To launch the web (with the virtual environment enabled), run the following entry point:

>> xaioweb -d xaioweb_files -o

And the results seen in XAIoWeb are the following:

Global Explainability¶

Local Explainability¶

Fairness¶

< ✏️ Examples