2. Why
The Why class writes, in natural language, the reason why an element has been classified in a specific
class (target value), based on the local explainability results
offered by the Explainer class.
Requirements
To generate a natural language sentence that explains the Reason Why a dataset element has been classified in a
certain class, you will need:
An object of the Explainer class, executed with the explainability of each element (local explainability).
Semantics, including a description of each Feature-Value.
Optionally, a template for the natural language sentences.
Note
These templates are available in English and Spanish by default in XAIoGraphs.
Composing Reason Why
The “Reason Why” sentences for each element are created using a sequence of templates, such as the ones below:
Templates |
|---|
An explanation cannot be offered for this case. |
For |
For |
This case has been classified as |
The classification of this case as |
As |
The $target parameter is replaced by the name of the target class in which the element has been classified.
Let’s now see how to complete the $temp_values_explain and $temp_target_values_explain parameters.
When using XAIoGraphs with discretized data, you need an object of the
Explainer class that has already determined the importance of the features of each
element; in particular, the importance of each Feature-Value.
For example, consider the following element (see Titanic example):
| gender | title | age | family_size | is_alone | embarked | class | ticket_price | target |
|---|---|---|---|---|---|---|---|---|
| female | Mrs | 18_30_years | 1 | 1 | S | 1 | High | SURVIVED |
The (ordered) importance of its Features-Values as provided by the Explainer
class is as follows:
| feature_value | importance | rank |
|---|---|---|
| gender_female | 0.191029 | 1 |
| title_Mrs | 0.189320 | 2 |
| class_1 | 0.147101 | 3 |
| ticket_price_High | 0.101550 | 4 |
| age_18_30_years | 0.008612 | 5 |
| family_size_1 | 0.004920 | 6 |
| is_alone_1 | 0.004920 | 6 |
| embarked_S | -0.027895 | 8 |
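The rank column of this table can be reproduced from the importance column alone. The following is a minimal sketch, not the library's own code: it assumes that tied importances share the lowest rank of their group (pandas `method="min"`), which is the tie rule that yields the repeated rank 6 followed by rank 8 shown above.

```python
import pandas as pd

# Importances of the example element, copied from the table above.
importances = pd.DataFrame({
    "feature_value": [
        "gender_female", "title_Mrs", "class_1", "ticket_price_High",
        "age_18_30_years", "family_size_1", "is_alone_1", "embarked_S",
    ],
    "importance": [
        0.191029, 0.189320, 0.147101, 0.101550,
        0.008612, 0.004920, 0.004920, -0.027895,
    ],
})

# Assumption: ties share the lowest rank of their group (method="min").
importances["rank"] = (
    importances["importance"].rank(method="min", ascending=False).astype(int)
)
ranks = importances["rank"].tolist()
print(importances)
```

The two Feature-Values with importance 0.004920 both receive rank 6, so the next rank assigned is 8.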
We will create the Reason Why sentence in natural language, expressing the element’s ‘K’ most important
Features-Values (gender_female, title_Mrs, class_1,…) using two semantics (values semantics and
target values semantics):
Values Semantics
Each Feature-Value (that you want to explain) will be allocated a sentence (semantics) that will be substituted in
the sentence template’s $temp_values_explain variable. As an example:
| feature_value | reason |
|---|---|
| gender_male | to be a man |
| gender_female | to be a woman |
| is_alone_1 | travel alone |
| … | … |
| age_12_18_years | be a teenager |
| … | … |
| class_1 | travel in 1st class |
| … | … |
| ticket_price_High | pay too much for the ticket |
| … | … |
| embarked_S | embark in a lower class town |
| … | … |
Using the following template as a reference:
For `$temp_values_explain`, this case has been classified as `$target`, considering that `$temp_target_values_explain`.
The parameter $temp_values_explain will be substituted by to be a woman and travel alone (using two of the
element's values, gender_female and is_alone_1, for illustration), leaving the sentence:
For `to be a woman and travel alone`, this case has been classified as `SURVIVED`, considering that
`$temp_target_values_explain`.
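This partial substitution can be illustrated with Python's standard `string.Template`, whose `$name` placeholder syntax happens to match the templates shown here. This is only a sketch of the idea; it is not a claim about how XAIoGraphs performs the replacement internally.

```python
from string import Template

template = Template(
    "For $temp_values_explain, this case has been classified as $target, "
    "considering that $temp_target_values_explain."
)

# safe_substitute leaves placeholders without a value untouched, so
# $temp_target_values_explain stays in the sentence for the next step.
partial = template.safe_substitute(
    temp_values_explain="to be a woman and travel alone",
    target="SURVIVED",
)
print(partial)
```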
The Values Semantics will be provided to the constructor of the Why class in
the why_values_semantics parameter as a pandas.DataFrame with two columns:
feature_value: the Feature-Value
reason: semantics of this Feature-Value
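As a minimal sketch, a DataFrame with this two-column layout can be built by hand with pandas. The rows are taken from the example table above; the `reason_by_value` dictionary is a hypothetical convenience for this illustration, not part of the XAIoGraphs API.

```python
import pandas as pd

# Two columns, as described above: feature_value and reason.
why_values_semantics = pd.DataFrame(
    [
        ("gender_male", "to be a man"),
        ("gender_female", "to be a woman"),
        ("is_alone_1", "travel alone"),
        ("class_1", "travel in 1st class"),
    ],
    columns=["feature_value", "reason"],
)

# Hypothetical helper: quick lookup from a Feature-Value to its reason.
reason_by_value = dict(
    zip(why_values_semantics["feature_value"], why_values_semantics["reason"])
)
print(reason_by_value["gender_female"])
```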
Target Values Semantics
For each Feature-Value (that you wish to explain), and depending on the element’s target, a phrase (semantics)
will be assigned and substituted in the sentence template’s $temp_target_values_explain variable. As an example:
| target | feature_value | reason |
|---|---|---|
| SURVIVED | gender_male | few men survived |
| SURVIVED | gender_female | many women survived |
| SURVIVED | is_alone_1 | they traveled alone |
| … | … | … |
| SURVIVED | age_12_18_years | they were teenagers |
| … | … | … |
| SURVIVED | class_1 | many traveled in 1st class |
| … | … | … |
| SURVIVED | ticket_price_High | they paid a lot for the ticket |
| … | … | … |
| SURVIVED | embarked_S | few boarded at Southampton |
| … | … | … |
| NO_SURVIVED | gender_male | many men have died |
| NO_SURVIVED | gender_female | to be a woman |
| NO_SURVIVED | is_alone_1 | they traveled alone |
Taking the following template as a reference:
For `$temp_values_explain`, this case has been classified as `$target`, considering that `$temp_target_values_explain`.
Since the element is classified as SURVIVED, the parameter $temp_target_values_explain will be replaced by
many women survived and they traveled alone (using the same two values, gender_female and is_alone_1), leaving the sentence:
For `to be a woman and travel alone`, this case has been classified as `SURVIVED`, considering that `many women
survived and they traveled alone`.
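Putting both substitutions together, the complete sentence can be sketched as follows. `compose_sentence` is a hypothetical helper written for this illustration, and joining the reasons with " and " is an assumption based on the example sentences in this section.

```python
from string import Template


def compose_sentence(template, target, value_reasons, target_value_reasons):
    """Fill a Reason Why template, joining each list of reasons with ' and '."""
    return Template(template).substitute(
        target=target,
        temp_values_explain=" and ".join(value_reasons),
        temp_target_values_explain=" and ".join(target_value_reasons),
    )


sentence = compose_sentence(
    "For $temp_values_explain, this case has been classified as $target, "
    "considering that $temp_target_values_explain.",
    target="SURVIVED",
    value_reasons=["to be a woman", "travel alone"],
    target_value_reasons=["many women survived", "they traveled alone"],
)
print(sentence)
```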
The Target Values Semantics will be passed to the constructor of the Why class in
the why_target_values_semantics parameter as a pandas.DataFrame with three columns:
target: target value
feature_value: the Feature-Value, dependent on the target
reason: semantics of this Feature-Value, dependent on the target
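The three-column layout means that the same Feature-Value can carry a different reason for each target. A minimal sketch with rows taken from the Titanic semantics follows; the `reason_by_key` dictionary is a hypothetical lookup used only for this illustration.

```python
import pandas as pd

# Three columns, as described above: target, feature_value and reason.
why_target_values_semantics = pd.DataFrame(
    [
        ("SURVIVED", "gender_female", "many women survived"),
        ("SURVIVED", "class_1", "many traveled in 1st class"),
        ("NO_SURVIVED", "gender_male", "many men have died"),
        ("NO_SURVIVED", "class_1", "few traveled in 1st class"),
    ],
    columns=["target", "feature_value", "reason"],
)

# Hypothetical helper: the reason is keyed by the (target, feature_value) pair.
reason_by_key = {
    (target, feature_value): reason
    for target, feature_value, reason
    in why_target_values_semantics.itertuples(index=False)
}

print(reason_by_key[("SURVIVED", "class_1")])
print(reason_by_key[("NO_SURVIVED", "class_1")])
```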
Note
If a Feature-Value should not be explained, it must not appear in the semantics
(for example, title_Mrs or title_Mr).
Number of features to explain
As mentioned before, the Reason Why is created based on the importance of the Features-Values of each element, with
those of greatest importance being chosen.
The sentence is created by adding the reason for each of the most important Features-Values, depending on the
semantics (values semantics and target values semantics).
We must specify the number of Features-Values to include in the sentences ($temp_values_explain and
$temp_target_values_explain) in the n_values and n_target_values parameters of the
Why class constructor. Both parameters are set to 2 by default.
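The selection described here can be sketched with pandas. The sketch assumes that Feature-Values without semantics (such as title_Mrs) are discarded before the top n_values are taken; this is an assumption about the selection order, although it agrees with the sentence shown for element 0 in the examples of this page.

```python
import pandas as pd

# Ranked Feature-Values of the example element (first four rows of its table).
local_importance = pd.DataFrame({
    "feature_value": ["gender_female", "title_Mrs", "class_1", "ticket_price_High"],
    "rank": [1, 2, 3, 4],
})

# title_Mrs is deliberately absent: it is not meant to be explained.
values_semantics = pd.DataFrame({
    "feature_value": ["gender_female", "class_1", "ticket_price_High"],
    "reason": ["to be a woman", "travel in 1st class", "pay too much for the ticket"],
})

n_values = 2  # default value mentioned above

# Assumption: keep only Feature-Values that have semantics, then take the top n.
explained = (
    local_importance.merge(values_semantics, on="feature_value", how="inner")
    .sort_values("rank")
    .head(n_values)
)
reasons = explained["reason"].tolist()
print(" and ".join(reasons))
```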
Default Reason Why
XAIoGraphs (in the Explainer class) assigns each local explanation a reliability
score between 0 and 1, with 1 being very trustworthy and 0 being unreliable:
>>> explainer.local_reliability.head(10)
id target reliability
0 0 SURVIVED 1.00
1 1 SURVIVED 1.00
2 2 NO_SURVIVED 1.00
3 3 NO_SURVIVED 1.00
4 4 NO_SURVIVED 0.20
5 5 SURVIVED 0.28
6 6 SURVIVED 0.75
7 7 NO_SURVIVED 0.86
8 8 SURVIVED 1.00
9 9 NO_SURVIVED 1.00
The Why class accepts a reliability threshold (the min_reliability parameter)
from which it will build the Reason Why sentences.
If the reliability value is below this threshold, the Reason Why will be the generic statement given as the first
sentence of the pandas.DataFrame passed to the Why class constructor in the why_templates
parameter:
Templates |
|---|
An explanation cannot be offered for this case. |
For |
… |
In this example, we assign a default sentence to elements with a reliability of less than or equal to 0.3:
>>> from xaiographs import Explainer
>>> from xaiographs import Why
>>> from xaiographs.datasets import load_titanic_discretized, load_titanic_why
>>>
>>> LANG = 'en'
>>>
>>> example_dataset, feature_cols, target_cols, y_true, y_predict = load_titanic_discretized()
>>> df_values_semantics, df_target_values_semantics = load_titanic_why(language=LANG)
>>>
>>> explainer = Explainer(importance_engine='LIDE', verbose=0)
>>> explainer.fit(df=example_dataset, feature_cols=feature_cols, target_cols=target_cols)
>>> explainer.local_reliability.head(6)
id target reliability
0 0 SURVIVED 1.00
1 1 SURVIVED 1.00
2 2 NO_SURVIVED 1.00
3 3 NO_SURVIVED 1.00
4 4 NO_SURVIVED 0.20
5 5 SURVIVED 0.28
>>>
>>> why = Why(language=LANG,
... explainer=explainer,
... why_values_semantics=df_values_semantics,
... why_target_values_semantics=df_target_values_semantics,
... min_reliability=0.3,
... verbose=0)
>>> why.fit()
>>> why.why_explanation.head(6)
id reason
0 0 The classification of this case as survived is due to to be a woman and travel in 1st class, because many women survived and many traveled in 1st class.
1 1 The classification of this case as survived is due to travel in 1st class and be a child, because many traveled in 1st class and they were children.
2 2 For travel in 1st class and be a child, this case has been classified as no_survived, considering that few traveled in 1st class and they were children.
3 3 This case has been classified as no_survived because be young and to be a man, due to they were young and many men have died.
4 4 An explanation cannot be offered for this case.
5 5 An explanation cannot be offered for this case.
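The effect of the threshold on these six elements can be sketched with pandas. The strict comparison (`<`) is an assumption made for this illustration; with the reliabilities shown above (0.20 and 0.28), a strict or non-strict comparison against 0.3 selects the same elements.

```python
import pandas as pd

# Reliability values copied from the explainer.local_reliability output above.
local_reliability = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "target": ["SURVIVED", "SURVIVED", "NO_SURVIVED",
               "NO_SURVIVED", "NO_SURVIVED", "SURVIVED"],
    "reliability": [1.00, 1.00, 1.00, 1.00, 0.20, 0.28],
})

min_reliability = 0.3

# Assumption: elements strictly below the threshold get the default sentence.
uses_default = local_reliability["reliability"] < min_reliability
default_ids = local_reliability.loc[uses_default, "id"].tolist()
print(default_ids)
```

Elements 4 and 5 fall below the threshold, which matches the two rows above that read "An explanation cannot be offered for this case."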
Example
To create a Reason Why, we must first compute, with the Explainer class, the importance of each feature
for the elements of the dataset. Let’s look at an example using the
Titanic dataset:
>>> from xaiographs import Explainer
>>> from xaiographs.datasets import load_titanic_discretized
>>>
>>> example_dataset, feature_cols, target_cols, _, _ = load_titanic_discretized()
>>>
>>> explainer = Explainer(importance_engine='LIDE', verbose=0)
>>> explainer.fit(df=example_dataset, feature_cols=feature_cols, target_cols=target_cols)
After executing the explainability of the dataset, the Explainer class
returns the local explainability of each element, assigning an importance value to each of its Feature-Values
(local_feature_value_explainability property).
>>> explainer.local_feature_value_explainability[explainer.local_feature_value_explainability['rank'] <= 3]
id feature_value importance rank
0 0 gender_female 0.191029 1
1 0 title_Mrs 0.189320 2
2 0 class_1 0.147101 3
10 1 class_1 0.386458 1
15 1 age_<12_years 0.311776 2
14 1 is_alone_0 0.029181 3
23 2 age_<12_years 0.340105 1
18 2 class_1 0.096683 2
20 2 embarked_S 0.037918 3
31 3 age_18_30_years 0.194554 1
25 3 title_Mr 0.153689 2
24 3 gender_male 0.151856 3
36 4 embarked_S 0.144169 1
37 4 family_size_3-5 0.126038 2
38 4 is_alone_0 0.096112 3
42 5 class_1 0.191949 1
43 5 ticket_price_High 0.172900 2
47 5 age_30_60_years 0.090899 3
48 6 gender_female 0.245250 1
49 6 title_Mrs 0.244135 2
50 6 class_1 0.112574 3
...
It also provides a reliability value for each of the explanations
(local_reliability property):
>>> explainer.local_reliability
id target reliability
0 0 SURVIVED 1.00
1 1 SURVIVED 1.00
2 2 NO_SURVIVED 1.00
3 3 NO_SURVIVED 1.00
4 4 NO_SURVIVED 0.20
5 5 SURVIVED 0.28
...
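To inspect both outputs side by side, the top-ranked Feature-Value of each element can be joined with its reliability. The sketch below uses small hand-built DataFrames with values copied from the two outputs above (elements 0 and 4) in place of the real explainer properties.

```python
import pandas as pd

# Stand-in for explainer.local_feature_value_explainability (elements 0 and 4).
local_explainability = pd.DataFrame({
    "id": [0, 0, 4, 4],
    "feature_value": ["gender_female", "title_Mrs", "embarked_S", "family_size_3-5"],
    "importance": [0.191029, 0.189320, 0.144169, 0.126038],
    "rank": [1, 2, 1, 2],
})

# Stand-in for explainer.local_reliability (same two elements).
local_reliability = pd.DataFrame({
    "id": [0, 4],
    "target": ["SURVIVED", "NO_SURVIVED"],
    "reliability": [1.00, 0.20],
})

# Keep the rank-1 Feature-Value of each element and attach its reliability.
top = local_explainability[local_explainability["rank"] == 1]
merged = top.merge(local_reliability, on="id").sort_values("id")
print(merged[["id", "target", "feature_value", "reliability"]])
```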
To create natural language sentences, we need to build a pandas.DataFrame containing a sequence of
sentence templates. The first row is always the default sentence, used for those local explanations whose reliability
is below the threshold defined by the min_reliability parameter of the Why
class.
The remaining sentence templates (one or more) contain the parameters
$target, $temp_values_explain and $temp_target_values_explain, which are replaced by the target value and the semantics
associated with each one (see Composing Reason Why). If more than one sentence
template is provided, one is randomly selected to build the Reason Why of each element.
The following is an example of how to construct a pandas.DataFrame of sentence templates:
>>> import pandas as pd
>>> templates=['An explanation cannot be offered for this case.',
... 'For $temp_values_explain, this case has been classified as $target, considering that $temp_target_values_explain.',
... 'As $temp_target_values_explain, and this case is characterized by $temp_values_explain, has been classified as $target.']
>>> df_why_templates = pd.DataFrame(templates)
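The split between the default sentence (first row) and the randomly chosen templates (remaining rows) can be sketched as follows. The use of `random.Random` and the fixed seed are assumptions made only to keep the sketch repeatable; they say nothing about how XAIoGraphs draws its templates.

```python
import random

import pandas as pd

templates = [
    "An explanation cannot be offered for this case.",
    "For $temp_values_explain, this case has been classified as $target, "
    "considering that $temp_target_values_explain.",
    "As $temp_target_values_explain, and this case is characterized by "
    "$temp_values_explain, has been classified as $target.",
]
df_why_templates = pd.DataFrame(templates)

# First row: default sentence. Remaining rows: candidates for random choice.
default_template = df_why_templates.iloc[0, 0]
candidate_templates = df_why_templates.iloc[1:, 0].tolist()

rng = random.Random(0)  # seeded only so the sketch is repeatable
chosen = rng.choice(candidate_templates)
print(chosen)
```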
Note
These templates are available in English and Spanish by default in XAIoGraphs.
We need the two semantics described in the Composing Reason Why section to compose
the sentences. XAIoGraphs provides these semantics for the Titanic dataset via the
load_titanic_why() function:
>>> from xaiographs.datasets import load_titanic_why
>>> df_values_semantics, df_target_values_semantics = load_titanic_why(language='en')
>>> df_values_semantics
feature_value reason
0 gender_male to be a man
1 gender_female to be a woman
2 is_alone_1 travel alone
3 family_size_2 to be from a family of few members
4 family_size_3-5 be a large family
5 family_size_>5 be a family with many members
6 age_<12_years be a child
7 age_12_18_years be a teenager
8 age_18_30_years be young
9 age_30_60_years be an adult
10 age_>60_years be an older person
11 class_1 travel in 1st class
12 class_2 travel in 2nd class
13 class_3 travel in 3rd class
14 ticket_price_High pay too much for the ticket
15 ticket_price_Mid pay for a mid-cost ticket
16 ticket_price_Low pay little for the ticket
17 embarked_S embark in a lower class town
18 embarked_Q boarding in a middle class town
19 embarked_C boarding in a high class town
>>> df_target_values_semantics
target feature_value reason
0 NO_SURVIVED gender_male many men have died
1 NO_SURVIVED gender_female to be a woman
2 NO_SURVIVED is_alone_1 they traveled alone
3 NO_SURVIVED family_size_2 they were from a family of few members
4 NO_SURVIVED family_size_3-5 they were from a large family
5 NO_SURVIVED family_size_>5 they were from a family of many members
6 NO_SURVIVED age_<12_years they were children
7 NO_SURVIVED age_12_18_years they were teenagers
8 NO_SURVIVED age_18_30_years they were young
9 NO_SURVIVED age_30_60_years they were adults
10 NO_SURVIVED age_>60_years they were older
11 NO_SURVIVED class_1 few traveled in 1st class
12 NO_SURVIVED class_2 some traveled in 2nd class
13 NO_SURVIVED class_3 many traveled in 3rd class
14 NO_SURVIVED ticket_price_High they paid a lot for the ticket
15 NO_SURVIVED ticket_price_Mid they paid a medium cost ticket
16 NO_SURVIVED ticket_price_Low they paid little for the ticket
17 NO_SURVIVED embarked_C few embarked in Cherbourg
18 NO_SURVIVED embarked_Q some embarked in Queenstown
19 NO_SURVIVED embarked_S many boarded at Southampton
20 SURVIVED gender_male few men survived
21 SURVIVED gender_female many women survived
22 SURVIVED is_alone_1 they traveled alone
23 SURVIVED family_size_2 they were from a family of few members
24 SURVIVED family_size_3-5 they were from a large family
25 SURVIVED family_size_>5 they were from a family of many members
26 SURVIVED age_<12_years they were children
27 SURVIVED age_12_18_years they were teenagers
28 SURVIVED age_18_30_years they were young
29 SURVIVED age_30_60_years they were adults
30 SURVIVED age_>60_years they were older
31 SURVIVED class_1 many traveled in 1st class
32 SURVIVED class_2 some traveled in 2nd class
33 SURVIVED class_3 few traveled in 3rd class
34 SURVIVED ticket_price_High they paid a lot for the ticket
35 SURVIVED ticket_price_Mid they paid a medium cost ticket
36 SURVIVED ticket_price_Low they paid little for the ticket
37 SURVIVED embarked_C many embarked in Cherbourg
38 SURVIVED embarked_Q some embarked in Queenstown
39 SURVIVED embarked_S few boarded at Southampton
With all of this information, we can create the Reason Why as follows, explaining two Feature-Values and
setting the reliability threshold to 0.3:
>>> from xaiographs import Why
>>> why = Why(language='en',
... explainer=explainer,
... why_values_semantics=df_values_semantics,
... why_target_values_semantics=df_target_values_semantics,
... why_templates=df_why_templates,
... n_values=2,
... n_target_values=2,
... min_reliability=0.3,
... verbose=0)
>>> why.fit()
>>> why.why_explanation.head(6)
id reason
0 0 As many women survived and many traveled in 1st class, and this case is characterized by to be a woman and travel in 1st class, has been classified as survived.
1 1 For travel in 1st class and be a child, this case has been classified as survived, considering that many traveled in 1st class and they were children.
2 2 As few traveled in 1st class and they were children, and this case is characterized by travel in 1st class and be a child, has been classified as no_survived.
3 3 For be young and to be a man, this case has been classified as no_survived, considering that they were young and many men have died.
4 4 An explanation cannot be offered for this case.
5 5 An explanation cannot be offered for this case.
Hint
XAIoGraphs includes the build_semantic_templates()
function, which generates two .csv files (values_semantics.csv and target_values_semantics.csv) containing the
Feature-Values and the target and Feature-Value pairs of each semantics:
>>> from xaiographs import Explainer
>>> from xaiographs import Why
>>> from xaiographs.datasets import load_titanic_discretized
>>> example_dataset, feature_cols, target_cols, _, _ = load_titanic_discretized()
>>> explainer = Explainer(importance_engine='LIDE', verbose=0)
>>> explainer.fit(df=example_dataset, feature_cols=feature_cols, target_cols=target_cols)
>>> Why.build_semantic_templates(explainer=explainer, destination_template_path='./')
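Once generated, such a file would typically be completed with one reason per row and loaded back with pandas. The sketch below is heavily hedged: the in-memory CSV is a hypothetical stand-in for values_semantics.csv, and its column names and empty reason cells are assumptions based on the why_values_semantics layout described earlier, not on the actual output of build_semantic_templates().

```python
import io

import pandas as pd

# Hypothetical stand-in for a generated values_semantics.csv; the real file
# layout is an assumption (feature_value and reason columns, reasons empty).
generated_csv = io.StringIO(
    "feature_value,reason\n"
    "gender_female,\n"
    "class_1,\n"
)
df_values_semantics = pd.read_csv(generated_csv)

# Reasons written by the user for each Feature-Value to be explained.
my_reasons = {
    "gender_female": "to be a woman",
    "class_1": "travel in 1st class",
}
df_values_semantics["reason"] = df_values_semantics["feature_value"].map(my_reasons)
print(df_values_semantics)
```

A DataFrame completed this way has the two columns expected by the why_values_semantics parameter of the Why class.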