
Datasets#

To help you test its capabilities, XAIoGraphs provides a series of datasets through the xaiographs.datasets module.

The following datasets are included:

| Dataset | Rows | Num. Feats | Task |
|---|---|---|---|
| Titanic | 1309 | 8 | Binary |
| Compas | 4230 | 7 | Multi-Class (3) |
| Compas Reality | 4230 | 7 | Binary |
| Body Performance | 13393 | 11 | Multi-Class (3) |
| Education Performance | 145 | 29 | Multi-Class (5) |

These datasets are available in both raw and discretized form, ready to be used by the Explainability and Fairness classes.
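As a minimal sketch of how the raw and discretized variants might be selected programmatically, the snippet below maps each dataset to the loader function names listed on this page and resolves them at run time. The dictionary keys ("titanic", "compas", and so on), the "raw"/"discretized" labels, and the resolve_loader helper are hypothetical names introduced only for illustration; they are not part of XAIoGraphs. The Education Performance loaders are omitted because their names are not given here.

```python
import importlib

# Loader function names as listed on this page.
# The dictionary keys and the "raw"/"discretized" labels are illustrative only.
LOADER_NAMES = {
    "titanic": {
        "raw": "load_titanic",
        "discretized": "load_titanic_discretized",
    },
    "compas": {
        "raw": "load_compas",
        "discretized": "load_compas_discretized",
    },
    "compas_reality": {
        "discretized": "load_compas_reality_discretized",
    },
    "body_performance": {
        "raw": "load_body_performance",
        "discretized": "load_body_performance_discretized",
    },
}


def resolve_loader(dataset, form="raw", module_name="xaiographs.datasets"):
    """Return the loader callable, or None if the module or function is missing."""
    function_name = LOADER_NAMES[dataset][form]
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed in this environment.
        return None
    return getattr(module, function_name, None)


loader = resolve_loader("titanic", form="discretized")
print("loader available" if loader is not None else "loader not available")
```

Resolving by name keeps the example runnable even where XAIoGraphs is not installed; in normal use a direct import of the function you need is simpler.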

The details of each dataset are given below.

Note

The original datasets have been preprocessed to remove outliers, impute null values, and so on.

 

Titanic#

The supposedly “unsinkable” RMS Titanic sank on April 15, 1912, during her maiden voyage after hitting an iceberg. There were not enough lifeboats to accommodate everyone, and 1502 out of 2224 passengers and crew perished. The famous Titanic dataset describes the chances of survival of individual Titanic passengers.

Source: https://www.kaggle.com/c/titanic

Num Rows: 1309

Num Features: 8

Num Targets: 2

Function to obtain dataset: xaiographs.datasets.load_titanic()

Function to obtain discretized dataset: xaiographs.datasets.load_titanic_discretized()
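The following sketch calls the two Titanic loaders named above and prints a short description of whatever each one returns. This page does not state the return types, so the hypothetical summarize helper deliberately avoids assuming them: it reports a shape if the object has one, or the number of items if a tuple or list comes back. The import is guarded so the snippet still runs where XAIoGraphs is not installed.

```python
def summarize(loaded):
    """Describe a loader's return value without assuming its exact type."""
    shape = getattr(loaded, "shape", None)
    if shape is not None:
        # Table-like objects (for example DataFrames or arrays) expose .shape.
        return f"{type(loaded).__name__} with shape {tuple(shape)}"
    if isinstance(loaded, (tuple, list)):
        return f"{type(loaded).__name__} of {len(loaded)} items"
    return type(loaded).__name__


try:
    from xaiographs.datasets import load_titanic, load_titanic_discretized
except ImportError:
    # XAIoGraphs is not installed; skip the calls below.
    load_titanic = load_titanic_discretized = None

if load_titanic is not None:
    print("raw:        ", summarize(load_titanic()))
    print("discretized:", summarize(load_titanic_discretized()))
```

Inspecting the result this way is a quick first step before passing the data on to the Explainability or Fairness classes.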

 

Compas#

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a popular commercial algorithm used by judges and parole officers to score a criminal defendant’s likelihood of reoffending (recidivism). A two-year follow-up study (i.e., a study of who actually committed crimes or violent crimes after two years) has shown that the algorithm is biased in favor of white defendants and against black inmates. The pattern of mistakes, as measured by precision/sensitivity, is notable.

Source: https://github.com/propublica/compas-analysis

Num Rows: 4230

Num Features: 7

Num Targets: 3 (model) - 2 (reality)

Function to obtain dataset: xaiographs.datasets.load_compas()

Function to obtain discretized dataset (Model): xaiographs.datasets.load_compas_discretized()

Function to obtain discretized dataset (Reality): xaiographs.datasets.load_compas_reality_discretized()
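Because Compas ships with one raw loader and two discretized variants, a quick row-count check can confirm that each variant matches the 4230 rows documented on this page. The sketch below is one way to do that under stated assumptions: count_rows is a hypothetical helper that reads .shape when present and otherwise looks at the first item of a returned tuple or list, since the exact return types are not specified here.

```python
DOCUMENTED_COMPAS_ROWS = 4230  # row count stated on this page

COMPAS_LOADERS = (
    "load_compas",
    "load_compas_discretized",
    "load_compas_reality_discretized",
)


def count_rows(loaded):
    """Best-effort row count; returns None when no table-like object is found."""
    shape = getattr(loaded, "shape", None)
    if shape is not None:
        return shape[0]
    if isinstance(loaded, (tuple, list)) and loaded:
        # Assumption: if a tuple is returned, its first item is the data table.
        return count_rows(loaded[0])
    return None


try:
    from xaiographs import datasets
except ImportError:
    # XAIoGraphs is not installed; skip the check below.
    datasets = None

if datasets is not None:
    for name in COMPAS_LOADERS:
        rows = count_rows(getattr(datasets, name)())
        print(name, rows, rows == DOCUMENTED_COMPAS_ROWS)
```

A mismatch would not necessarily indicate a problem; it may simply mean the installed version was preprocessed differently from the one described here.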

 

Body Performance#

This dataset demonstrates how performance levels change with age and some exercise-related variables.

Source: https://www.kaggle.com/datasets/kukuroo3/body-performance-data

Num Rows: 13393

Num Features: 11

Num Targets: 3

Function to obtain dataset: xaiographs.datasets.load_body_performance()

Function to obtain discretized dataset: xaiographs.datasets.load_body_performance_discretized()

 

Education Performance#

The data were collected in 2019 from students of the Faculty of Engineering and the Faculty of Educational Sciences. The purpose is to predict students’ end-of-term performance using ML techniques.
