
Wage classification [scikit-learn]

Giskard is an open-source framework for testing all ML models, from LLMs to tabular models. Don’t hesitate to give the project a star on GitHub ⭐️ if you find it useful!

In this notebook, you’ll learn how to create comprehensive test suites for your model in a few lines of code, thanks to Giskard’s open-source Python library.

Use-case:

  • Binary classification to predict whether a person makes over 50K a year, given their demographic attributes.

  • Reference notebook

  • Dataset

Outline:

  • Detect vulnerabilities automatically with Giskard’s scan

  • Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics

Install dependencies

Make sure to install the giskard library.

[ ]:
%pip install giskard --upgrade

Import libraries

[1]:
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

from giskard import Model, Dataset, scan, testing

Define constants

[ ]:
# Constants
RANDOM_SEED = 0
TEST_RATIO = 0.2

DROP_FEATURES = ["education", "native-country", "occupation", "marital-status", "educational-num"]

CATEGORICAL_FEATURES = ["workclass", "relationship", "race", "gender"]

NUMERICAL_FEATURES = [
    "age",
    "fnlwgt",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
]

TARGET_COLUMN = "income"

# Paths.
DATA_URL = (
    "https://giskard-library-test-datasets.s3.eu-north-1.amazonaws.com/wage_classification_dataset-adult.csv.tar.gz"
)
DATA_PATH = Path.home() / ".giskard" / "wage_classification_dataset" / "adult.csv.tar.gz"

Dataset preparation

Load and preprocess data

[ ]:
def fetch_demo_data(url: str, file: Path) -> None:
    """Helper to fetch data from the remote URL and cache it locally."""
    if not file.parent.exists():
        file.parent.mkdir(parents=True, exist_ok=True)

    if not file.exists():
        print(f"Downloading data from {url}")
        urlretrieve(url, file)

    print("Data was loaded!")


def download_data(**kwargs) -> pd.DataFrame:
    """Download the dataset using URL."""
    fetch_demo_data(DATA_URL, DATA_PATH)
    _df = pd.read_csv(DATA_PATH, **kwargs)
    return _df


def preprocess_data(df: pd.DataFrame) -> pd.DataFrame:
    # Drop NaNs and columns.
    df = df.dropna()
    df = df.drop(columns=DROP_FEATURES)
    return df
[ ]:
income_df = download_data()
income_df = preprocess_data(income_df)

Train-test split

[5]:
X_train, X_test, y_train, y_test = train_test_split(
    income_df.drop(columns=TARGET_COLUMN), income_df[TARGET_COLUMN], test_size=TEST_RATIO, random_state=RANDOM_SEED
)

Wrap dataset with Giskard

To prepare for the vulnerability scan, make sure to wrap your dataset using Giskard’s Dataset class. More details here.

[ ]:
raw_data = pd.concat([X_test, y_test], axis=1)
giskard_dataset = Dataset(
    df=raw_data,
    # A pandas.DataFrame that contains the raw data (before all the pre-processing steps) and the actual ground truth variable (target).
    target=TARGET_COLUMN,  # Ground truth variable.
    name="salary_data",  # Optional.
    cat_columns=CATEGORICAL_FEATURES,
    # List of categorical columns. Optional but strongly recommended; inferred automatically if omitted.
)

Model building

Define preprocessing pipeline

[7]:
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), NUMERICAL_FEATURES),
        ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), CATEGORICAL_FEATURES),
    ]
)
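To make the two transformers concrete, here is a minimal pure-Python sketch of what they compute: StandardScaler rescales a numerical column to zero mean and unit variance, and OneHotEncoder with handle_unknown="ignore" encodes a category that was not seen during fitting as an all-zero row. The helper names and toy values are hypothetical; this illustrates the behaviour only and is not scikit-learn's implementation.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to zero mean and unit variance (population std, ddof=0)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(value, known_categories):
    """Return a one-hot row; a category unseen at fit time gives all zeros."""
    return [1.0 if value == cat else 0.0 for cat in known_categories]


# Hypothetical toy values, not taken from the dataset.
hours = [20, 40, 60]
genders_seen_at_fit = ["Female", "Male"]

scaled_hours = standardize(hours)
known_row = one_hot("Male", genders_seen_at_fit)
unknown_row = one_hot("Other", genders_seen_at_fit)

print(scaled_hours)
print(known_row, unknown_row)
```

Because unknown categories are ignored rather than raising an error, the pipeline can still score rows whose categorical values differ from those in the training split.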

Build estimator

[ ]:
pipeline = Pipeline(steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier())])

pipeline.fit(X_train, y_train)

# Accuracy score.
train_metric = pipeline.score(X_train, y_train)
test_metric = pipeline.score(X_test, y_test)

print(f"Train accuracy: {train_metric:.2f}")
print(f"Test accuracy: {test_metric:.2f}")

Wrap model with Giskard

To prepare for the vulnerability scan, make sure to wrap your model using Giskard’s Model class. You can choose to either wrap the prediction function (preferred option) or the model object. More details here.

[ ]:
giskard_model = Model(
    model=pipeline,
    # The fitted scikit-learn pipeline. It encapsulates all the data pre-processing steps, so it can be executed directly on the dataset used by the scan.
    model_type="classification",  # Either regression, classification or text_generation.
    name="salary_cls",  # Optional.
    classification_labels=pipeline.classes_,  # Their order MUST be identical to the prediction_function's output order.
    feature_names=X_train.columns,  # Default: all columns of your dataset.
)

# Validate wrapped model.
wrapped_predict = giskard_model.predict(giskard_dataset)
wrapped_test_metric = accuracy_score(y_test, wrapped_predict.prediction)

print(f"Wrapped Test accuracy: {wrapped_test_metric:.2f}")
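The comment on classification_labels says that the label order must match the order of the model's outputs. A minimal sketch of why, assuming predictions are decoded by position (each probability is matched to the label at the same index). The helper, the probability row, and the label strings are hypothetical and used only for illustration.

```python
def predict_label(probabilities, labels):
    """Return the label whose position matches the highest probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best_index]


# Hypothetical probability row, ordered like the classifier's class list.
proba_row = [0.2, 0.8]

correct_order = ["<=50K", ">50K"]  # same order as the model's outputs
swapped_order = [">50K", "<=50K"]  # labels listed in the wrong order

label_with_correct_order = predict_label(proba_row, correct_order)
label_with_swapped_order = predict_label(proba_row, swapped_order)

print(label_with_correct_order)  # label the model actually favours
print(label_with_swapped_order)  # silently inverted prediction
```

Passing pipeline.classes_, as the cell above does, avoids this problem because scikit-learn orders its probability columns by that attribute.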

Detect vulnerabilities in your model

Scan your model for vulnerabilities with Giskard

Giskard’s scan allows you to detect vulnerabilities in your model automatically. These include performance biases, robustness issues, data leakage, stochasticity, underconfidence, ethical issues, and more. For detailed information about the scan feature, please refer to our scan documentation.

[ ]:
results = scan(giskard_model, giskard_dataset)
[11]:
display(results)
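One kind of issue the scan can report is a performance bias, where a metric is noticeably worse on a slice of the data than on the whole dataset. The sketch below illustrates that idea with recall on a single slice. It is a simplified, hypothetical illustration: the rows are made up, the helper names are not part of Giskard, and the scan's actual detection logic is more involved.

```python
def recall(y_true, y_pred, positive):
    """Fraction of actual positives that the model predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        return float("nan")
    hits = sum(p == positive for p in predictions_for_positives)
    return hits / len(predictions_for_positives)


def recall_on(rows, positive=">50K", relationship=None):
    """Recall over all rows, or only over rows with the given relationship."""
    selected = [r for r in rows if relationship is None or r[0] == relationship]
    return recall([r[1] for r in selected], [r[2] for r in selected], positive)


# Hypothetical rows: (relationship, true label, predicted label).
rows = [
    ("Husband", ">50K", ">50K"),
    ("Husband", ">50K", ">50K"),
    ("Husband", "<=50K", "<=50K"),
    ("Own-child", ">50K", "<=50K"),
    ("Own-child", ">50K", ">50K"),
    ("Own-child", "<=50K", "<=50K"),
]

overall_recall = recall_on(rows)
slice_recall = recall_on(rows, relationship="Own-child")

print(f"overall={overall_recall:.2f} slice={slice_recall:.2f}")
```

A slice whose metric falls well below the overall value is a candidate for closer inspection.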

Generate comprehensive test suites automatically for your model

Generate test suites from the scan

The objects produced by the scan can be used as fixtures to generate a test suite that integrates all detected vulnerabilities. Test suites allow you to evaluate and validate your model’s performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates.

[12]:
test_suite = results.generate_test_suite("My first test suite")
test_suite.run()
2024-05-29 14:14:33,648 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,654 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (6902, 10) executed in 0:00:00.041066
Executed 'Overconfidence on data slice “`hours-per-week` < 41.500”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b723070>, 'threshold': 0.4973034997131383, 'p_threshold': 0.5}:
               Test failed
               Metric: 0.5


2024-05-29 14:14:33,676 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,684 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (879, 10) executed in 0:00:00.017064
Executed 'Underconfidence on data slice “`age` >= 41.500 AND `age` < 45.500”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6c2b00>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}:
               Test failed
               Metric: 0.02


2024-05-29 14:14:33,714 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,717 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (3923, 10) executed in 0:00:00.018021
Executed 'Underconfidence on data slice “`relationship` == "Husband"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6c9480>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}:
               Test failed
               Metric: 0.02


2024-05-29 14:14:33,742 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,745 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (1335, 10) executed in 0:00:00.017211
Executed 'Underconfidence on data slice “`age` >= 48.500 AND `age` < 58.500”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6c2d40>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}:
               Test failed
               Metric: 0.02


2024-05-29 14:14:33,772 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,776 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (6485, 10) executed in 0:00:00.022294
Executed 'Underconfidence on data slice “`gender` == "Male"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6ca350>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}:
               Test failed
               Metric: 0.01


2024-05-29 14:14:33,790 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,792 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (1498, 10) executed in 0:00:00.010130
Executed 'Recall on data slice “`relationship` == "Own-child"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66d180>, 'threshold': 0.5253512132822478}:
               Test failed
               Metric: 0.3


2024-05-29 14:14:33,807 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,809 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (576, 10) executed in 0:00:00.007495
Executed 'Recall on data slice “`workclass` == "?"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66f0a0>, 'threshold': 0.5253512132822478}:
               Test failed
               Metric: 0.34


2024-05-29 14:14:33,827 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,829 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (2528, 10) executed in 0:00:00.013633
Executed 'Recall on data slice “`relationship` == "Not-in-family"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66ca90>, 'threshold': 0.5253512132822478}:
               Test failed
               Metric: 0.35


2024-05-29 14:14:33,849 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,850 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (1076, 10) executed in 0:00:00.008265
Executed 'Recall on data slice “`relationship` == "Unmarried"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66ee90>, 'threshold': 0.5253512132822478}:
               Test failed
               Metric: 0.38


2024-05-29 14:14:33,865 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,867 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (935, 10) executed in 0:00:00.008937
Executed 'Recall on data slice “`race` == "Black"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b7ef3d0>, 'threshold': 0.5253512132822478}:
               Test failed
               Metric: 0.38


2024-05-29 14:14:33,880 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,882 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (721, 10) executed in 0:00:00.008114
Executed 'Recall on data slice “`workclass` == "Self-emp-not-inc"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b721120>, 'threshold': 0.5253512132822478}:
               Test failed
               Metric: 0.39


2024-05-29 14:14:33,902 pid:72955 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}
2024-05-29 14:14:33,905 pid:72955 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (3284, 10) executed in 0:00:00.014940
Executed 'Recall on data slice “`gender` == "Female"”' with arguments {'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b7ef040>, 'threshold': 0.5253512132822478}:
               Test failed
               Metric: 0.52


2024-05-29 14:14:33,917 pid:72955 MainThread giskard.core.suite INFO     Executed test suite 'My first test suite'
2024-05-29 14:14:33,917 pid:72955 MainThread giskard.core.suite INFO     result: failed
2024-05-29 14:14:33,917 pid:72955 MainThread giskard.core.suite INFO     Overconfidence on data slice “`hours-per-week` < 41.500” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b723070>, 'threshold': 0.4973034997131383, 'p_threshold': 0.5}): {failed, metric=0.5041237113402062}
2024-05-29 14:14:33,918 pid:72955 MainThread giskard.core.suite INFO     Underconfidence on data slice “`age` >= 41.500 AND `age` < 45.500” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6c2b00>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.023890784982935155}
2024-05-29 14:14:33,918 pid:72955 MainThread giskard.core.suite INFO     Underconfidence on data slice “`relationship` == "Husband"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6c9480>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.02013764975783839}
2024-05-29 14:14:33,918 pid:72955 MainThread giskard.core.suite INFO     Underconfidence on data slice “`age` >= 48.500 AND `age` < 58.500” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6c2d40>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.01647940074906367}
2024-05-29 14:14:33,918 pid:72955 MainThread giskard.core.suite INFO     Underconfidence on data slice “`gender` == "Male"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b6ca350>, 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.013415574402467233}
2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO     Recall on data slice “`relationship` == "Own-child"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66d180>, 'threshold': 0.5253512132822478}): {failed, metric=0.2962962962962963}
2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO     Recall on data slice “`workclass` == "?"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66f0a0>, 'threshold': 0.5253512132822478}): {failed, metric=0.3448275862068966}
2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO     Recall on data slice “`relationship` == "Not-in-family"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66ca90>, 'threshold': 0.5253512132822478}): {failed, metric=0.3490909090909091}
2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO     Recall on data slice “`relationship` == "Unmarried"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b66ee90>, 'threshold': 0.5253512132822478}): {failed, metric=0.38095238095238093}
2024-05-29 14:14:33,920 pid:72955 MainThread giskard.core.suite INFO     Recall on data slice “`race` == "Black"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b7ef3d0>, 'threshold': 0.5253512132822478}): {failed, metric=0.38333333333333336}
2024-05-29 14:14:33,920 pid:72955 MainThread giskard.core.suite INFO     Recall on data slice “`workclass` == "Self-emp-not-inc"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b721120>, 'threshold': 0.5253512132822478}): {failed, metric=0.391304347826087}
2024-05-29 14:14:33,920 pid:72955 MainThread giskard.core.suite INFO     Recall on data slice “`gender` == "Female"” ({'model': <giskard.models.sklearn.SKLearnModel object at 0x17c037f40>, 'dataset': <giskard.datasets.base.Dataset object at 0x17afa3310>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x32b7ef040>, 'threshold': 0.5253512132822478}): {failed, metric=0.5231607629427792}
[12]:
Test suite failed.
Test Overconfidence on data slice “`hours-per-week` < 41.500”
Measured Metric = 0.50412 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `hours-per-week` < 41.500
  threshold: 0.4973034997131383
  p_threshold: 0.5
Test Underconfidence on data slice “`age` >= 41.500 AND `age` < 45.500”
Measured Metric = 0.02389 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `age` >= 41.500 AND `age` < 45.500
  threshold: 0.011710512846760161
  p_threshold: 0.95
Test Underconfidence on data slice “`relationship` == "Husband"”
Measured Metric = 0.02014 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `relationship` == "Husband"
  threshold: 0.011710512846760161
  p_threshold: 0.95
Test Underconfidence on data slice “`age` >= 48.500 AND `age` < 58.500”
Measured Metric = 0.01648 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `age` >= 48.500 AND `age` < 58.500
  threshold: 0.011710512846760161
  p_threshold: 0.95
Test Underconfidence on data slice “`gender` == "Male"”
Measured Metric = 0.01342 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `gender` == "Male"
  threshold: 0.011710512846760161
  p_threshold: 0.95
Test Recall on data slice “`relationship` == "Own-child"”
Measured Metric = 0.2963 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `relationship` == "Own-child"
  threshold: 0.5253512132822478
Test Recall on data slice “`workclass` == "?"”
Measured Metric = 0.34483 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `workclass` == "?"
  threshold: 0.5253512132822478
Test Recall on data slice “`relationship` == "Not-in-family"”
Measured Metric = 0.34909 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `relationship` == "Not-in-family"
  threshold: 0.5253512132822478
Test Recall on data slice “`relationship` == "Unmarried"”
Measured Metric = 0.38095 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `relationship` == "Unmarried"
  threshold: 0.5253512132822478
Test Recall on data slice “`race` == "Black"”
Measured Metric = 0.38333 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `race` == "Black"
  threshold: 0.5253512132822478
Test Recall on data slice “`workclass` == "Self-emp-not-inc"”
Measured Metric = 0.3913 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `workclass` == "Self-emp-not-inc"
  threshold: 0.5253512132822478
Test Recall on data slice “`gender` == "Female"”
Measured Metric = 0.52316 (Failed)
  model: salary_cls
  dataset: salary_data
  slicing_function: `gender` == "Female"
  threshold: 0.5253512132822478
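In the output above, every recall test reports a metric below its threshold and is marked as failed, which is consistent with a "recall must be at least the threshold" rule (the over- and underconfidence tests, by contrast, fail with metrics above their thresholds). As a small aid for reading these results, the sketch below ranks the recall slices by how far they fall short. The numbers are copied from the log output; the variable names are hypothetical.

```python
# Values copied from the test-suite output above.
RECALL_THRESHOLD = 0.5253512132822478
recall_by_slice = {
    '`relationship` == "Own-child"': 0.2962962962962963,
    '`workclass` == "?"': 0.3448275862068966,
    '`relationship` == "Not-in-family"': 0.3490909090909091,
    '`relationship` == "Unmarried"': 0.38095238095238093,
    '`race` == "Black"': 0.38333333333333336,
    '`workclass` == "Self-emp-not-inc"': 0.391304347826087,
    '`gender` == "Female"': 0.5231607629427792,
}

# Largest shortfall first: these slices are the furthest below the threshold.
shortfalls = sorted(
    ((RECALL_THRESHOLD - metric, name) for name, metric in recall_by_slice.items()),
    reverse=True,
)

for gap, name in shortfalls:
    print(f"{name:<40} recall is {gap:.3f} below the threshold")
```

Sorting this way separates near misses, such as the `gender` == "Female" slice, from slices with a large gap, such as `relationship` == "Own-child".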

Customize your suite by loading objects from the Giskard catalog

The Giskard open-source catalog lets you load:

  • Tests such as metamorphic, performance, prediction & data drift, statistical tests, etc.

  • Slicing functions such as detectors of toxicity, hate, emotion, etc.

  • Transformation functions such as generators of typos, paraphrase, style tune, etc.

To create custom tests, refer to this page.

For demo purposes, we will load a simple unit test (test_f1) that checks whether the F1 score on the test dataset is above the given threshold. For more examples of tests and functions, refer to the Giskard catalog.

[ ]:
test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()
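For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it by hand for one positive class and compares it with a threshold of 0.7. It is a minimal illustration with made-up labels, assuming the test passes when the score is at least the threshold; it is not Giskard's implementation of test_f1, and the function name is hypothetical.

```python
def f1_score_binary(y_true, y_pred, positive):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions, not taken from the dataset.
y_true = [">50K", ">50K", ">50K", "<=50K", "<=50K", "<=50K"]
y_pred = [">50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"]

THRESHOLD = 0.7
f1 = f1_score_binary(y_true, y_pred, positive=">50K")
passed = f1 >= THRESHOLD

print(f"F1 = {f1:.3f}, passed = {passed}")
```

Here precision and recall are both 2/3, so the F1 score is also 2/3 and the check against 0.7 fails.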