
Regression on the hotel reviews [scikit-learn]

Giskard is an open-source framework for testing all ML models, from LLMs to tabular models. Don’t hesitate to give the project a star on GitHub ⭐️ if you find it useful!

In this notebook, you’ll learn how to create comprehensive test suites for your model in a few lines of code, thanks to Giskard’s open-source Python library.

Use-case:

Outline:

  • Detect vulnerabilities automatically with Giskard’s scan

  • Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics

Install dependencies

Make sure to install the giskard library.

[ ]:
%pip install giskard --upgrade

Import libraries

[1]:
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from typing import Iterable

from giskard import Model, Dataset, scan, testing

Define constants

[ ]:
# Constants.
FEATURE_COLUMN_NAME = "Full_Review"
TARGET_COLUMN_NAME = "Reviewer_Score"

# Paths.
DATA_URL = "https://giskard-library-test-datasets.s3.eu-north-1.amazonaws.com/hotel_text_regression_dataset-Hotel_Reviews.csv.tar.gz"
DATA_PATH = Path.home() / ".giskard" / "hotel_text_regression_dataset" / "Hotel_Reviews.csv.tar.gz"

Dataset preparation

Load data

[3]:
def fetch_demo_data(url: str, file: Path) -> None:
    """Helper to download the demo data from the remote server if it is not cached locally."""
    if not file.parent.exists():
        file.parent.mkdir(parents=True, exist_ok=True)

    if not file.exists():
        print(f"Downloading data from {url}")
        urlretrieve(url, file)

    print("Data was loaded!")


def load_data(**kwargs) -> pd.DataFrame:
    fetch_demo_data(DATA_URL, DATA_PATH)
    df = pd.read_csv(DATA_PATH, **kwargs)

    # Create the feature column by joining the positive and negative review texts.
    df[FEATURE_COLUMN_NAME] = df.apply(lambda x: x["Positive_Review"] + " " + x["Negative_Review"], axis=1)

    return df
[ ]:
reviews_df = load_data(nrows=1000)

Train-test split

[5]:
train_X, test_X, train_Y, test_Y = train_test_split(
    reviews_df[[FEATURE_COLUMN_NAME]], reviews_df[TARGET_COLUMN_NAME], random_state=42
)
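
Because neither `test_size` nor `train_size` is passed, scikit-learn falls back to its default of holding out 25% of the rows. The sketch below reproduces only the size arithmetic in plain Python, assuming the 1000 rows loaded above; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples: int, test_size: float = 0.25) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional hold-out split.

    Hypothetical helper: the test share is rounded up and the
    remainder goes to the training set.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(1000)
print(n_train, n_test)  # 750 250
```

The 250 held-out rows are the ones that get wrapped in the Giskard dataset, which is why the logs further down report a predicted dataset of shape (250, 2).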

Wrap dataset with Giskard

To prepare for the vulnerability scan, make sure to wrap your dataset using Giskard’s Dataset class. More details here.

[ ]:
raw_data = pd.concat([test_X, test_Y], axis=1)
giskard_dataset = Dataset(
    df=raw_data,
    # A pandas.DataFrame that contains the raw data (before all the pre-processing steps) and the actual ground truth variable (target).
    target=TARGET_COLUMN_NAME,  # Ground truth variable.
    name="hotel_text_regression_dataset",  # Optional.
)

Model building

Define preprocessing steps

[7]:
def adapt_vectorizer_input(df: pd.DataFrame) -> Iterable:
    """Adapt the input for the vectorizer.

    Vectorizers expect a one-dimensional iterable of strings (such as a Series),
    not a DataFrame, so we select the single text column.
    """

    df = df.iloc[:, 0]
    return df

Build estimator

[ ]:
# Define pipeline.
pipeline = Pipeline(
    steps=[
        ("vectorizer_adapter", FunctionTransformer(adapt_vectorizer_input)),
        ("vectorizer", TfidfVectorizer(max_features=10000)),
        ("regressor", GradientBoostingRegressor(n_estimators=10)),
    ]
)

# Fit pipeline.
pipeline.fit(train_X, train_Y)

# Perform inference on train and test data.
pred_train = pipeline.predict(train_X)
pred_test = pipeline.predict(test_X)

train_metric = mean_absolute_error(train_Y, pred_train)
test_metric = mean_absolute_error(test_Y, pred_test)

print(f"Train MAE: {train_metric: .2f}\n" f"Test MAE: {test_metric: .2f}")
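
Mean absolute error is the average absolute gap between predicted and true scores, expressed in the same units as `Reviewer_Score`. A minimal plain-Python sketch with made-up numbers follows; `mae` is a hypothetical helper shown only to make the metric concrete, not a replacement for `mean_absolute_error`.

```python
def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: average of |true - predicted|."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up review scores, not taken from the dataset.
scores_true = [9.0, 7.5, 10.0, 6.0]
scores_pred = [8.0, 8.0, 9.0, 8.0]
print(mae(scores_true, scores_pred))  # (1 + 0.5 + 1 + 2) / 4 = 1.125
```

A train MAE that is much lower than the test MAE would suggest overfitting, which is why both values are printed.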

Wrap model with Giskard

To prepare for the vulnerability scan, make sure to wrap your model using Giskard’s Model class. You can choose to either wrap the prediction function (preferred option) or the model object. More details here.

[ ]:
# Wrap the prediction method
def prediction_function(df):
    return pipeline.predict(df)


giskard_model = Model(
    model=prediction_function,
    # A prediction function that encapsulates all the data pre-processing steps and that could be executed with the dataset used by the scan.
    model_type="regression",  # Either regression, classification or text_generation.
    name="hotel_text_regression",  # Optional.
    feature_names=[FEATURE_COLUMN_NAME],  # Default: all columns of your dataset.
)

# Validate wrapped model.
pred_test_wrapped = giskard_model.predict(giskard_dataset).raw_prediction
wrapped_test_metric = mean_absolute_error(test_Y, pred_test_wrapped)
print(f"Wrapped Test MAE: {wrapped_test_metric: .2f}")

Detect vulnerabilities in your model

Scan your model for vulnerabilities with Giskard

Giskard’s scan allows you to detect vulnerabilities in your model automatically. These include performance biases, robustness issues, data leakage, stochasticity, underconfidence, ethical issues, and more. For detailed information about the scan feature, please refer to our scan documentation.

[ ]:
results = scan(giskard_model, giskard_dataset)
[11]:
display(results)
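
Robustness issues, one of the categories listed above, can be pictured as follows: perturb each input slightly and measure how often the prediction stays within a relative tolerance. The sketch below is plain Python with a toy model and a deterministic toy perturbation. It is not Giskard's implementation, and `swap_first_two`, `toy_model`, `invariance_rate`, and the exact comparison rules are illustrative assumptions.

```python
def swap_first_two(text: str) -> str:
    """Toy 'typo': swap the first two characters of the text."""
    if len(text) < 2:
        return text
    return text[1] + text[0] + text[2:]


def toy_model(text: str) -> float:
    """Stand-in regressor: base score plus a bonus for the exact token 'great'."""
    return 7.0 + (2.0 if "great" in text.split() else 0.0)


def invariance_rate(texts, predict, perturb, output_sensitivity=0.05):
    """Share of perturbed rows whose prediction moves by at most
    `output_sensitivity` relative to the original prediction."""
    stable = 0
    perturbed = 0
    for text in texts:
        noisy = perturb(text)
        if noisy == text:
            continue  # rows left unchanged by the perturbation are not counted
        perturbed += 1
        before, after = predict(text), predict(noisy)
        if abs(after - before) <= output_sensitivity * abs(before):
            stable += 1
    return stable / perturbed if perturbed else 1.0


reviews = ["great location", "noisy street", "great staff", "ok"]
rate = invariance_rate(reviews, toy_model, swap_first_two)
print(rate)          # 0.5: the two 'great' reviews lose their bonus after the typo
print(rate >= 0.95)  # False: the toy model would fail a 0.95 invariance threshold
```

A model that keys on exact tokens, as TF-IDF features do, is naturally sensitive to this kind of perturbation.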

Generate comprehensive test suites automatically for your model

Generate test suites from the scan

The objects produced by the scan can be used as fixtures to generate a test suite that integrates all detected vulnerabilities. Test suites allow you to evaluate and validate your model’s performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates.

[12]:
test_suite = results.generate_test_suite("My first test suite")
test_suite.run()
2024-05-29 17:25:08,294 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,295 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (250, 2) executed in 0:00:00.003143
2024-05-29 17:25:08,317 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,322 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (250, 2) executed in 0:00:00.008373
2024-05-29 17:25:08,325 pid:11998 MainThread giskard.utils.logging_utils INFO     Perturb and predict data executed in 0:00:00.781405
2024-05-29 17:25:08,326 pid:11998 MainThread giskard.utils.logging_utils INFO     Compare and predict the data executed in 0:00:00.001536
Executed 'Invariance to “Add typos”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'transformation_function': <giskard.scanner.robustness.text_transformations.TextTypoTransformation object at 0x16bb4cd30>, 'threshold': 0.95, 'output_sensitivity': 0.05}:
               Test failed
               Metric: 0.91
                - [INFO] 241 rows were perturbed

2024-05-29 17:25:08,335 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,337 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (25, 2) executed in 0:00:00.004939
Executed 'MSE on data slice “`Full_Review` contains "building"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303258760>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 3.6


2024-05-29 17:25:08,345 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,347 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (25, 2) executed in 0:00:00.004945
Executed 'MSE on data slice “`Full_Review` contains "stay"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303428850>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 3.4


2024-05-29 17:25:08,354 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,356 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (46, 2) executed in 0:00:00.003780
Executed 'MSE on data slice “`Full_Review` contains "bed"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303315120>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.96


2024-05-29 17:25:08,364 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,365 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (23, 2) executed in 0:00:00.005052
Executed 'MSE on data slice “`Full_Review` contains "comfy"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033170a0>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.72


2024-05-29 17:25:08,374 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,375 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (24, 2) executed in 0:00:00.004607
Executed 'MSE on data slice “`Full_Review` contains "area"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303303b50>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.63


2024-05-29 17:25:08,385 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,386 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (23, 2) executed in 0:00:00.005924
Executed 'MSE on data slice “`Full_Review` contains "food"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033df850>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.57


2024-05-29 17:25:08,394 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,396 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (80, 2) executed in 0:00:00.005724
Executed 'MSE on data slice “`Full_Review` contains "hotel"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033036d0>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.52


2024-05-29 17:25:08,406 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,406 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (124, 2) executed in 0:00:00.003998
Executed 'MSE on data slice “`Full_Review` contains "room"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3032ed210>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.4


2024-05-29 17:25:08,408 pid:11998 MainThread giskard.core.suite INFO     Executed test suite 'My first test suite'
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO     result: failed
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO     Invariance to “Add typos” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'transformation_function': <giskard.scanner.robustness.text_transformations.TextTypoTransformation object at 0x16bb4cd30>, 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.9087136929460581}
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "building"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303258760>, 'threshold': 2.3372302706019896}): {failed, metric=3.600595896699133}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "stay"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303428850>, 'threshold': 2.3372302706019896}): {failed, metric=3.3984339534286825}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "bed"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303315120>, 'threshold': 2.3372302706019896}): {failed, metric=2.9552583451970587}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "comfy"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033170a0>, 'threshold': 2.3372302706019896}): {failed, metric=2.7172616774396765}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "area"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303303b50>, 'threshold': 2.3372302706019896}): {failed, metric=2.6276261964130216}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "food"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033df850>, 'threshold': 2.3372302706019896}): {failed, metric=2.57312044778542}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "hotel"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033036d0>, 'threshold': 2.3372302706019896}): {failed, metric=2.515134565551532}
2024-05-29 17:25:08,412 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "room"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3032ed210>, 'threshold': 2.3372302706019896}): {failed, metric=2.4023965763964417}
[12]:
Test suite failed.
Test Invariance to “Add typos”
Measured Metric = 0.90871 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
transformation_function Add typos
threshold 0.95
output_sensitivity 0.05
Test MSE on data slice “`Full_Review` contains "building"”
Measured Metric = 3.6006 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "building"
threshold 2.3372302706019896
Test MSE on data slice “`Full_Review` contains "stay"”
Measured Metric = 3.39843 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "stay"
threshold 2.3372302706019896
Test MSE on data slice “`Full_Review` contains "bed"”
Measured Metric = 2.95526 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "bed"
threshold 2.3372302706019896
Test MSE on data slice “`Full_Review` contains "comfy"”
Measured Metric = 2.71726 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "comfy"
threshold 2.3372302706019896
Test MSE on data slice “`Full_Review` contains "area"”
Measured Metric = 2.62763 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "area"
threshold 2.3372302706019896
Test MSE on data slice “`Full_Review` contains "food"”
Measured Metric = 2.57312 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "food"
threshold 2.3372302706019896
Test MSE on data slice “`Full_Review` contains "hotel"”
Measured Metric = 2.51513 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "hotel"
threshold 2.3372302706019896
Test MSE on data slice “`Full_Review` contains "room"”
Measured Metric = 2.4024 Failed
model hotel_text_regression
dataset hotel_text_regression_dataset
slicing_function `Full_Review` contains "room"
threshold 2.3372302706019896
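
Most of the failing tests above compare the mean squared error on a slice of the data (reviews containing a given word) with a fixed threshold. The sketch below illustrates that kind of check in plain Python with made-up rows. It is not Giskard's implementation: `mse`, `slice_mse`, the rows, and the threshold value are assumptions, and the check is assumed to pass when the slice MSE is at or below the threshold, which matches the failures above where each metric exceeds its threshold.

```python
def mse(pairs):
    """Mean squared error over (true, predicted) pairs."""
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def slice_mse(rows, keyword):
    """MSE restricted to rows whose text contains `keyword` (case-insensitive)."""
    subset = [(t, p) for text, t, p in rows if keyword in text.lower()]
    if not subset:
        raise ValueError(f"no rows contain {keyword!r}")
    return mse(subset)


# Made-up (text, true score, predicted score) rows, not taken from the dataset.
rows = [
    ("Lovely staff and great breakfast", 9.0, 8.5),
    ("Old building with thin walls", 4.0, 8.0),
    ("Quiet room, comfy bed", 8.0, 8.0),
    ("The building was under renovation", 5.0, 8.0),
]

global_mse = mse([(t, p) for _, t, p in rows])
building_mse = slice_mse(rows, "building")
threshold = 8.0  # illustrative value only

print(global_mse)                 # 6.3125
print(building_mse)               # 12.5
print(building_mse <= threshold)  # False: the slice fails the check
```

A slice whose error is well above the global error points to a subpopulation the model handles poorly, even when the overall metric looks acceptable.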

Customize your suite by loading objects from the Giskard catalog

The Giskard open-source catalog lets you load:

  • Tests such as metamorphic, performance, prediction & data drift, statistical tests, etc.

  • Slicing functions such as detectors of toxicity, hate, emotion, etc.

  • Transformation functions such as generators of typos, paraphrase, style tune, etc.

To create custom tests, refer to this page.

For demo purposes, we will load a simple unit test (test_r2) that checks if the test R2 score is above the given threshold. For more examples of tests and functions, refer to the Giskard catalog.

[ ]:
test_suite.add_test(testing.test_r2(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()
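
For intuition about what an R2 threshold check measures, the sketch below computes the coefficient of determination by hand on made-up numbers. `r2` is a hypothetical helper, not the Giskard test, and the check is assumed to pass when the score is at or above the threshold.

```python
def r2(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up review scores, not taken from the dataset.
y_true = [6.0, 8.0, 10.0]
y_pred = [7.0, 8.0, 9.0]

score = r2(y_true, y_pred)
print(score)         # 0.75 (SS_res = 2, SS_tot = 8)
print(score >= 0.7)  # True: this toy example would clear a 0.7 threshold
```

An R2 of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean score.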