Regression on the hotel reviews [scikit-learn]¶

Giskard is an open-source framework for testing all ML models, from LLMs to tabular models. Don’t hesitate to give the project a star on GitHub ⭐️ if you find it useful!

In this notebook, you’ll learn how to create comprehensive test suites for your model in a few lines of code, thanks to Giskard’s open-source Python library.

Use-case:

Regression task of predicting review ‘score’, based on the review text.
Reference notebook
Dataset

Outline:

Detect vulnerabilities automatically with Giskard’s scan
Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics

Install dependencies¶

Make sure to install the giskard library.

[ ]:

%pip install giskard --upgrade
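To confirm that the installation succeeded, one option is to query the installed version through the standard library. The snippet below is a minimal sketch; the helper name `installed_version` is hypothetical and not part of Giskard.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string when giskard is installed, otherwise None.
print(installed_version("giskard"))
```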

Import libraries¶

[1]:

from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from typing import Iterable

from giskard import Model, Dataset, scan, testing

Define constants¶

[ ]:

# Constants.
FEATURE_COLUMN_NAME = "Full_Review"
TARGET_COLUMN_NAME = "Reviewer_Score"

# Paths.
DATA_URL = "https://giskard-library-test-datasets.s3.eu-north-1.amazonaws.com/hotel_text_regression_dataset-Hotel_Reviews.csv.tar.gz"
DATA_PATH = Path.home() / ".giskard" / "hotel_text_regression_dataset" / "Hotel_Reviews.csv.tar.gz"

Dataset preparation¶

Load data¶

[3]:

def fetch_demo_data(url: str, file: Path) -> None:
    """Helper to download the demo data from the given URL unless it is already cached."""
    # exist_ok=True makes a separate existence check unnecessary.
    file.parent.mkdir(parents=True, exist_ok=True)

    if not file.exists():
        print(f"Downloading data from {url}")
        urlretrieve(url, file)

    print("Data was loaded!")


def load_data(**kwargs) -> pd.DataFrame:
    fetch_demo_data(DATA_URL, DATA_PATH)
    df = pd.read_csv(DATA_PATH, **kwargs)

    # Create the feature column by concatenating the positive and negative review texts.
    df[FEATURE_COLUMN_NAME] = df.apply(lambda x: x["Positive_Review"] + " " + x["Negative_Review"], axis=1)

    return df

[ ]:

reviews_df = load_data(nrows=1000)
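The feature column built in `load_data` is a plain string concatenation of the two review columns. As a small illustration on invented toy rows (not taken from the dataset), the row-wise `apply` gives the same result as element-wise string addition on the columns:

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy_df = pd.DataFrame(
    {
        "Positive_Review": ["Great location", "Friendly staff"],
        "Negative_Review": ["Small room", "Noisy street"],
    }
)

# Element-wise concatenation of the two text columns.
toy_df["Full_Review"] = toy_df["Positive_Review"] + " " + toy_df["Negative_Review"]

# Row-wise variant, as used in load_data.
row_wise = toy_df.apply(lambda x: x["Positive_Review"] + " " + x["Negative_Review"], axis=1)

print(toy_df["Full_Review"].tolist())
```

Both variants produce the same strings; the element-wise form avoids a Python-level loop over rows, which may matter on larger samples.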

Train-test split¶

[5]:

train_X, test_X, train_Y, test_Y = train_test_split(
    reviews_df[[FEATURE_COLUMN_NAME]], reviews_df[TARGET_COLUMN_NAME], random_state=42
)
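With no `test_size` given, scikit-learn's `train_test_split` holds out 25% of the rows, so the 1000 loaded reviews are split into 750 training rows and 250 test rows. The sketch below reproduces these sizes on a synthetic stand-in for `reviews_df` (the toy values are invented for illustration).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for reviews_df with 1000 rows.
toy_df = pd.DataFrame(
    {
        "Full_Review": [f"review number {i}" for i in range(1000)],
        "Reviewer_Score": [float(i % 10) for i in range(1000)],
    }
)

toy_train_X, toy_test_X, toy_train_Y, toy_test_Y = train_test_split(
    toy_df[["Full_Review"]], toy_df["Reviewer_Score"], random_state=42
)

print(toy_train_X.shape, toy_test_X.shape)

# Fixing random_state makes the split reproducible across runs.
_, repeat_test_X, _, _ = train_test_split(
    toy_df[["Full_Review"]], toy_df["Reviewer_Score"], random_state=42
)
print(toy_test_X.index.equals(repeat_test_X.index))
```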

Wrap dataset with Giskard¶

To prepare for the vulnerability scan, make sure to wrap your dataset using Giskard’s Dataset class. More details here.

[ ]:

raw_data = pd.concat([test_X, test_Y], axis=1)
giskard_dataset = Dataset(
    df=raw_data,
    # A pandas.DataFrame that contains the raw data (before all the pre-processing steps) and the actual ground truth variable (target).
    target=TARGET_COLUMN_NAME,  # Ground truth variable.
    name="hotel_text_regression_dataset",  # Optional.
)
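`pd.concat(..., axis=1)` joins the features and the target by index label rather than by position, which is why the shuffled `test_X` and `test_Y` can be recombined safely. A minimal sketch on invented toy data, with the two indexes deliberately in different orders:

```python
import pandas as pd

# Toy features and target whose index labels match but appear in different orders.
features = pd.DataFrame({"Full_Review": ["a", "b", "c"]}, index=[7, 2, 5])
target = pd.Series([9.0, 6.5, 8.0], index=[5, 7, 2], name="Reviewer_Score")

# Rows are matched on index labels, so each review keeps its own score.
raw = pd.concat([features, target], axis=1)

print(raw.loc[5, "Full_Review"], raw.loc[5, "Reviewer_Score"])
```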

Model building¶

Define preprocessing steps¶

[7]:

def adapt_vectorizer_input(df: pd.DataFrame) -> Iterable:
    """Adapt input for the vectorizers.

    Vectorizers expect a one-dimensional iterable of texts (such as a Series), not a DataFrame.
    Thus, we select the single text column to obtain a one-dimensional input.
    """

    df = df.iloc[:, 0]
    return df
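To see why the adapter is needed, note that iterating over a DataFrame yields its column labels rather than its rows. The sketch below, on invented toy texts, compares what `TfidfVectorizer` receives in each case:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy texts, invented for illustration only.
toy_df = pd.DataFrame({"Full_Review": ["clean room", "noisy street", "great breakfast"]})

# Iterating over a DataFrame yields column labels, not rows.
print(list(toy_df))  # ['Full_Review']

# Passed as a DataFrame, the vectorizer sees a single "document": the column name.
from_frame = TfidfVectorizer().fit_transform(toy_df)

# Selecting the first column gives a Series with one text per row.
from_series = TfidfVectorizer().fit_transform(toy_df.iloc[:, 0])

print(from_frame.shape[0], from_series.shape[0])
```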

Build estimator¶

[ ]:

# Define pipeline.
pipeline = Pipeline(
    steps=[
        ("vectorizer_adapter", FunctionTransformer(adapt_vectorizer_input)),
        ("vectorizer", TfidfVectorizer(max_features=10000)),
        ("regressor", GradientBoostingRegressor(n_estimators=10)),
    ]
)

# Fit pipeline.
pipeline.fit(train_X, train_Y)

# Perform inference on train and test data.
pred_train = pipeline.predict(train_X)
pred_test = pipeline.predict(test_X)

train_metric = mean_absolute_error(train_Y, pred_train)
test_metric = mean_absolute_error(test_Y, pred_test)

print(f"Train MAE: {train_metric: .2f}\n" f"Test MAE: {test_metric: .2f}")
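The mean absolute error reported above is the average of the absolute differences between true and predicted scores. As a quick worked example on invented numbers, the manual computation matches scikit-learn's `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Toy scores, invented for illustration only.
y_true = np.array([8.0, 9.5, 6.0])
y_pred = np.array([7.5, 9.0, 7.0])

# Absolute errors are 0.5, 0.5 and 1.0, so the mean is 2.0 / 3.
manual_mae = np.mean(np.abs(y_true - y_pred))
sklearn_mae = mean_absolute_error(y_true, y_pred)

print(f"{manual_mae:.4f} {sklearn_mae:.4f}")
```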

Wrap model with Giskard¶

To prepare for the vulnerability scan, make sure to wrap your model using Giskard’s Model class. You can choose to either wrap the prediction function (preferred option) or the model object. More details here.

[ ]:

# Wrap the prediction method
def prediction_function(df):
    return pipeline.predict(df)


giskard_model = Model(
    model=prediction_function,
    # A prediction function that encapsulates all the data pre-processing steps and that could be executed with the dataset used by the scan.
    model_type="regression",  # Either regression, classification or text_generation.
    name="hotel_text_regression",  # Optional.
    feature_names=[FEATURE_COLUMN_NAME],  # Default: all columns of your dataset.
)

# Validate wrapped model.
pred_test_wrapped = giskard_model.predict(giskard_dataset).raw_prediction
wrapped_test_metric = mean_absolute_error(test_Y, pred_test_wrapped)
print(f"Wrapped Test MAE: {wrapped_test_metric: .2f}")
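The notebook's `prediction_function` takes a DataFrame and returns one numeric prediction per row. Before wrapping your own function, it can help to check that it follows the same shape. The sketch below is illustrative: `check_prediction_function` and `toy_prediction_function` are hypothetical names, not part of Giskard.

```python
import numpy as np
import pandas as pd


def check_prediction_function(predict, df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: verify that `predict` returns one number per row."""
    preds = np.asarray(predict(df))
    if preds.shape != (len(df),):
        raise ValueError(f"Expected shape ({len(df)},), got {preds.shape}")
    if not np.issubdtype(preds.dtype, np.number):
        raise TypeError(f"Expected numeric predictions, got {preds.dtype}")
    return preds


def toy_prediction_function(df: pd.DataFrame) -> np.ndarray:
    # Stand-in for pipeline.predict: the score grows with the review length.
    return df["Full_Review"].str.len().to_numpy(dtype=float) / 10.0


toy_df = pd.DataFrame({"Full_Review": ["clean room", "noisy street at night"]})
preds = check_prediction_function(toy_prediction_function, toy_df)
print(preds)
```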

Detect vulnerabilities in your model¶

Scan your model for vulnerabilities with Giskard¶

Giskard’s scan allows you to detect vulnerabilities in your model automatically. These include performance biases, robustness issues, data leakage, stochasticity, underconfidence, ethical issues, and more. For detailed information about the scan feature, please refer to our scan documentation.

[ ]:

results = scan(giskard_model, giskard_dataset)

[11]:

display(results)
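One of the issue types listed above, performance bias, amounts to a model doing noticeably worse on a subset (slice) of the data than on the dataset as a whole. The sketch below illustrates the idea on invented toy data; it is not Giskard's implementation, and `slice_mse` is a hypothetical helper.

```python
import pandas as pd
from sklearn.metrics import mean_squared_error

# Toy reviews with true scores and model predictions, invented for illustration only.
toy = pd.DataFrame(
    {
        "Full_Review": [
            "old building but clean",
            "lovely stay",
            "building works outside",
            "great food",
        ],
        "Reviewer_Score": [6.0, 9.0, 4.0, 8.0],
        "prediction": [8.0, 8.5, 7.0, 8.0],
    }
)


def slice_mse(df: pd.DataFrame, keyword: str) -> float:
    """Hypothetical helper: MSE restricted to rows whose text contains `keyword`."""
    mask = df["Full_Review"].str.contains(keyword, case=False, regex=False)
    sliced = df[mask]
    return float(mean_squared_error(sliced["Reviewer_Score"], sliced["prediction"]))


global_mse = float(mean_squared_error(toy["Reviewer_Score"], toy["prediction"]))
building_mse = slice_mse(toy, "building")

# A slice whose error is well above the global error points to a possible performance bias.
print(global_mse, building_mse)
```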

Generate comprehensive test suites automatically for your model¶

Generate test suites from the scan¶

The objects produced by the scan can be used as fixtures to generate a test suite that integrates all detected vulnerabilities. Test suites allow you to evaluate and validate your model’s performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates.

[12]:

test_suite = results.generate_test_suite("My first test suite")
test_suite.run()

2024-05-29 17:25:08,294 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,295 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (250, 2) executed in 0:00:00.003143
2024-05-29 17:25:08,317 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,322 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (250, 2) executed in 0:00:00.008373
2024-05-29 17:25:08,325 pid:11998 MainThread giskard.utils.logging_utils INFO     Perturb and predict data executed in 0:00:00.781405
2024-05-29 17:25:08,326 pid:11998 MainThread giskard.utils.logging_utils INFO     Compare and predict the data executed in 0:00:00.001536
Executed 'Invariance to “Add typos”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'transformation_function': <giskard.scanner.robustness.text_transformations.TextTypoTransformation object at 0x16bb4cd30>, 'threshold': 0.95, 'output_sensitivity': 0.05}:
               Test failed
               Metric: 0.91
                - [INFO] 241 rows were perturbed

2024-05-29 17:25:08,335 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,337 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (25, 2) executed in 0:00:00.004939
Executed 'MSE on data slice “`Full_Review` contains "building"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303258760>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 3.6


2024-05-29 17:25:08,345 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,347 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (25, 2) executed in 0:00:00.004945
Executed 'MSE on data slice “`Full_Review` contains "stay"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303428850>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 3.4


2024-05-29 17:25:08,354 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,356 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (46, 2) executed in 0:00:00.003780
Executed 'MSE on data slice “`Full_Review` contains "bed"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303315120>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.96


2024-05-29 17:25:08,364 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,365 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (23, 2) executed in 0:00:00.005052
Executed 'MSE on data slice “`Full_Review` contains "comfy"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033170a0>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.72


2024-05-29 17:25:08,374 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,375 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (24, 2) executed in 0:00:00.004607
Executed 'MSE on data slice “`Full_Review` contains "area"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303303b50>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.63


2024-05-29 17:25:08,385 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,386 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (23, 2) executed in 0:00:00.005924
Executed 'MSE on data slice “`Full_Review` contains "food"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033df850>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.57


2024-05-29 17:25:08,394 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,396 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (80, 2) executed in 0:00:00.005724
Executed 'MSE on data slice “`Full_Review` contains "hotel"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033036d0>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.52


2024-05-29 17:25:08,406 pid:11998 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,406 pid:11998 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (124, 2) executed in 0:00:00.003998
Executed 'MSE on data slice “`Full_Review` contains "room"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3032ed210>, 'threshold': 2.3372302706019896}:
               Test failed
               Metric: 2.4


2024-05-29 17:25:08,408 pid:11998 MainThread giskard.core.suite INFO     Executed test suite 'My first test suite'
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO     result: failed
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO     Invariance to “Add typos” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'transformation_function': <giskard.scanner.robustness.text_transformations.TextTypoTransformation object at 0x16bb4cd30>, 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.9087136929460581}
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "building"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303258760>, 'threshold': 2.3372302706019896}): {failed, metric=3.600595896699133}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "stay"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303428850>, 'threshold': 2.3372302706019896}): {failed, metric=3.3984339534286825}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "bed"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303315120>, 'threshold': 2.3372302706019896}): {failed, metric=2.9552583451970587}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "comfy"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033170a0>, 'threshold': 2.3372302706019896}): {failed, metric=2.7172616774396765}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "area"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303303b50>, 'threshold': 2.3372302706019896}): {failed, metric=2.6276261964130216}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "food"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033df850>, 'threshold': 2.3372302706019896}): {failed, metric=2.57312044778542}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "hotel"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033036d0>, 'threshold': 2.3372302706019896}): {failed, metric=2.515134565551532}
2024-05-29 17:25:08,412 pid:11998 MainThread giskard.core.suite INFO     MSE on data slice “`Full_Review` contains "room"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3032ed210>, 'threshold': 2.3372302706019896}): {failed, metric=2.4023965763964417}

[12]:

Test suite failed.

Test Invariance to “Add typos”
Measured Metric = 0.90871 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    transformation_function: Add typos
    threshold: 0.95
    output_sensitivity: 0.05

Test MSE on data slice “`Full_Review` contains "building"”
Measured Metric = 3.6006 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "building"
    threshold: 2.3372302706019896

Test MSE on data slice “`Full_Review` contains "stay"”
Measured Metric = 3.39843 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "stay"
    threshold: 2.3372302706019896

Test MSE on data slice “`Full_Review` contains "bed"”
Measured Metric = 2.95526 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "bed"
    threshold: 2.3372302706019896

Test MSE on data slice “`Full_Review` contains "comfy"”
Measured Metric = 2.71726 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "comfy"
    threshold: 2.3372302706019896

Test MSE on data slice “`Full_Review` contains "area"”
Measured Metric = 2.62763 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "area"
    threshold: 2.3372302706019896

Test MSE on data slice “`Full_Review` contains "food"”
Measured Metric = 2.57312 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "food"
    threshold: 2.3372302706019896

Test MSE on data slice “`Full_Review` contains "hotel"”
Measured Metric = 2.51513 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "hotel"
    threshold: 2.3372302706019896

Test MSE on data slice “`Full_Review` contains "room"”
Measured Metric = 2.4024 Failed
    model: hotel_text_regression
    dataset: hotel_text_regression_dataset
    slicing_function: `Full_Review` contains "room"
    threshold: 2.3372302706019896
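The “Add typos” invariance test reports a metric of 0.90871 against a threshold of 0.95, with 241 perturbed rows. That value is consistent with 219 of the 241 perturbed rows keeping a sufficiently stable prediction (219 / 241 ≈ 0.90871). The sketch below shows how such a pass rate could be computed. It assumes that a row passes when the relative change of its prediction stays below `output_sensitivity`; the helper `invariance_pass_rate` and the toy predictions are hypothetical and are not Giskard's implementation.

```python
import numpy as np


def invariance_pass_rate(original, perturbed, output_sensitivity=0.05):
    """Hypothetical helper: share of rows whose prediction barely changes."""
    original = np.asarray(original, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    # Assumption: a row passes when the relative change is below the sensitivity.
    # The toy data avoids zero predictions, which would need special handling.
    relative_change = np.abs(perturbed - original) / np.abs(original)
    return float(np.mean(relative_change < output_sensitivity))


# Toy predictions before and after adding typos, invented for illustration only.
original_preds = [8.0, 9.0, 7.0, 6.0]
perturbed_preds = [8.1, 9.0, 6.0, 6.2]

rate = invariance_pass_rate(original_preds, perturbed_preds)
print(rate, rate >= 0.95)

# Ratio consistent with the metric reported in the notebook output.
print(f"{219 / 241:.5f}")
```

In the toy data, three of the four rows stay within the 5% tolerance, so the pass rate of 0.75 would fall below a 0.95 threshold, mirroring the failed test above.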

Customize your suite by loading objects from the Giskard catalog¶

The Giskard open-source catalog lets you load:

Tests such as metamorphic, performance, prediction & data drift, statistical tests, etc.
Slicing functions such as detectors of toxicity, hate, emotion, etc.
Transformation functions such as generators of typos, paraphrase, style tune, etc.

To create custom tests, refer to this page.

For demo purposes, we will load a simple unit test (test_r2) that checks if the test R2 score is above the given threshold. For more examples of tests and functions, refer to the Giskard catalog.

[ ]:

test_suite.add_test(testing.test_r2(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()
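For reference, the R2 score compares the model's squared error with that of a constant predictor that always outputs the mean target: R2 = 1 - SS_res / SS_tot. The worked example below uses invented numbers and checks the manual computation against scikit-learn's `r2_score`; the threshold comparison mirrors the idea of the test, though whether Giskard uses a strict or non-strict comparison is an assumption here.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy scores, invented for illustration only.
y_true = np.array([8.0, 9.0, 6.0, 7.0])
y_pred = np.array([7.5, 9.0, 7.0, 7.0])

# Residual sum of squares: 0.25 + 0 + 1 + 0 = 1.25.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares around the mean (7.5): 0.25 + 2.25 + 2.25 + 0.25 = 5.0.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

manual_r2 = 1.0 - ss_res / ss_tot
sklearn_r2 = r2_score(y_true, y_pred)

threshold = 0.7
passed = manual_r2 >= threshold  # Assumption: non-strict comparison.
print(manual_r2, passed)
```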