Regression on the hotel reviews [scikit-learn]
Giskard is an open-source framework for testing all ML models, from LLMs to tabular models. Don't hesitate to give the project a star on GitHub ⭐️ if you find it useful!
In this notebook, you'll learn how to create comprehensive test suites for your model in a few lines of code, thanks to Giskard's open-source Python library.
Use case:
Regression task of predicting the review “score” from the review text.
Outline:
Detect vulnerabilities automatically with Giskardβs scan
Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics
Install dependencies
Make sure to install the giskard library:
[ ]:
%pip install giskard --upgrade
Import libraries
[1]:
from pathlib import Path
from urllib.request import urlretrieve
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from typing import Iterable
from giskard import Model, Dataset, scan, testing
Define constants
[ ]:
# Constants.
FEATURE_COLUMN_NAME = "Full_Review"
TARGET_COLUMN_NAME = "Reviewer_Score"
# Paths.
DATA_URL = "https://giskard-library-test-datasets.s3.eu-north-1.amazonaws.com/hotel_text_regression_dataset-Hotel_Reviews.csv.tar.gz"
DATA_PATH = Path.home() / ".giskard" / "hotel_text_regression_dataset" / "Hotel_Reviews.csv.tar.gz"
Dataset preparation
Load data
[3]:
def fetch_demo_data(url: str, file: Path) -> None:
    """Download the demo dataset to `file` unless it is already cached locally."""
    if not file.parent.exists():
        file.parent.mkdir(parents=True, exist_ok=True)

    if not file.exists():
        print(f"Downloading data from {url}")
        urlretrieve(url, file)

    print("Data was loaded!")


def load_data(**kwargs) -> pd.DataFrame:
    fetch_demo_data(DATA_URL, DATA_PATH)
    df = pd.read_csv(DATA_PATH, **kwargs)

    # Create the feature column by joining the positive and negative review texts.
    df[FEATURE_COLUMN_NAME] = df.apply(lambda x: x["Positive_Review"] + " " + x["Negative_Review"], axis=1)
    return df
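The `Full_Review` feature is simply the positive and negative review texts joined by a single space. The minimal sketch below runs on a hypothetical two-row DataFrame (the review strings are made up for illustration) and shows the resulting strings. It uses pandas' vectorized string addition, which gives the same result as the row-wise `apply` above when no text is missing.

```python
import pandas as pd

# Hypothetical toy reviews that use the same column names as the Hotel Reviews CSV.
toy_df = pd.DataFrame(
    {
        "Positive_Review": ["Great location", "Friendly staff"],
        "Negative_Review": ["Small room", "Noisy street"],
    }
)

# Vectorized equivalent of the row-wise apply in load_data().
toy_df["Full_Review"] = toy_df["Positive_Review"] + " " + toy_df["Negative_Review"]

print(toy_df["Full_Review"].tolist())
# ['Great location Small room', 'Friendly staff Noisy street']
```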
[ ]:
reviews_df = load_data(nrows=1000)
Train-test split
[5]:
train_X, test_X, train_Y, test_Y = train_test_split(
    reviews_df[[FEATURE_COLUMN_NAME]], reviews_df[TARGET_COLUMN_NAME], random_state=42
)
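The call above does not pass `test_size`, so `train_test_split` falls back to its default of holding out 25% of the rows. With the 1000 reviews loaded through `nrows=1000`, that gives 750 training rows and 250 test rows, which matches the dataset shape of (250, 2) reported in the test-suite logs. The sketch below confirms the arithmetic on plain row indices that stand in for the reviews.

```python
from sklearn.model_selection import train_test_split

# 1000 row indices stand in for the 1000 reviews loaded with nrows=1000.
indices = list(range(1000))

# Same call pattern as above: no test_size, so the default 25% hold-out applies.
train_idx, test_idx = train_test_split(indices, random_state=42)

print(len(train_idx), len(test_idx))  # 750 250
```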
Wrap dataset with Giskard
To prepare for the vulnerability scan, make sure to wrap your dataset with Giskard's Dataset class. More details are available in the Giskard documentation on the Dataset class.
[ ]:
raw_data = pd.concat([test_X, test_Y], axis=1)
giskard_dataset = Dataset(
    # A pandas.DataFrame that contains the raw data (before all the pre-processing steps) and the actual ground truth variable (target).
    df=raw_data,
    target=TARGET_COLUMN_NAME,  # Ground truth variable.
    name="hotel_text_regression_dataset",  # Optional.
)
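`pd.concat(..., axis=1)` places the target Series next to the feature DataFrame and aligns rows on the index that `train_test_split` preserved. This is why the wrapped dataset has two columns: the raw text feature and the target. The sketch below uses hypothetical toy values and shuffled index labels to show the resulting layout.

```python
import pandas as pd

# Hypothetical stand-ins for test_X (a one-column DataFrame) and test_Y (a Series),
# keeping shuffled index labels as train_test_split would.
toy_X = pd.DataFrame(
    {"Full_Review": ["Great location Small room", "Friendly staff Noisy street"]},
    index=[7, 3],
)
toy_Y = pd.Series([8.8, 6.3], index=[7, 3], name="Reviewer_Score")

# axis=1 aligns on the index and appends the target as a new column.
raw = pd.concat([toy_X, toy_Y], axis=1)

print(raw.shape)          # (2, 2)
print(list(raw.columns))  # ['Full_Review', 'Reviewer_Score']
```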
Model building
Define preprocessing steps
[7]:
def adapt_vectorizer_input(df: pd.DataFrame) -> Iterable:
    """Adapt the input for the vectorizer.

    Text vectorizers expect a one-dimensional iterable of documents (e.g. a Series),
    not a DataFrame. Selecting the single text column makes the input one-dimensional.
    """
    return df.iloc[:, 0]
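Scikit-learn's text vectorizers iterate over their input and treat each item as one document. Iterating over a DataFrame yields its column labels, not its rows. Passing `train_X` directly would therefore give the vectorizer a single "document", the string `"Full_Review"`, instead of the reviews. The small pandas check below, on hypothetical text, shows the difference the adapter makes.

```python
import pandas as pd

df = pd.DataFrame({"Full_Review": ["Great location Small room", "Friendly staff Noisy street"]})

# Iterating over a DataFrame yields its column labels, not its rows.
print(list(df))  # ['Full_Review']

# Selecting the single column (what adapt_vectorizer_input does) yields the texts.
print(list(df.iloc[:, 0]))  # ['Great location Small room', 'Friendly staff Noisy street']
```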
Build estimator
[ ]:
# Define pipeline.
pipeline = Pipeline(
    steps=[
        ("vectorizer_adapter", FunctionTransformer(adapt_vectorizer_input)),
        ("vectorizer", TfidfVectorizer(max_features=10000)),
        ("regressor", GradientBoostingRegressor(n_estimators=10)),
    ]
)

# Fit pipeline.
pipeline.fit(train_X, train_Y)

# Perform inference on train and test data.
pred_train = pipeline.predict(train_X)
pred_test = pipeline.predict(test_X)

train_metric = mean_absolute_error(train_Y, pred_train)
test_metric = mean_absolute_error(test_Y, pred_test)

print(f"Train MAE: {train_metric:.2f}\n" f"Test MAE: {test_metric:.2f}")
Wrap model with Giskard
To prepare for the vulnerability scan, make sure to wrap your model with Giskard's Model class. You can wrap either the prediction function (the preferred option) or the model object. More details are available in the Giskard documentation on the Model class.
[ ]:
# Wrap the prediction method.
def prediction_function(df):
    return pipeline.predict(df)


giskard_model = Model(
    # A prediction function that encapsulates all the data pre-processing steps and that can be executed with the dataset used by the scan.
    model=prediction_function,
    model_type="regression",  # Either regression, classification or text_generation.
    name="hotel_text_regression",  # Optional.
    feature_names=[FEATURE_COLUMN_NAME],  # Default: all columns of your dataset.
)

# Validate wrapped model.
pred_test_wrapped = giskard_model.predict(giskard_dataset).raw_prediction
wrapped_test_metric = mean_absolute_error(test_Y, pred_test_wrapped)
print(f"Wrapped Test MAE: {wrapped_test_metric:.2f}")
Detect vulnerabilities in your model
Scan your model for vulnerabilities with Giskard
Giskard's scan automatically detects vulnerabilities in your model. These include performance biases, lack of robustness, data leakage, stochasticity, underconfidence, ethical issues, and more. For detailed information about the scan feature, please refer to our scan documentation.
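One kind of robustness check perturbs the input text, for example by adding typos, and measures how often the prediction stays essentially unchanged. The plain-Python sketch below assumes a simple relative-tolerance rule: a row counts as unchanged if its prediction moves by at most `output_sensitivity` times the original value. The function name and the exact rule are illustrative assumptions, not Giskard's internal code. Under this reading, 219 unchanged rows out of 241 perturbed rows would give 219 / 241 ≈ 0.909, which is consistent with the metric of about 0.91 reported for 241 perturbed rows in the test-suite output.

```python
def invariance_pass_rate(original, perturbed, output_sensitivity=0.05):
    """Share of rows whose prediction moved by at most `output_sensitivity`
    relative to the original prediction (assumed criterion, for illustration)."""
    if len(original) != len(perturbed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    unchanged = sum(
        abs(p - o) <= output_sensitivity * abs(o) for o, p in zip(original, perturbed)
    )
    return unchanged / len(original)


# Hypothetical predictions before and after injecting typos.
before = [8.0, 6.0, 9.0, 5.0]
after = [8.2, 6.9, 9.1, 5.0]

rate = invariance_pass_rate(before, after)
print(rate)  # 0.75, below a 0.95 threshold, so the invariance check would fail
```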
[ ]:
results = scan(giskard_model, giskard_dataset)
[11]:
display(results)
Generate comprehensive test suites automatically for your model
Generate test suites from the scan
The objects produced by the scan can be used as fixtures to generate a test suite that integrates all detected vulnerabilities. Test suites let you evaluate and validate your model's performance, check that it behaves as expected on a set of predefined test cases, and identify any regressions or issues that might arise during development or updates.
[12]:
test_suite = results.generate_test_suite("My first test suite")
test_suite.run()
2024-05-29 17:25:08,294 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,295 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (250, 2) executed in 0:00:00.003143
2024-05-29 17:25:08,317 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,322 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (250, 2) executed in 0:00:00.008373
2024-05-29 17:25:08,325 pid:11998 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:00.781405
2024-05-29 17:25:08,326 pid:11998 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.001536
Executed 'Invariance to “Add typos”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'transformation_function': <giskard.scanner.robustness.text_transformations.TextTypoTransformation object at 0x16bb4cd30>, 'threshold': 0.95, 'output_sensitivity': 0.05}:
Test failed
Metric: 0.91
- [INFO] 241 rows were perturbed
2024-05-29 17:25:08,335 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,337 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (25, 2) executed in 0:00:00.004939
Executed 'MSE on data slice “`Full_Review` contains "building"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303258760>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 3.6
2024-05-29 17:25:08,345 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,347 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (25, 2) executed in 0:00:00.004945
Executed 'MSE on data slice “`Full_Review` contains "stay"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303428850>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 3.4
2024-05-29 17:25:08,354 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,356 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (46, 2) executed in 0:00:00.003780
Executed 'MSE on data slice “`Full_Review` contains "bed"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303315120>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 2.96
2024-05-29 17:25:08,364 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,365 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.005052
Executed 'MSE on data slice “`Full_Review` contains "comfy"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033170a0>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 2.72
2024-05-29 17:25:08,374 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,375 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (24, 2) executed in 0:00:00.004607
Executed 'MSE on data slice “`Full_Review` contains "area"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303303b50>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 2.63
2024-05-29 17:25:08,385 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,386 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.005924
Executed 'MSE on data slice “`Full_Review` contains "food"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033df850>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 2.57
2024-05-29 17:25:08,394 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,396 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (80, 2) executed in 0:00:00.005724
Executed 'MSE on data slice “`Full_Review` contains "hotel"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033036d0>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 2.52
2024-05-29 17:25:08,406 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}
2024-05-29 17:25:08,406 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (124, 2) executed in 0:00:00.003998
Executed 'MSE on data slice “`Full_Review` contains "room"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3032ed210>, 'threshold': 2.3372302706019896}:
Test failed
Metric: 2.4
2024-05-29 17:25:08,408 pid:11998 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO result: failed
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO Invariance to “Add typos” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'transformation_function': <giskard.scanner.robustness.text_transformations.TextTypoTransformation object at 0x16bb4cd30>, 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.9087136929460581}
2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "building"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303258760>, 'threshold': 2.3372302706019896}): {failed, metric=3.600595896699133}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "stay"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303428850>, 'threshold': 2.3372302706019896}): {failed, metric=3.3984339534286825}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "bed"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303315120>, 'threshold': 2.3372302706019896}): {failed, metric=2.9552583451970587}
2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "comfy"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033170a0>, 'threshold': 2.3372302706019896}): {failed, metric=2.7172616774396765}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "area"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x303303b50>, 'threshold': 2.3372302706019896}): {failed, metric=2.6276261964130216}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "food"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033df850>, 'threshold': 2.3372302706019896}): {failed, metric=2.57312044778542}
2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "hotel"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3033036d0>, 'threshold': 2.3372302706019896}): {failed, metric=2.515134565551532}
2024-05-29 17:25:08,412 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains "room"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x10678f8b0>, 'dataset': <giskard.datasets.base.Dataset object at 0x3014ceb00>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x3032ed210>, 'threshold': 2.3372302706019896}): {failed, metric=2.4023965763964417}
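Each performance-bias test in this suite computes the MSE only on the rows whose review contains a given word. The test fails when that slice MSE exceeds the threshold, for example 2.40 on the "room" slice against a threshold of about 2.34. The plain-Python sketch below reproduces that check. It assumes a case-sensitive substring match and uses a hypothetical helper, data and threshold; Giskard's own slicing logic may match words differently.

```python
def slice_mse(texts, y_true, y_pred, keyword):
    """MSE restricted to rows whose text contains `keyword` (hypothetical helper)."""
    rows = [(t, p) for text, t, p in zip(texts, y_true, y_pred) if keyword in text]
    if not rows:
        raise ValueError(f"no rows contain {keyword!r}")
    return sum((t - p) ** 2 for t, p in rows) / len(rows)


# Hypothetical reviews, true scores and predictions.
texts = ["old building", "nice room", "building works", "great food"]
y_true = [5.0, 9.0, 4.0, 8.0]
y_pred = [7.0, 8.5, 6.0, 8.0]

metric = slice_mse(texts, y_true, y_pred, "building")
threshold = 2.0  # hypothetical threshold
passed = metric <= threshold
print(metric, passed)  # 4.0 False, so this slice test would fail
```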
Customize your suite by loading objects from the Giskard catalog
The Giskard open-source catalog lets you load:
Tests such as metamorphic, performance, prediction and data drift, and statistical tests, etc.
Slicing functions such as detectors of toxicity, hate, emotion, etc.
Transformation functions such as generators of typos, paraphrases, style tuning, etc.
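A transformation function maps an input to a perturbed version of itself. The sketch below is a toy typo generator in plain Python: it swaps two adjacent characters at a random position. It is a hypothetical helper, not the catalog's implementation. Seeding the random generator keeps the perturbation reproducible. Note that the output can equal the input when the two swapped characters are identical.

```python
import random


def add_adjacent_swap_typo(text: str, rng: random.Random) -> str:
    """Swap two neighboring characters at a random position (toy typo generator)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


rng = random.Random(42)  # fixed seed so the perturbation is reproducible
original = "The room was very clean"
perturbed = add_adjacent_swap_typo(original, rng)
print(perturbed)
```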
To create custom tests, refer to the Giskard documentation on custom tests.
For demo purposes, we will load a simple unit test (test_r2) that checks whether the R² score on the test set is above the given threshold. For more examples of tests and functions, refer to the Giskard catalog.
[ ]:
test_suite.add_test(testing.test_r2(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()
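R² compares the model's squared error with that of always predicting the mean score: R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)². A value of 1 is a perfect fit, and 0 means the model does no better than predicting the mean. The plain-Python sketch below computes it on hypothetical data and applies the same 0.7 cut-off as the test above. The helper name is illustrative; Giskard's `test_r2` computes the score internally.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


# Hypothetical scores and predictions: SS_res = 2, SS_tot = 20.
y_true = [4.0, 6.0, 8.0, 10.0]
y_pred = [5.0, 6.0, 7.0, 10.0]

score = r2_score_manual(y_true, y_pred)
print(score, score >= 0.7)  # 0.9 True
```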