Data Leakage

Data leakage occurs when a predictive model uses data in the training phase that are unavailable when the model is in production. Consider the example below (source: Mostafa Saad Ibrahim):

The main concern with the data is how it is split: images of the same animal may appear in both the train and test sets, so the trained model can score well on the test set by memorizing individual animals rather than learning features that generalize. This is an example of data leakage.
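One common way to avoid this kind of leakage is to split by animal rather than by image, so that every image of a given animal lands on the same side of the split. The sketch below is a minimal illustration using only the Python standard library. The function name `group_train_test_split`, the animal IDs, and the file names are all hypothetical and are not taken from the original example.

```python
import random
from collections import defaultdict


def group_train_test_split(samples, test_fraction=0.25, seed=0):
    """Split (animal_id, image_name) pairs so that no animal is in both sets."""
    # Collect the images belonging to each animal.
    by_animal = defaultdict(list)
    for animal_id, image_name in samples:
        by_animal[animal_id].append(image_name)

    # Shuffle the animals (not the images) with a fixed seed for repeatability.
    animal_ids = sorted(by_animal)
    random.Random(seed).shuffle(animal_ids)

    # Hold out whole animals for the test set (at least one).
    n_test = max(1, round(len(animal_ids) * test_fraction))
    test_ids = set(animal_ids[:n_test])

    train = [(a, img) for a, img in samples if a not in test_ids]
    test = [(a, img) for a, img in samples if a in test_ids]
    return train, test


# Hypothetical data: four animals with two images each.
samples = [
    ("cat_01", "cat_01_a.jpg"), ("cat_01", "cat_01_b.jpg"),
    ("cat_02", "cat_02_a.jpg"), ("cat_02", "cat_02_b.jpg"),
    ("dog_01", "dog_01_a.jpg"), ("dog_01", "dog_01_b.jpg"),
    ("dog_02", "dog_02_a.jpg"), ("dog_02", "dog_02_b.jpg"),
]

train, test = group_train_test_split(samples)
train_animals = {a for a, _ in train}
test_animals = {a for a, _ in test}

# The two sets of animals should not overlap.
print("animals in both sets:", train_animals & test_animals)
```

A plain random split over images would not give this guarantee, because two photos of the same animal could be assigned to different sets. Libraries such as scikit-learn offer group-aware splitters for the same purpose.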

This interesting article discusses data leakage in depth and presents ways to prevent it.
