Examples
Fraud Detection

Overview: In this example, we see how to use UpTrain to monitor the performance of a fraud detection task. We will train a binary classifier for cyber-attack classification on a popular network traffic dataset, the NSL-KDD dataset, using XGBoost.

Dataset: The NSL-KDD dataset includes a variety of network attack types, including denial-of-service (DoS) attacks, user-to-root (U2R) attacks, and probe attacks. The dataset contains a total of around 148,000 instances and 41 different features that describe the behavior of network connections, such as the number of failed login attempts and the size of packets transmitted.

Why is monitoring needed: Once our fraud detection model has been trained, it may initially perform well in detecting malicious activity. However, over time, attackers may adapt their tactics and evolve their methods, leading to a mismatch between the type of attacks seen during training and those seen in production. This can result in decreased accuracy in our model’s predictions.

Solution: We will use the UpTrain framework, which provides an easy-to-configure way to log model predictions and attach ground truth to monitor the model’s performance. We apply a drift-detection method on top of model performance to raise alerts whenever there is a dip in the model’s accuracy - a phenomenon commonly called concept drift.

Step 1: Let’s download and prepare the NSL-KDD dataset

Let’s read the data and see how it looks
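Below is a minimal loading sketch (an assumption of how the data might be prepared, not the exact notebook code). It assumes the NSL-KDD files KDDTrain+.txt and KDDTest+.txt have been downloaded locally, leaves the columns unnamed for brevity (the actual example assigns the official 41 feature names), binarizes the attack label, and label-encodes the categorical features so that XGBoost can consume them.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stack the official train and test splits into a single frame. The files have
# no header row; the last two columns are the attack label and a difficulty score.
df = pd.concat([
    pd.read_csv("KDDTrain+.txt", header=None),
    pd.read_csv("KDDTest+.txt", header=None),
], ignore_index=True)

# Drop the difficulty score and binarize the label: 0 for normal traffic, 1 for any attack
df = df.drop(columns=df.columns[-1])
df[df.columns[-1]] = (df[df.columns[-1]] != "normal").astype(int)

# Label-encode the categorical features (protocol_type, service, flag)
for col in df.columns[:-1]:
    if df[col].dtype == object:
        df[col] = LabelEncoder().fit_transform(df[col])

print("Labels for first few rows:", list(df.iloc[:5, -1]))
print("Input features for first few rows:")
print(df.iloc[:5, :-1])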

Labels for first few rows: [0, 0, 1, 0, 0]

Input features for first few rows:

(first 10 and last 10 of the 41 feature columns are shown)

   duration  protocol_type  service  flag  src_bytes  dst_bytes  land  wrong_fragment  urgent  hot
0         0              1       20     9        491          0     0               0       0    0
1         0              2       44     9        146          0     0               0       0    0
2         0              1       49     5          0          0     0               0       0    0
3         0              1       24     9        232       8153     0               0       0    0
4         0              1       24     9        199        420     0               0       0    0

   dst_host_count  dst_host_srv_count  dst_host_same_srv_rate  dst_host_diff_srv_rate  dst_host_same_src_port_rate
0             150                  25                    0.17                    0.03                         0.17
1             255                   1                    0.00                    0.60                         0.88
2             255                  26                    0.10                    0.05                         0.00
3              30                 255                    1.00                    0.00                         0.03
4             255                  19                    0.07                    0.07                         0.00

   dst_host_srv_diff_host_rate  dst_host_serror_rate  dst_host_srv_serror_rate  dst_host_rerror_rate  dst_host_srv_rerror_rate
0                         0.00                  0.00                      0.00                  0.05                      0.00
1                         0.00                  0.00                      0.00                  0.00                      0.00
2                         0.00                  1.00                      1.00                  0.00                      0.00
3                         0.04                  0.03                      0.01                  0.00                      0.01
4                         0.00                  0.00                      0.00                  0.07                      0.00

Divide the data into training and test sets

We use the first 10% of the data to train the model and the remaining 90% to evaluate it in production.

from sklearn.model_selection import train_test_split

# Keep the data in time order (shuffle=False) so the later 90% mimics production traffic
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, :-1].values, df.iloc[:, -1].values,
                                                    test_size = 0.9, 
                                                    random_state = 0,
                                                    shuffle=False)

print("Num Training samples: ", str(len(X_train)) + ",", " Num Testing samples: ", len(X_test))
Num Training samples:  14851,  Num Testing samples:  133666

Step 2: Train our XGBoost Classifier

from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Train the XGBoost classifier with training data
classifier = XGBClassifier()
classifier.fit(X_train, y_train)

# Check accuracy on the training data itself
y_pred = classifier.predict(X_train)
print("Training accuracy: " + str(100*accuracy_score(y_train, y_pred)))
Training accuracy: 100.0

Woah! 😲🔥 The training accuracy is 100%. Let’s see how long the model lasts in production.

Step 3: Monitoring model performance using UpTrain

import time
import uptrain

cfg = {
    # Checks to identify concept drift
    "checks": [{
        'type': uptrain.Monitor.CONCEPT_DRIFT,
        'algorithm': uptrain.DataDriftAlgo.DDM
    }],
    
    # Folder that stores the drifted data-points identified by UpTrain
    "retraining_folder": 'uptrain_smart_data',
    
    # Enable streamlit logging to visualize model's performance
    "logging_args": {"st_logging": True},
}
pretty(cfg)
- checks:
    - type:
        Monitor.CONCEPT_DRIFT
    - algorithm:
        DataDriftAlgo.DDM
- retraining_folder:
    uptrain_smart_data
- st_logging:
    True
# Initialize the UpTrain framework
framework = uptrain.Framework(cfg)

batch_size = 10000
for i in range(int(len(X_test)/batch_size)):
    
    # Do model prediction
    inputs = {'data': {"feats": X_test[i*batch_size:(i+1)*batch_size]}}
    preds = classifier.predict(inputs['data']["feats"])
    
    # Log model inputs and outputs to monitor concept drift
    ids = framework.log(inputs=inputs, outputs=preds)
    
    # Attach ground truth to corresponding predictions 
    # in UpTrain framework and identify concept drift
    ground_truth = y_test[i*batch_size:(i+1)*batch_size] 
    framework.log(identifiers=ids, gts=ground_truth)
    
    # Pausing between batches to monitor progress in the dashboard
    time.sleep(0.5)
Drift detected with DDM at time: 111298!!!

As can be noted from the dashboard, we start seeing a sharp dip in the model’s accuracy around the 111k timestamp.

[Dashboard plot: concept_drift_avg_acc]
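For a quick sanity check outside the dashboard, one could compare the classifier’s accuracy before and after that point. The snippet below is a hypothetical follow-up (not part of the original walkthrough) that reuses the classifier and test split defined above:

from sklearn.metrics import accuracy_score

# Hypothetical sanity check: accuracy on test traffic before vs. after the ~111k mark
split_point = 111000
acc_before = accuracy_score(y_test[:split_point], classifier.predict(X_test[:split_point]))
acc_after = accuracy_score(y_test[split_point:], classifier.predict(X_test[split_point:]))
print("Accuracy before", split_point, ":", round(acc_before, 3), ", after:", round(acc_after, 3))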

In this example, we used a popular drift-detection algorithm called the Drift Detection Method (DDM), which is already implemented as part of the UpTrain package. However, the model accuracy drops from 99.7% to 96.9% - a relatively slow decline that might not raise many eyebrows.

For better detection and a clearer picture of the severity of the issue, one might want to define a customized metric and monitor the model with it. Let’s see how to do that in UpTrain.

Step 4: Define a Custom Monitor in UpTrain (for better monitoring)

We now define a custom drift metric which monitors the difference between the model’s accuracy on the first 200 predictions and on the most recent 200 predictions. This way, we can quickly identify a sudden degradation in model performance.

Let’s define our custom check and the UpTrain config with a “Custom Monitor” check, as below:

"""
Defining a custom drift metric to check if accuracy drops beyond a threshold.
"""

def custom_initialize_func(self):
    self.initial_acc = None       
    self.acc_arr = []
    self.count = 0       
    self.thres = 0.02
    self.window_size = 200
    self.is_drift_detected = False

def custom_check_func(self, inputs, outputs, gts=None, extra_args={}):
    batch_size = len(extra_args["id"])
    self.count += batch_size
    self.acc_arr.extend(list(np.equal(gts, outputs)))
    
    # Calculate initial performance of the model on first 200 points
    if (self.count >= self.window_size) and (self.initial_acc is None):
        self.initial_acc = sum(self.acc_arr[0:self.window_size])/self.window_size
        
    # Calculate the most recent accuracy and log it to dashboard.
    if (self.initial_acc is not None):
        for i in range(self.count - batch_size, self.count, self.window_size):
            
            # Calculate the most recent accuracy
            recent_acc = sum(self.acc_arr[i:i+self.window_size])/self.window_size
            
            # Logging to UpTrain dashboard
            self.log_handler.add_scalars('custom_metrics', {
                    'initial_acc': self.initial_acc,
                    'recent_acc': recent_acc,
                }, i, self.dashboard_name)
            
            # Send an alert when recent model performance goes down 
            if (self.initial_acc - recent_acc > self.thres) and (not self.is_drift_detected):
                alert = f"Concept drift detected with custom metric at time: {i}!!!" 
                print(alert)
                self.log_handler.add_alert(
                    "Model Performance Degradation Alert 🚨",
                    alert,
                    self.dashboard_name
                )
                self.is_drift_detected = True

cfg = {
    # Checks for our custom monitor
    "checks": [{
        'type': uptrain.Monitor.CUSTOM_MONITOR,
        'initialize_func': custom_initialize_func,
        'check_func': custom_check_func,
        'need_gt': True,
    }],
    
    # Folder that stores the drifted data-points identified by UpTrain
    "retraining_folder": 'uptrain_smart_data',
    
    # Enable streamlit logging to visualize model's performance
    "logging_args": {"st_logging": True},
}
pretty(cfg)
- checks:
    - type:
        Monitor.CUSTOM_MONITOR
    - initialize_func:
        <function custom_initialize_func at 0x154a29750>
    - check_func:
        <function custom_check_func at 0x154a29900>
    - need_gt:
        True
- retraining_folder:
    uptrain_smart_data
- st_logging:
    True
# Initialize the UpTrain framework
framework = uptrain.Framework(cfg)

batch_size = 10000
for i in range(int(len(X_test)/batch_size)):
    
    # Do model prediction
    inputs = {'data': {"feats": X_test[i*batch_size:(i+1)*batch_size]}}
    preds = classifier.predict(inputs['data']["feats"])
    
    # Log model inputs and outputs to monitor concept drift
    ids = framework.log(inputs=inputs, outputs=preds)
    
    # Attach ground truth to corresponding predictions 
    # in UpTrain framework and identify concept drift
    ground_truth = y_test[i*batch_size:(i+1)*batch_size] 
    framework.log(identifiers=ids, gts=ground_truth)
    
    # Pausing between batches to monitor progress in the dashboard
    time.sleep(0.5)
Concept drift detected with custom metric at time: 111000!!!

As we see, there is a sudden (and more alarming) drop with our custom monitor. We can clearly see that the model accuracy drops from 99.7% to 77%, enabling us to send better alerts and take more urgent measures (e.g., model retraining) to address the issue.

[Dashboard plot: concept_drift_custom]

Conclusion

Model monitoring is crucial for tasks such as fraud detection and cyber-security, where attackers continuously improve their attack vectors and, over time, learn to evade detection. Real-time model observability enables one to proactively address performance degradation before it leads to serious consequences, such as hacks or financial loss.

In this example, we saw two ways to detect performance degradation - concept drift via DDM and a custom monitor. The UpTrain framework has many other statistical tools, such as data drift, data integrity checks, shifts in model outputs, and outlier detection, that can be used to identify model issues, even in cases where ground truth is not available. You can explore them here:

  • Automatically detecting edge-cases and out-of-distribution samples - Link
  • Defining custom signals to identify edge-cases - Link
  • Using Data-Drift (i.e. shifts in input distribution) to identify dips in model performance - Coming soon
  • Monitoring bias in recommendation systems - Link