UMAP Visualization

Visuals are predefined classes in the UpTrain framework that help you visualize your data and gain deeper insight into your machine learning models. Currently, UpTrain supports the UMAP and t-SNE dimensionality reduction techniques and renders the corresponding visualizations in the UpTrain dashboard. Users can also add their own custom visualizations to the dashboard. Next, we describe UMAP and t-SNE in more detail.

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique that has been shown to be highly effective for visualizing high-dimensional data, including embeddings learned by machine learning models. UpTrain integrates with the Python UMAP package to provide UMAP dimensionality reduction, which lets users visualize their models' embeddings and gain insight into the structure and relationships within the data.

We recommend trying out the text summarization example to see UMAP visualization in action. The following shows how a config is defined to generate UMAP visuals.

import uptrain

umap_visual = {
    'type': uptrain.Visual.UMAP,
    # Input feature holding the embeddings to reduce
    "measurable_args": {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'bert_embs'
    },
    # Input feature used to color the data points
    "label_args": {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'dataset_label'
    },
    # Hyperparameters for UMAP
    'min_dist': 0.01,
    'n_neighbors': 20,
    'metric_umap': 'euclidean',
    'dim': '2D',
    # Frequency to calculate UMAP dimensionality reduction
    'update_freq': 100,
}

Here, bert_embs is the name of the feature in the input data on which UMAP dimensionality reduction is performed and which is then visualized in the UpTrain dashboard. dataset_label defines the coloring of data points in the visualization; for example, training and production data can carry different labels, which makes it easy to tell their clusters apart by color. The UMAP hyperparameters min_dist, n_neighbors, metric_umap, and dim have the same meaning as defined in the UMAP Python package documentation.
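To make the correspondence with the UMAP Python package concrete, here is a minimal sketch of how the config keys above might translate into keyword arguments for the umap.UMAP constructor. The helper umap_kwargs_from_config is hypothetical and is not UpTrain's internal code; the key mapping (metric_umap to metric, dim to n_components) and the assumption that dim takes '2D' or '3D' are ours. The fallback values are the umap-learn defaults.

```python
# Hypothetical helper (not part of UpTrain): translate the UMAP-related keys of
# a visual config into keyword arguments for umap-learn's umap.UMAP constructor.
# The key mapping below is an assumption made for illustration.

def umap_kwargs_from_config(config):
    """Return a dict of umap.UMAP keyword arguments built from a visual config."""
    dim = config.get("dim", "2D")
    if dim not in ("2D", "3D"):  # assumed set of allowed values
        raise ValueError(f"dim must be '2D' or '3D', got {dim!r}")
    return {
        # Size of the local neighborhood used to build the manifold graph
        "n_neighbors": config.get("n_neighbors", 15),
        # Minimum distance between points in the low-dimensional embedding
        "min_dist": config.get("min_dist", 0.1),
        # Distance metric applied in the original high-dimensional space
        "metric": config.get("metric_umap", "euclidean"),
        # Number of output dimensions
        "n_components": 2 if dim == "2D" else 3,
    }


config = {
    "min_dist": 0.01,
    "n_neighbors": 20,
    "metric_umap": "euclidean",
    "dim": "2D",
}
kwargs = umap_kwargs_from_config(config)
print(kwargs)
```

With umap-learn installed, the resulting dictionary could be passed on as umap.UMAP(**kwargs).fit_transform(embeddings), where embeddings is an array with one row per data point.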

The model in the example was trained on the billsum dataset. Accordingly, the figure below shows that the embeddings of the wikihow dataset follow a different distribution than the billsum training and test data.

UMAP visualization for the BERT embeddings of the billsum and wikihow datasets for the text summarization task

Overall, UpTrain's UMAP visualization gives users a way to inspect the structure of their models' embeddings and to spot differences between datasets, such as the shift between training and production data.