t-SNE (t-Distributed Stochastic Neighbor Embedding) is a popular technique for reducing high-dimensional data to a two- or three-dimensional representation. t-SNE is often used for data visualization, as it can reveal underlying structures or patterns that may not be apparent in the original high-dimensional space. It works by modeling pairwise similarities between data points in both the high-dimensional and low-dimensional spaces and iteratively optimizing the mapping to minimize the difference between the two.
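As a minimal, standalone sketch of the idea (using scikit-learn directly, independent of UpTrain), the snippet below embeds a synthetic 50-dimensional dataset with two clusters into two dimensions; the cluster structure survives the reduction:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two well-separated clusters in 50-dimensional space (synthetic data)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(100, 50))
cluster_b = rng.normal(loc=5.0, scale=1.0, size=(100, 50))
X = np.vstack([cluster_a, cluster_b])

# perplexity roughly controls the effective number of neighbors considered
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
X_2d = tsne.fit_transform(X)
print(X_2d.shape)  # one 2-D point per input row
```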

UpTrain supports t-SNE dimensionality reduction through the scikit-learn package. Here is how to define the config for t-SNE visualization in the text summarization example:

tsne_visual = {
    "type": uptrain.Visual.TSNE,
    "measurable_args": {
        "type": uptrain.MeasurableType.INPUT_FEATURE,
        "feature_name": "bert_embs",
    },
    "label_args": {
        "type": uptrain.MeasurableType.INPUT_FEATURE,
        "feature_name": "dataset_label",
    },
    # Hyperparameters for t-SNE
    "dim": "2D",
    "perplexity": 10,
    # Frequency at which to recalculate t-SNE
    "update_freq": 100,
}

Here, the parameters specifying which dataset features dimensionality reduction is applied to (`measurable_args` and `label_args`) work the same way as in the UMAP case. The t-SNE-specific hyperparameters, such as perplexity, carry the same meaning as in the scikit-learn package.
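To make the mapping concrete, here is a hedged sketch of what a config like the one above amounts to under the hood: run scikit-learn's `TSNE` on the stored embeddings and group the 2-D points by dataset label. The `embeddings` and `labels` arrays are illustrative stand-ins for the `bert_embs` feature and `dataset_label` values; this is not UpTrain's actual internal code.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-ins for the logged 'bert_embs' feature and 'dataset_label' values
embeddings = np.random.default_rng(1).normal(size=(150, 64))
labels = np.array(
    ["billsum_train"] * 50 + ["billsum_test"] * 50 + ["wikihow"] * 50
)

# 'dim': '2D' corresponds to n_components=2; 'perplexity': 10 passes straight through
points = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)

# One 2-D point cloud per dataset label, ready to plot and compare
clouds = {name: points[labels == name] for name in np.unique(labels)}
print({name: cloud.shape for name, cloud in clouds.items()})
```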

Continuing the example referenced in the UMAP section, this is how the t-SNE visualization looks for the text summarization example.

t-SNE visualization for the BERT embeddings of the billsum and wikihow datasets for the text summarization task

Similar to UMAP, we see that the embeddings corresponding to the wikihow dataset have a different distribution than those of the billsum training and testing datasets.