Conversation Summarization
Objective: Collect a finetuning dataset to improve a model that summarizes human conversations.
Model: We work with the facebook/bart-large-xsum model fine-tuned on the SAMSum dataset (available here).
It is among the top-performing open-source models on the SAMSum corpus.
Dataset: Our model was fine-tuned on the SAMSum corpus, which contains 16k conversations with summaries. Additionally, we evaluate the model on the DialogSum corpus, which contains 13k conversations with summaries. The model performs well on this new dataset too, but slightly worse than on SAMSum. Our objective is to create a fine-tuning dataset that improves the model on DialogSum-like conversations.
Method: We employ several techniques to collect the fine-tuning dataset:
- Visualizing UMAP/t-SNE for low-performing clusters
- Finding clusters around data-points where accuracy is low
- Edge-case Collection (user defines the edge-case parameters based on heuristics/observations)
- Building a custom monitor (that checks out-of-vocabulary cases)
Step-1 Installing and Importing required packages
# Install the packages mentioned below
# pip install uptrain rouge datasets umap-learn matplotlib py7zr
import json
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import random
import subprocess
import uptrain
import zipfile
from datasets import load_dataset
from rouge import Rouge
Step-2 Load Datasets from Hugging Face
samsum_dataset = load_dataset("samsum")
dialogsum_dataset = load_dataset("knkarthick/dialogsum")
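Both corpora share the same schema (a 'dialogue' field and a 'summary' field), which everything below relies on. A quick peek at one example:
# Inspect one training example from each corpus
print(samsum_dataset['train'][0]['dialogue'][:100])
print(samsum_dataset['train'][0]['summary'])
print(dialogsum_dataset['train'][0]['summary'])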
Step-3 Download model outputs and their embeddings
We understand that running bart-large-xsum can be time-consuming on some machines; hence, we have pre-generated the model outputs and their corresponding sentence-BERT embeddings and uploaded them to a remote location, for both the SAMSum and DialogSum datasets. As a result, running this entire script does not take much time (e.g., it runs in about 3 minutes on my MacBook Air).
remote_url = "https://oodles-dev-training-data.s3.amazonaws.com/conversation_summarization_data.zip"
data_dir = 'data'
if not os.path.exists(data_dir):
    file_downloaded_ok = subprocess.call("wget " + remote_url, shell=True,
                                         stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
    print("Data downloaded.")
    with zipfile.ZipFile('conversation_summarization_data.zip', 'r') as zip_ref:
        zip_ref.extractall("./")
    print("Prepared Model Outputs.")
    os.remove('conversation_summarization_data.zip')
else:
    print("Skipping data download as it already exists.")
First, let's see (literally) what we are dealing with. We plot the sentence-BERT embeddings after UMAP dimensionality reduction, applied to three datasets: SAMSum train (aka the reference dataset), SAMSum test, and DialogSum train.
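As a mental model for what the UpTrain UMAP check does internally, here is a minimal standalone sketch with umap-learn on stand-in arrays (the parameters and array shapes are illustrative, not the ones UpTrain uses):
import umap
import numpy as np
import matplotlib.pyplot as plt

# Stand-in arrays; in this walkthrough these would be the sentence-BERT
# embeddings downloaded in Step-3
reference_embeddings = np.random.rand(500, 384)
production_embeddings = np.random.rand(200, 384)

# Fit UMAP on the reference embeddings and project the production
# embeddings into the same 2-D space
reducer = umap.UMAP(n_components=2, random_state=42)
ref_2d = reducer.fit_transform(reference_embeddings)
prod_2d = reducer.transform(production_embeddings)

plt.scatter(ref_2d[:, 0], ref_2d[:, 1], s=2, label='reference')
plt.scatter(prod_2d[:, 0], prod_2d[:, 1], s=2, label='production')
plt.legend()
plt.show()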
Step-4 Define helper functions
"""
Using training data (i.e., SAMSum train), we generate and save a reference
dataset to be used by the UpTrain framework. This dataset is used to detect
drift, apply dimensionality reductions and compare visualizations.
"""
def generate_reference_dataset(
    summary, output_summaries_file, bert_embs_file, file_name, dataset_label
):
    if not os.path.exists(file_name):
        # Load model output summaries
        with open(output_summaries_file) as f:
            output_summaries = json.load(f)
        # Load the corresponding BERT embeddings of the output summaries
        with open(bert_embs_file) as f:
            bert_embs = list(json.load(f))
        data = []
        for idx in range(len(bert_embs)):
            data.append(
                {
                    "id": idx,
                    "dataset_label": dataset_label,
                    "summary": summary[idx],
                    "bert_embs": list(bert_embs[idx]),
                    "output": output_summaries[idx],
                }
            )
        with open(file_name, "w") as f:
            json.dump(data, f, cls=uptrain.UpTrainEncoder)
        print("Generated reference dataset.")
    else:
        print("Reference dataset exists. Skipping generating again.")
"""
Run the model in production. First, we pass
800 data points from SAMSum test and then
12400 data points from DialogSum train.
"""
def run_production(framework, batch_size=200):
    for dataset_name in ['samsum', 'dialogsum']:
        if dataset_name == 'samsum':
            d_type = 'test'
            dataset = samsum_dataset[d_type]
        elif dataset_name == 'dialogsum':
            d_type = 'train'
            dataset = dialogsum_dataset[d_type]
        else:
            raise Exception("Dataset Error")

        # Load the pre-generated model summaries
        with open(os.path.join(data_dir, f"out_{d_type}_{dataset_name}_summaries.json")) as f:
            all_summaries = json.load(f)

        # Note: We use sentence-BERT embeddings generated from here:
        # https://huggingface.co/sentence-transformers
        # But any other embeddings, such as the ones generated by the
        # encoder, can be used as well.
        with open(os.path.join(data_dir, f"out_{d_type}_{dataset_name}_bert_embs.json")) as f:
            all_bert_embs = json.load(f)

        for idx in range(len(all_bert_embs) // batch_size):
            idxs = slice(idx * batch_size, (idx + 1) * batch_size)
            inputs = {
                'id': list(range(idx * batch_size, (idx + 1) * batch_size)),
                'bert_embs': np.array(all_bert_embs[idxs]),
                'dataset_label': [dataset_name] * batch_size,
                'dialog': dataset['dialogue'][idxs],
                'summary': dataset['summary'][idxs],
            }
            framework.log(inputs=inputs, outputs=all_summaries[idxs])
            print(f"{(idx + 1) * batch_size} predictions logged for {dataset_name} {d_type}")
Step-5 Generate reference dataset for dimensionality reduction
# Get the locations of training-related data and outputs
output_summaries_file = os.path.join(data_dir, 'out_train_samsum_summaries.json')
bert_embs_file = os.path.join(data_dir, 'out_train_samsum_bert_embs.json')
reference_dataset_file = os.path.join(data_dir, 'reference_dataset.json')
# Generate and save reference dataset
generate_reference_dataset(samsum_dataset['train']['summary'], output_summaries_file,
                           bert_embs_file, reference_dataset_file, 'reference')
Step-6 Defining Config for Dimensionality Reduction and Visualization using UMAP
We define the configuration for the UpTrain framework with the parameters required for UMAP visualization. You can refer to the documentation to understand what each parameter means.
umap_check = {
    "type": uptrain.Visual.UMAP,
    "measurable_args": {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'bert_embs'
    },
    "label_args": {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'dataset_label'
    },
    "hover_args": [
        {
            'type': uptrain.MeasurableType.INPUT_FEATURE,
            'feature_name': 'id'
        },
        {
            'type': uptrain.MeasurableType.PREDICTION,
            'feature_name': 'output'
        },
        {
            'type': uptrain.MeasurableType.INPUT_FEATURE,
            'feature_name': 'summary'
        },
    ],
    "update_freq": 13200,
    "initial_dataset": reference_dataset_file,
    "do_clustering": False,
}
config = {
    "checks": [umap_check],
    "logging_args": {"st_logging": True},
}
framework = uptrain.Framework(cfg_dict=config)
Step-7 Running model in production and logging data to UpTrain
run_production(framework)
UMAP Visualization
Datasets marked reference (i.e., SAMSum train) and samsum (i.e., SAMSum test) are close in the UMAP space. Most points from the DialogSum dataset lie further away from the data the model was fine-tuned on (i.e., reference, aka SAMSum train).
Next, we identify poorly performing points and find clusters around them.
Step-8 Identifying poorly performing points
Define a performance metric. We use ROUGE-L similarity, but you can choose any metric relevant to your use-case.
def rogue_l_similarity(text1_list, text2_list):
    r = Rouge()
    res = r.get_scores([x.lower() for x in text1_list], [x.lower() for x in text2_list])
    return [x['rouge-l']['f'] for x in res]
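A quick sanity check of the helper on toy inputs (the rouge package returns F1 values in [0, 1]):
# Identical texts score ~1.0; texts with no word overlap score 0.0
print(rogue_l_similarity(["the cat sat on the mat"], ["the cat sat on the mat"]))
print(rogue_l_similarity(["completely different words"], ["another unrelated sentence"]))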
Get ROUGE-L performance scores on the DialogSum data
file = os.path.join(data_dir, "out_train_dialogsum_summaries.json")
with open(file) as f:
    dialogsum_summaries = json.load(f)
dialogsum_gts = dialogsum_dataset['train']['summary'][0:len(dialogsum_summaries)]
dialogsum_scores = rogue_l_similarity(dialogsum_summaries, dialogsum_gts)

dialogsum_train_bert_embs_file = os.path.join(data_dir, 'out_train_dialogsum_bert_embs.json')
with open(dialogsum_train_bert_embs_file) as f:
    dialogsum_train_bert_embs = np.array(json.load(f))
Select poorly performing data-points
# Select data-points where the ROUGE-L score is 0.0
outlier_idxs = np.where(np.array(dialogsum_scores) <= 0.0)[0]
selected_outliers = dialogsum_train_bert_embs[outlier_idxs, :]
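Before clustering around these outliers, it helps to see how many there are:
# Count the zero-score outliers we just selected
print(f"Found {len(outlier_idxs)} zero-score outliers among {len(dialogsum_scores)} DialogSum points")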
Defining a monitor for catching data-points close to outliers
close_to_outliers_check = {
    "type": uptrain.Monitor.DATA_DRIFT,
    "is_embedding": True,
    "measurable_args": {
        "type": uptrain.MeasurableType.INPUT_FEATURE,
        "feature_name": "bert_embs"
    },
    "reference_dataset": reference_dataset_file,
    # Number of clusters used to calculate data drift
    "num_buckets": 50,
    # Number of points to wait for before calculating drift
    "initial_skip": 500,
    # Outliers around which we want to collect data-points
    "outlier_data": selected_outliers,
    # Threshold on the earth mover's distance (EMD) for collecting drift points
    "emd_threshold": 10
}
Defining UpTrain framework and running in production
config = {
    "checks": [close_to_outliers_check],
    "retraining_folder": "smart_data_close_to_outliers",
}
framework = uptrain.Framework(cfg_dict=config)
run_production(framework)
Noting the performance on the collected points that are close to the outliers (Disclaimer: it's 0.7 less)
print("Overall Accuracy (Rogue-L): ", np.mean(dialogsum_scores))
smart_data = pd.read_csv(config['retraining_folder'] + '/1/smart_data.csv')
smart_data = smart_data[smart_data['reasons'] == '"Close_to_User_annotated_Outliers"'].to_dict('records')
smart_data_scores = rogue_l_similarity([eval(x['output']) for x in smart_data],
[eval(x['summary']) for x in smart_data])
print("Accuracy on clusters around user-picked outliers", np.mean(smart_data_scores))
Note how the accuracy on points close to the outliers is worse than the overall accuracy.
While analyzing the model outputs above, we made a few observations about cases where the model does not perform well. Note that these are not statistical ways of finding edge cases; rather, they are inspired by our intuition from working with the above data.
Observation: The model performs badly on long dialogs. For example, it generates the following (incomplete) summaries for long dialogs:
"Benjamin, Elliot, Daniel and Hilary are going to have lunch with French"
"Jesse, Lee, Melvin and Maxine are going to chip in for the"
"Jayden doesn't want to have children now, but maybe in the future when"
"Leah met a creepy guy at the poetry reading last night. He asked her"
"Jen wants to break up with her boyfriend. He hasn't paid her back the"
Next, we generate a histogram of the lengths of input dialogues in the training dataset (i.e., SAMSum train). From it, we note that 1700 characters is a good cut-off for collecting long-conversation data-points.
a = [len(x) for x in samsum_dataset['train']['dialogue']]
fig, ax = plt.subplots(figsize=(7, 3))
ax.hist(a, bins=16)
ax.set_xlabel('Input dialog length')
ax.set_ylabel('Number of data points')
plt.show()
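To put a number behind the visual cut-off (an extra sanity check on top of the histogram):
# Fraction of SAMSum train dialogues longer than the proposed 1700-character cut-off
print(f"{np.mean(np.array(a) > 1700):.1%} of training dialogues exceed 1700 characters")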
Step-9 Defining Edge Cases based on observations/heuristics
Edge case Type 1: Long dialogues
"""
Check if the length of the input is greater
than 1700 characters.
"""
def length_check_func(inputs, outputs, gts=None, extra_args={}):
    this_batch_dialog = inputs['dialog']
    return np.array([len(x) for x in this_batch_dialog]) > 1700

edge_case_length = {
    'type': uptrain.Monitor.EDGE_CASE,
    'signal_formulae': uptrain.Signal("Length_dialog", length_check_func)
}
Observation: When the model is not able to summarize well, it just copies one or two sentences. This may be acceptable in general, but it performs very badly when the conversation contains a negation. See the following examples:
Input:
Janice: my son has been asking me to get him a hamster for his birthday. Janice: Should I? Martina: NO! NO! NO! NO! NO! Martina: I got one for my son and it stank up the whole house. Martina: So don't do it!!!
Output: Janice's son wants her to get him a hamster for his birthday.
Input:
Person1: Hello, I'm looking for a shop that sells inexpensive cashmere sweaters. Person2: Have you tried an outlet?Person1: Why didn't I think of that? Person2: Many of my friends shop at outlets. Person1: Thanks. That is a good suggestion. Person2: I'm only too happy to help.
Output: Person1 is looking for a shop that sells inexpensive cashmere sweaters.
Edge-case type 2: Copied sentences with negation
# Check whether sentences from the inputs are copied directly, using the ROUGE-L metric
def rogueL_check_func(inputs, outputs, gts=None, extra_args={}):
    r = Rouge()
    res = r.get_scores([x.lower() for x in inputs['dialog']], [x.lower() for x in outputs])
    rogue_l = [x['rouge-l']['f'] for x in res]
    return np.array(rogue_l)
# Check whether there's a negation in the input
def negation_func(inputs, outputs, gts=None, extra_args={}):
    has_negation = []
    for text in inputs['dialog']:
        this_has_negation = False
        all_words = text.split()
        for negation_word in ['no', 'not', "can't", "couldn't", "won't", "didn't", "don't"]:
            if negation_word in all_words:
                this_has_negation = True
        has_negation.append(this_has_negation)
    return has_negation
edge_case_negation = {
    'type': uptrain.Monitor.EDGE_CASE,
    'signal_formulae': (uptrain.Signal("Rogue-L", rogueL_check_func) > 0.3)
                       & uptrain.Signal("Has_negation", negation_func)
}
Step-10 Custom Monitor to check Vocabulary Coverage
In this step, we define a custom monitor to measure the average vocabulary coverage of the new dataset (i.e., DialogSum) against the old dataset (i.e., SAMSum), i.e., a custom metric to check whether there is a shift in vocabulary. Note that unlike the previous edge-case checks, which were stateless, this is a stateful check that carries the training-vocabulary information.
from collections import Counter
# Helper function to filter certain characters from string
def clean_string(x):
    x = x.lower()
    for char in ['.', ',', '\'', '?', '#', ':', '!']:
        x = x.replace(char, '')
    return x
# Define the training vocabulary
all_text = ""
for x in samsum_dataset['train']['dialogue']:
    all_text += clean_string(x) + " "
vocab = Counter(all_text.split())
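A quick look at the vocabulary we just built:
# Size and most frequent tokens of the reference vocabulary
print(f"SAMSum train vocabulary size: {len(vocab)} unique tokens")
print("Most frequent tokens:", vocab.most_common(5))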
"""
Used to define a state which contains the training set
vocabulary and the out-of-vocab words (and their count).
"""
def vocab_init(self):
    # Reference (i.e., training) vocabulary
    self.vocab = set(vocab.keys())
    self.vocab_arr = []
    self.out_of_vocab_words = Counter()
"""
This is the actual check that checks the vocabulary coverage
of the production dataset in the training dataset.
"""
def vocab_drift(self, inputs, outputs, gts=None, extra_args={}):
    for x in inputs['dialog']:
        x_s = set(clean_string(x).split())
        self.vocab_arr.append(len(x_s & self.vocab) / len(x_s))
        outside_words = x_s - self.vocab
        self.out_of_vocab_words.update(Counter(outside_words))

    # Save the 50 most common out-of-vocabulary words
    with open("out_of_vocab_words.json", "w") as f:
        json.dump(self.out_of_vocab_words.most_common(50), f)

    # Calculate vocabulary coverage
    count = len(self.vocab_arr)
    coverage = 100 * sum(self.vocab_arr) / count

    # Log to the UpTrain dashboard
    self.log_handler.add_scalars('vocab coverage',
                                 {'y_coverage': coverage},
                                 count, 'vocab_coverage', file_name='vocab_coverage')
# Defining a custom monitor check for vocabulary coverage
custom_monitor_check = {
    "type": uptrain.Monitor.CUSTOM_MONITOR,
    "initialize_func": vocab_init,
    "check_func": vocab_drift,
    "need_gt": False,
}
Step-11 Defining UpTrain Framework and running in production
config = {
    "checks": [edge_case_negation, edge_case_length, custom_monitor_check],
    "logging_args": {"st_logging": True},
    "retraining_folder": "smart_data_edge_case_and_custom_monitor",
}
framework = uptrain.Framework(cfg_dict=config)
run_production(framework)
This results in the following output:
62 edge cases identified out of 800 total samples
800 predictions logged for samsum test
106 edge cases identified out of 1800 total samples
154 edge cases identified out of 3200 total samples
202 edge cases identified out of 4600 total samples
253 edge cases identified out of 5800 total samples
307 edge cases identified out of 7400 total samples
354 edge cases identified out of 8400 total samples
402 edge cases identified out of 9400 total samples
455 edge cases identified out of 10600 total samples
500 edge cases identified out of 11600 total samples
554 edge cases identified out of 13200 total samples
12400 predictions logged for dialogsum train
Vocabulary Coverage
We obtain the following plot from the UpTrain dashboard, tracking vocabulary coverage on production data. Initially (for SAMSum test), the coverage is ~98%; later (for DialogSum), it drops to ~95%.
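These dashboard numbers can be approximated offline with the vocab and clean_string helpers defined above; the exact values may differ slightly, since UpTrain computes coverage incrementally:
def mean_coverage(dialogues):
    # Average fraction of each dialogue's unique tokens that appear
    # in the SAMSum training vocabulary
    ref_vocab = set(vocab.keys())
    fracs = []
    for x in dialogues:
        x_s = set(clean_string(x).split())
        fracs.append(len(x_s & ref_vocab) / len(x_s))
    return 100 * np.mean(fracs)

print("SAMSum test coverage: %.1f%%" % mean_coverage(samsum_dataset['test']['dialogue']))
print("DialogSum train coverage: %.1f%%" % mean_coverage(dialogsum_dataset['train']['dialogue']))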
Checking the collected Edge Cases
# Print edge-cases collected for each reason
def print_edge_cases(csv_file, num_per_reason=2):
    df = pd.read_csv(csv_file)
    reasons_covered = Counter()
    for idx in range(len(df)):
        reason = df['reasons'][idx]
        # Print at most num_per_reason examples per reason
        if reasons_covered[reason] >= num_per_reason:
            continue
        reasons_covered.update([reason])
        print('Reason: ', reason)
        print('Output: ', df['output'][idx])
        print('Annotated Summary:', df['summary'][idx])
        print('Dialogue: ', df['dialog'][idx])
        print('')
print_edge_cases(config['retraining_folder'] + "/1/smart_data.csv")
Result:
Reason: "Signal-Length_dialog"
Output: "Clara is rewatching Dear White People on Netflix and recommends it to Neela"
Annotated Summary: "Clara is rewatching Dear White People and strongly recommends it to Neela."
Dialogue: "Clara: Hi, what you up to?\r\nNeela: Not much, chilling out.\r\nClara: Just rewatching Dear White People on Netflix, love it!\ud83d\ude0d\r\nNeela: Oh yeah, heard of it, but not seen it yet? Any good?\r\nClara: Well, yes! I just said it was, LOL. It's about a fictional Ivy League University and the students in one House of Residence.\r\nNeela: Why is it called Dear White People?\r\nClara: That's the name of the radio show the main character, Sam, presents on college radio.\r\nNeela: Yeah, but why is it so good?\r\nClara: Well, it's mainly stories from the perspective of black students there, which I find very interesting. The characters are strong and likeable too.\r\nNeela: I suppose it's rather different from the UK, then?\r\nClara: It seems so, as there is a lot more racial awareness and discrimination there than here. It all kicks off when there is a Blackface party held by an elite group of white students, which gets out of hand.\r\nNeela: How's that?\r\nClara: Well, obviously, the black students try to break it up and there's also an incident where one guy, Reggie, gets a loaded gun pointed at him by a campus policeman after he gets into an argument with a white student. It may be at another party, though, I'm not sure of that.\r\nNeela: Oh, that sounds pretty strong stuff. What else happens?\r\nClara: Well, there is a young black guy called Lionel who is coming to terms with being gay and is finding his voice as a journalist. He unearths corruption at the uni and he and Sam also uncover some conspiracy theory stuff about secret societies.\r\nNeela: Well, I must say, it does sound good, I'll check it out soon!\r\nClara: Definitely, there is supposed to be a Series 3 coming up next year, really looking forward to it!\r\nNeela: Well, thanks Clara, I'm just watching the rest of a movie and I'll try Dear White People.\r\nClara: Don't blame me if you get hooked and stay up till 4!\r\nNeela: See ya, love!\r\nClara: Bye!"
Reason: "Signal-Length_dialog"
Output: "Beth's mum's 40th birthday is in 6 weeks. Deirdre"
Annotated Summary: "Beth wants to organize a girls weekend to celebrate her mother's 40th birthday. She also wants to work at Deidre's beauty salon. Deidre offers her a few hours on Saturdays as work experience. They set up for a meeting tomorrow."
Dialogue: "Deirdre: Hi Beth, how are you love?\r\nBeth: Hi Auntie Deirdre, I'm been meaning to message you, had a favour to ask.\r\nDeirdre: Wondered if you had any thought about your Mum's 40th, we've got to do something special!\r\nBeth: How about a girls weekend, just mum, me, you and the girls, Kira will have to come back from Uni, of course.\r\nDeirdre: Sounds fab! Get your thinking cap on, it's only in 6 weeks! Bet she's dreading it, I remember doing that!\r\nBeth: Oh yeah, we had a surprise party for you, you nearly had a heart attack! \r\nDeirdre: Well, it was a lovely surprise! Gosh, thats nearly 4 years ago now, time flies! What was the favour, darling?\r\nBeth: Oh, it was just that I fancied trying a bit of work experience in the salon, auntie.\r\nDeirdre: Well, I am looking for Saturday girls, are you sure about it? you could do well in the exams and go on to college or 6th form.\r\nBeth: I know, but it's not for me, auntie, I am doing all foundation papers and I'm struggling with those.\r\nDeirdre: What about a tutor? Kira could help you in the hols.\r\nBeth: Maybe, but I'd like to try working. I'm 16 soon, I'm old enough.\r\nDeirdre: I know. Look, pop in tomorrow after school and we'll have a cuppa and a chat.\r\nBeth: Yes, thanks auntie. I'd really like to try the beauty therapy side.\r\nDeirdre: Its not for the squeamish, mind. Massage, pedicures, not to mention waxing!\r\nBeth: Oh yes, I was chatting to a friend about it yesterday!\r\nDeirdre: Maxine manages the beauty side, you can meet her tomorrow and we'll see how it goes.\r\nBeth: Yes, I'd really like that. \r\nDeirdre: We can try a few hours on a Saturday for a couple of weeks as work experience. I'll give you a tenner or so per session to start off for your lunch, coffee and bus fare etc. If you like, we'll take it from there.\r\nBeth: OK, I like the sound of it! See you tomorrow Auntie! Love you!\r\nDeirdre: Bye, lovely girl! Xx"
Reason: "Signal-And(Greater Than(Rogue-L,0.3),Has_negation)"
Output: "Selah can't see the phone number of the person whose phone is off."
Annotated Summary: "Selah called a person that did not pick up."
Dialogue: "Myah: <file_photo>\r\nSelah: I can't see the phone number very well. Rewrite it plz\r\nMyah: <file_photo>\r\nSelah: The phone of that person is off"
Reason: "Signal-And(Greater Than(Rogue-L,0.3),Has_negation)"
Output: "Janice's son wants her to get him a hamster for his birthday."
Annotated Summary: "Martina advises against getting a hamster. "
Dialogue: "Janice: my son has been asking me to get him a hamster for his birthday\r\nJanice: should i?\r\nMartina: NO! NO! NO! NO! NO!\r\nMartina: i got one for my son and it stank up the whole house\r\nMartina: so don't do it!!!"
Get out-of-vocabulary words
with open("out_of_vocab_words.json") as f:
    out_of_vocab_words = json.load(f)
out_of_vocab_words = [x[0] for x in out_of_vocab_words]
print(out_of_vocab_words)
Result:
['person2', 'person1', 'yuan', 'person3', 'li', 'rmb', 'wang', 'taiwan', 'angeles', 'forty', 'fax', '00', 'clerk', 'twenty-five', 'branches', 'labor', 'furnished', 'advertisements', 'zhang', 'iba', 'forty-five', 'bye-bye', 'personnel', 'reporter', 'import', 'strengths', 'liu', 'automobile', 'non-smoking', 'assured', 'frequently', 'fourteen', 'appetizer', 'sellers', 'bid', 'eighty', 'ming', 'carry-on', 'airmail', 'consumer', 'chinas', 'sichuan', '[yeah]', 'weakness', 'organizations', 'honors', 'eighteen', 'singapore', 'exports', 'polluted']
Note from the above how many of the words are related to Asia (such as yuan, li, wang, taiwan, zhang, liu, chinas, sichuan, singapore, etc.). This implies that many conversations in the DialogSum dataset are focused on the Asia region. Next, we define an edge-case check to catch these cases.
Step-12 Applying check for Asian words in production data
asian_words = ['yuan', 'li', 'wang', 'taiwan', 'zhang', 'liu', 'chinas', 'sichuan', 'singapore']

def asian_words_check(inputs, outputs, gts=None, extra_args={}):
    has_asian_word = [False] * len(inputs['dialog'])
    for i, text in enumerate(inputs['dialog']):
        all_words = clean_string(text).split()
        if set(asian_words).intersection(all_words):
            has_asian_word[i] = True
    return has_asian_word

edge_case_asian_word = {
    'type': uptrain.Monitor.EDGE_CASE,
    'signal_formulae': uptrain.Signal("asian_word", asian_words_check)
}
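A quick toy check of the signal function before wiring it into the framework:
# Only the dialogue mentioning 'yuan' should be flagged
toy_inputs = {'dialog': ["Hello, how are you?", "That will be 350 yuan, please."]}
print(asian_words_check(toy_inputs, None))  # expected: [False, True]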
Let's define the config again and log production data.
config = {
    "checks": [edge_case_asian_word],
    "retraining_folder": "smart_data_asian_words",
}
framework = uptrain.Framework(cfg_dict=config)
run_production(framework)
Checking the collected edge cases
print_edge_cases(config['retraining_folder'] + "/1/smart_data.csv")
Result:
Reason: "Signal-asian_word"
Output: "#Person1# and #Person2# need to check in at the Air"
Annotated Summary: "#Person1# asks #Person2# what they need to do when they check in at the Air China's counter."
Dialogue: "#Person1#: We're supposed to check in at the Air China's counter 30 minutes before take-off, Joe.\n#Person2#: Yes, I know. The boarding time on the ticket says 17:05, and now it's 16:15. I guess we have plenty of time.\n#Person1#: Do we need to show our ID cards when checking in?\n#Person2#: Yes. It's essential.\n#Person1#: What about our luggage?\n#Person2#: We can check it and hand carry the small bags. And we have to open each for inspection.\n#Person1#: Are they going to frisk all the passengers?\n#Person2#: I think so. We certainly don't want a hijack to happen on the plane today."
Reason: "Signal-asian_word"
Output: "#Person2# wants to buy a leather jacket. #Person1# will"
Annotated Summary: "#Person2# buys a leather jacket and a dress made of pure silk with #Person1#'s recommendation."
Dialogue: "#Person1#: Can I help you?\n#Person2#: I want a leather jacket.\n#Person1#: What size, please?\n#Person2#: Size 40.\n#Person1#: What color would you prefer?\n#Person2#: Let me see. Do you think a brown one will do?\n#Person1#: Well, the brown one is beautiful indeed, but I think the black one will suit you better.\n#Person2#: Really? Please get it for me.\n#Person1#: Will there be anything else?\n#Person2#: Is this dress made of pure silk?\n#Person1#: Yes, it is. It's brilliant.\n#Person2#: Is it washable?\n#Person1#: Yes, it is. But you have to be careful.\n#Person2#: How much, please?\n#Person1#: Only 350 yuan.\n#Person2#: All right. Will you wrap it for me?\n#Person1#: OK. Here you are."
And with that, we have completed the walkthrough of the conversation summarization example. The entire source code is available as a Jupyter notebook here.