Topic-Driven Summarization

We are building a framework for interactive exploration of large corpora of narrative text. As an initial step, we are building an agent that tailors news article summaries based on human input. Our goal is to train an end-to-end abstractive text summarization system that tailors its summaries to a given topic. Narrative texts, like news articles, often weave together many interrelated topics. For example, a recent article about the US Women’s National Soccer Team includes information about international sporting events, US politics, and the FIFA organization. There are many possible ways to summarize a text to convey different perspectives and nuances of the original source, and accounting for topic is one way to direct automated summarization systems to tailor their output for human users. This post describes our initial steps toward a topic-driven abstractive summarization system.

Dataset Construction

We start with the CNN/Daily Mail dataset, as processed by See et al. (2017), which contains ~300,000 online news articles paired with multi-sentence summaries. To learn the relationship between articles, summaries, and topics, we also need topic information for each summary. We use Latent Dirichlet Allocation (LDA) to train a topic model on the summaries in the original corpus. Our model includes 147 topics, each a cluster of related words identified by an integer. For example, here are the first three topic clusters in our model:

Topic 0: apple phone new mobile iphone phones app available devices ipad launch screen device google samsung android expected users company use microsoft version tablet watch service
Topic 1: cent people survey percent study half average likely shows poll 000 uk according say americans 10 just finds 50 research report nearly 40 adults 80
Topic 2: weight size lost stone diet pounds lose body dropped fat weighed loss fitness exercise eating just 12 healthy food day fit 10 weighs months gym

Topic 0 corresponds to mobile and tablet tech devices and companies, Topic 1 to scientific studies and public opinion polls, and Topic 2 to fitness and weight loss. Once the topic model has been trained, we apply it to each summary in the original corpus and compute the probability that the summary belongs to each topic in the model. For example, the summary:

Once a super typhoon, Maysak is now a tropical storm with 70 mph winds. It could still cause flooding, landslides and other problems in the Philippines.

is assigned the topic:

Topic 17: new people storm hit homes power residents damage hurricane officials 000 says area flooding say quake miles reported earthquake struck dead toll tornado caused flood

The model assigns this topic a probability of 85.6%. Once all summaries have been assigned topics, we follow the procedure of Krishna et al. (2018) and randomly merge pairs of articles, associating each merged article with each of the original summaries. This forces the summarization model to learn which portions of the text are important to summarize for each topic.
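The topic-modeling step above can be sketched in a few lines. This toy version uses scikit-learn's LDA implementation as a stand-in (the post does not say which LDA library was actually used, so the library choice, the tiny corpus, and the reduced topic count are all assumptions for illustration):

```python
# Sketch of training an LDA topic model on summaries and assigning each
# summary its most probable topic integer. Illustrative only: uses 3 toy
# summaries and 3 topics instead of the real ~300,000 summaries / 147 topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

summaries = [
    "apple launches new iphone and ipad devices",
    "storm causes flooding and damage to homes",
    "study shows half of adults skip breakfast",
]

# Bag-of-words counts over the summary vocabulary.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(summaries)

# Fit LDA; each row of doc_topics is a per-summary topic distribution.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)

# The most probable topic for each summary becomes its integer label,
# analogous to the 85.6% assignment of Topic 17 above.
topic_ids = doc_topics.argmax(axis=1)
```

Each row of `doc_topics` sums to 1, so the maximum entry in a row plays the role of the 85.6% probability reported for the example summary above.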

Models, Training, and Results

We start with a baseline pointer-generator network, as described in See et al. (2017), which is an augmented neural sequence-to-sequence abstractive summarization architecture. We train this baseline for 180,000 iterations on our modified CNN/Daily Mail dataset augmented with summary topics. We also train a second pointer-generator network in which the summary’s topic integer is prepended to the news article in the model’s input. We expect the topic-informed model to produce more accurate summaries, as measured by ROUGE metrics. Here are the results after training the second model for 90,000 iterations:

[Table: ROUGE scores for the baseline and topic-informed pointer-generator networks]
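As a reference point for the metrics in the table, ROUGE-1 is an F1 score over unigram overlap between a candidate summary and a reference summary. This minimal pure-Python sketch is illustrative, not the official ROUGE scorer (which also handles stemming, ROUGE-2 bigrams, and ROUGE-L longest common subsequences):

```python
# Minimal ROUGE-1 F1: unigram-overlap precision/recall between a candidate
# summary and a reference summary. Illustrative stand-in for the real scorer.
from collections import Counter

def rouge_1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A perfect reproduction of the reference scores 1.0; a summary sharing no words with the reference scores 0.0.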
The network with topic information outperforms the baseline network on each of the ROUGE metrics. Here is an example summary from the corpus about a trial run of robotic employees:

Robot will work on a trial basis at Mitsubishi UFJ Financial Group branches. These trials of the 1ft 11inch 58cm assistant are expected to begin in April. Nao has four microphones, touch sensors and can speak 19 languages. Makers Aldebaran Robotics said it can also recognise human emotions. If successful, the robotic employees will be rolled out to more branches.

In the intermixed dataset, each article is merged with a second article, and the merged text is assigned each of the original summaries along with their LDA topic clusters. Here is the summary of the second article, randomly selected to be intermixed with the first (about robots). It concerns a young man suspended from school for selling sodas to fellow classmates:

Grade 12 student Keenan Shaw, 17, was handed a two-day suspension. He was told the sales violated the school 's nutrition and marketing policies and that he was operating a business without a licence. Keenan defended actions by pointing out other students have been known to sell marijuana, cigarettes, acid and even meth. Has now moved his business outside school to sidewalk.
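The intermixing procedure can be sketched as follows. Note that the function name, the sentence-level interleaving scheme, and the random-partner selection are all illustrative assumptions; the post only specifies that pairs of articles are randomly merged and that each merged article keeps the original summaries with their topic integers prepended to the model input:

```python
# Sketch of building the intermixed dataset (after Krishna et al. 2018).
# `examples` is a list of (article, summary, topic_id) triples.
import random
from itertools import zip_longest

def build_intermixed_dataset(examples, seed=0):
    rng = random.Random(seed)
    merged = []
    for article, summary, topic_id in examples:
        # Pick a random partner article to merge with. (A real
        # implementation would likely avoid pairing an article with itself.)
        partner_article, _, _ = rng.choice(examples)
        # Interleave the two articles sentence by sentence -- one simple
        # way to "intermix" them; the exact merging scheme is an assumption.
        a = article.split(". ")
        b = partner_article.split(". ")
        mixed = ". ".join(
            s for pair in zip_longest(a, b, fillvalue="") for s in pair if s
        )
        # Prepend the summary's topic integer to the model input, as in the
        # second pointer-generator configuration described above.
        merged.append((f"{topic_id} {mixed}", summary))
    return merged
```

Each original summary thus ends up paired with a merged article containing distractor content, so the model can only score well by using the topic integer to pick out the relevant half.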

Each summary in the dataset is associated with a topic by LDA. In this case, the first summary’s topic is related to technology and design:

Topic 143: used using technology device uses created designed use design 3d developed make machine light create company called project robot built able computer machines firm sensors

Trained on just the intermixed article and summary pairs, the baseline model produces the following summary:

12 Keenan,,,, from, week after was to selling at profit his locker. Was a suspension Winston High in,,,, selling. School 's nutritional policy, sodas not in.

Without topic information, the model summarizes the wrong intermixed article and produces strange grammar and punctuation errors. Here is the summary produced by the network trained with topic information:

Experts have warned robots could soon take over our jobs. It has two cameras mounted to its head, that act as eyes, as well as four directional microphones to act as its ears.

Given Topic 143, the second model correctly identifies and summarizes information from the intermixed article that corresponds to technology. This difference in output helps account for the topic-driven model’s stronger ROUGE scores.