
What's Wrong? Refining Meeting Summaries with LLM Feedback

This software project accompanies the research paper "What's Wrong? Refining Meeting Summaries with LLM Feedback", accepted to the COLING 2025 main track.

Content of the work

Large language models (LLMs) reshape how the challenges of meeting summarization are handled, such as informal and colloquial language, multiple speakers, complex meeting flow, and implicit context. While LLMs produce more coherent and context-aware summaries, they remain prone to including irrelevant content and hallucinations. Previous approaches for mitigating such weaknesses do not work for LLMs out of the box, as they require expensive fine-tuning and re-exploration of methods. Instead of adapting LLMs to better fit the summarization task, we use LLMs to address the challenges of meeting summaries directly by refining given summaries.

(Main figure: overview of the approach)
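The two-stage idea can be sketched as follows. The `llm` callable, the prompt wording, and the example error categories are illustrative assumptions for this sketch, not the project's actual interface (see `src/main.py` for that):

```python
from typing import Callable

# Hypothetical LLM interface: prompt text in, completion text out.
LLM = Callable[[str], str]

FEEDBACK_PROMPT = (
    "Review the meeting summary against the transcript and list every "
    "error you find (e.g. omission, irrelevance, hallucination).\n\n"
    "Transcript:\n{transcript}\n\nSummary:\n{summary}\n\nErrors:"
)

REFINE_PROMPT = (
    "Rewrite the summary so that it fixes exactly the listed errors, "
    "keeping the original style and tone.\n\n"
    "Summary:\n{summary}\n\nErrors:\n{feedback}\n\nRefined summary:"
)

def refine_summary(llm: LLM, transcript: str, summary: str) -> str:
    """Stage 1: generate feedback on the summary.
    Stage 2: rewrite the summary using that feedback, passed through
    unchanged ('as is')."""
    feedback = llm(FEEDBACK_PROMPT.format(transcript=transcript, summary=summary))
    return llm(REFINE_PROMPT.format(summary=summary, feedback=feedback))
```

Keeping the two stages as separate LLM calls lets the feedback be inspected or evaluated on its own before it drives the rewrite.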

Results

Multi-instance setups outperform single-instance approaches in error detection across all error types

(Figure: mistake identification accuracy across setups)

  • Why? Single-model setups struggle with long dependencies and contextualizing complex content, leading to more missed error instances.
  • What is different? Multi-instance approaches, especially with CoT prompting, handle context more effectively and reduce false negatives.
  • Caveat: GPT-4o can be overly strict when labeling errors, and the model’s heuristics influence judgments on subjective error types.
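A minimal sketch of the multi-instance idea: one CoT-prompted instance per error type rather than a single judge for everything. The error-type list, prompt wording, and YES/NO protocol are illustrative assumptions:

```python
from typing import Callable, Dict

# Hypothetical LLM interface: prompt text in, completion text out.
LLM = Callable[[str], str]

# Illustrative error categories; the paper's exact taxonomy may differ.
ERROR_TYPES = ("omission", "irrelevance", "hallucination", "repetition")

def detect_errors(llm: LLM, transcript: str, summary: str) -> Dict[str, bool]:
    """Query a dedicated, chain-of-thought-prompted instance for each
    error type; return True where that instance flags an error."""
    verdicts = {}
    for error_type in ERROR_TYPES:
        prompt = (
            f"Think step by step, then answer YES or NO on the last line: "
            f"does the summary contain a {error_type} error relative to "
            f"the transcript?\n\nTranscript:\n{transcript}\n\n"
            f"Summary:\n{summary}"
        )
        answer = llm(prompt)
        # Only the final line carries the verdict; the rest is reasoning.
        last_line = answer.strip().splitlines()[-1].strip().upper()
        verdicts[error_type] = last_line.startswith("YES")
    return verdicts
```

Scoping each instance to one error type keeps the context each judge must track small, which is the intuition behind the reduced false negatives reported above.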

Quality

A simple ‘error exists’ note is not enough for comprehensive corrections.

  • Problem: Vague or minimal feedback fails to pinpoint all error types or offer targeted corrections.
  • Solution: Providing CoT-style explanations and specific correction hints yields deeper improvements.
  • Also: Transcript-based refinement suffers from repetition and a lack of depth, likely because repeated content and unnecessary details in the transcript carry over into the rewrite.

Use the model’s feedback ‘as is’ for best results

  • Why Not Edit It? Consolidating or heavily modifying the generated feedback dilutes essential details; the models then rely more on the transcript than on the feedback when rewriting the summary.
  • Outcome: Keeping the model’s own feedback intact leads to more accurate refinements.

Refined summaries better meet user expectations

  • How? A two-stage feedback and refinement process systematically corrects more errors in the final summary and increases summary depth.
  • Adaptability: The number of rewrites adapts to the quality of the initial summary, while the refinement preserves the original style and tone.
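The adaptive loop described above might look like this sketch; the stopping signal (`NONE`), the round cap, and the prompt wording are assumptions for illustration:

```python
from typing import Callable

# Hypothetical LLM interface: prompt text in, completion text out.
LLM = Callable[[str], str]

def iterative_refine(llm: LLM, transcript: str, summary: str,
                     max_rounds: int = 3) -> str:
    """Alternate feedback and rewriting until the feedback stage reports
    no remaining errors; weaker initial summaries consume more rounds."""
    for _ in range(max_rounds):
        feedback = llm(
            "List the errors in this summary of the transcript, or reply "
            f"NONE.\n\nTranscript:\n{transcript}\n\nSummary:\n{summary}"
        )
        if feedback.strip().upper().startswith("NONE"):
            break  # summary already meets the bar; stop rewriting
        summary = llm(
            "Rewrite the summary fixing these errors while preserving its "
            f"style and tone.\n\nErrors:\n{feedback}\n\nSummary:\n{summary}"
        )
    return summary
```

A high-quality initial summary exits after a single feedback pass, while a weak one is rewritten up to `max_rounds` times.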

Run the project

Run `src/main.py` to execute the pipeline.

Citation

@inproceedings{kirstein-etal-2025a,
    title = "What's Wrong? Refining Meeting Summaries with LLM Feedback",
    author = "Kirstein, Frederic  and
      Ruas, Terry  and
      Gipp, Bela",
    year = "2025",
    month = jan,
    booktitle = {Proceedings of the 31st International Conference on Computational Linguistics},
    publisher = {International Committee on Computational Linguistics},
    address = {Abu Dhabi, United Arab Emirates},
}
