DocHero AI - Best paraphrasing and translation tool for academic and professional writing

Mitigating Hallucinations in Zero-Shot Scientific Summarisation: A Pilot Study

Imane Jaaouine, Ross D. King

2025-11-30

Abstract

Large language models (LLMs) produce context inconsistency hallucinations, which are LLM-generated outputs that are misaligned with the user prompt. This research project investigates whether prompt engineering (PE) methods can mitigate context inconsistency hallucinations in zero-shot LLM summarisation of scientific texts, where zero-shot indicates that the LLM relies purely on its pre-training data. Across eight yeast biotechnology research paper abstracts, six instruction-tuned LLMs were prom...

Problem

The study addresses context inconsistency hallucinations in LLM-generated summaries of scientific texts. It investigates whether prompt engineering methods can mitigate these hallucinations in zero-shot summarisation, where the LLM relies solely on its pre-training data.

Method

Six instruction-tuned LLMs were prompted with seven methods: a baseline prompt, two prompts of increasing instruction complexity, two with context repetition and two with random sentence addition. The resulting summaries were evaluated against the source abstracts using six lexical and semantic alignment metrics.

Key findings

The results indicated that context repetition and random addition significantly improved the lexical alignment of LLM-generated summaries with the original abstracts. However, increased instruction complexity did not improve semantic alignment and, in one case (PE-1), reduced it.

Three key points

Prompt engineering, specifically context repetition and random addition, can improve the lexical alignment of LLM-generated summaries in zero-shot scientific summarisation.
Increased instruction complexity in prompts can negatively impact the semantic alignment between LLM-generated summaries and the original abstracts.
Context repetition and random addition prompt methods disproportionately alter lexical alignment, suggesting their utility in reducing context inconsistency hallucinations.

Academic details

Hypotheses: H1: Repeating semantically key sentences improves lexical alignment. H2: Repeating randomly selected sentences improves lexical alignment. H3: Increasing prompt instruction complexity improves semantic alignment. H4: Repeating key sentences increases alignment with the key sentences.

Study subjects: The abstracts of eight open-access yeast biotechnology papers served as source material.

Interventions: The intervention consisted of seven prompt methods: a baseline prompt, two levels of increasing instruction complexity (PE-1 and PE-2), two levels of context repetition (CR-K1 and CR-K2), and two levels of random addition (RA-K1 and RA-K2).
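The context repetition and random addition methods can be pictured as prompt builders. This is a minimal sketch under stated assumptions, not the study's implementation: the instruction text in `BASE_INSTRUCTION`, the word-overlap heuristic in the hypothetical `key_sentences` helper, and the choice to append repeated sentences after the abstract are all assumptions, and the source does not give the K values behind the K1 and K2 levels. The PE-1 and PE-2 instruction variants are not modelled.

```python
import random
import re

# Hypothetical base instruction; the study's actual prompt wording is not given.
BASE_INSTRUCTION = "Summarise the following scientific abstract."


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def key_sentences(sentences: list[str], k: int) -> list[str]:
    # Hypothetical heuristic: rank sentences by how many of their words also
    # appear elsewhere in the abstract; keep the top k in original order.
    def score(i: int) -> int:
        words = set(sentences[i].lower().split())
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        return len(words & set(rest.lower().split()))

    top = sorted(range(len(sentences)), key=score, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def build_prompt(abstract: str, method: str, k: int = 1, seed: int = 0) -> str:
    sentences = split_sentences(abstract)
    if method == "baseline":
        extra = []
    elif method == "CR":  # context repetition: repeat k key sentences
        extra = key_sentences(sentences, k)
    elif method == "RA":  # random addition: repeat k randomly chosen sentences
        extra = random.Random(seed).sample(sentences, k)
    else:
        raise ValueError(f"unknown method: {method}")
    # Assumed layout: instruction, abstract, then any repeated sentences.
    return "\n\n".join([BASE_INSTRUCTION, abstract] + extra)
```

If K1 and K2 are taken to mean K = 1 and K = 2 (an assumption), CR-K2 would correspond to `build_prompt(abstract, "CR", k=2)`. A fixed `seed` keeps the random addition identical across LLMs, so differences between models are not confounded by different repeated sentences.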

Study design: A systematic framework was designed and applied to evaluate how well each prompt method mitigates context inconsistency hallucinations in zero-shot summarisation.

Outcome measures: The outcome measures were ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, METEOR, and cosine similarity, which were used to compute the lexical and semantic alignment between the summaries and the abstracts.
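Three of these metrics are simple enough to sketch directly: ROUGE-N and ROUGE-L as F1 scores over token overlap (ROUGE-L via the longest common subsequence), and cosine similarity. This is a minimal sketch under assumptions: tokens are lowercase whitespace-split words with no stemming, and cosine similarity is taken over bag-of-words counts, whereas the study's tokenisation and any embedding model behind its cosine similarity are not stated. Reference implementations such as the `rouge-score` package differ in these details, and BERTScore and METEOR, which need a pretrained model or WordNet, are not reproduced.

```python
import math
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumed tokenisation: lowercase, split on whitespace, no stemming.
    return text.lower().split()


def ngrams(toks: list[str], n: int) -> Counter:
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    # Clipped n-gram overlap: each n-gram counts at most min(cand, ref) times.
    c, r = ngrams(tokens(candidate), n), ngrams(tokens(reference), n)
    overlap = sum((c & r).values())
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic programming over prefixes, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    c, r = tokens(candidate), tokens(reference)
    return f1(lcs_length(c, r), len(c), len(r))


def cosine(candidate: str, reference: str) -> float:
    # Bag-of-words stand-in; a semantic version would use embedding vectors.
    c, r = Counter(tokens(candidate)), Counter(tokens(reference))
    dot = sum(c[w] * r[w] for w in c)
    norm = math.sqrt(sum(v * v for v in c.values())) * math.sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0
```

Here `candidate` is an LLM-generated summary and `reference` the source abstract; for example, `rouge_n(summary, abstract, 2)` gives a ROUGE-2 F1 score under these assumptions.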

Sample size: 336 LLM-generated summaries (8 papers × 6 LLMs × 7 prompts). A total of 3744 data points were collected.
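The summary count comes from a full factorial design: every abstract is summarised by every LLM under every prompt method. Below is a minimal sketch of that grid. The paper and model identifiers are hypothetical placeholders, and the seven prompt labels are the study's own. The breakdown behind the 3744 data points is not given and is not modelled.

```python
from itertools import product

# Hypothetical placeholders for the eight abstracts and six LLMs.
PAPERS = [f"paper_{i}" for i in range(1, 9)]
LLMS = [f"llm_{i}" for i in range(1, 7)]
# The study's seven prompt method labels.
PROMPTS = ["baseline", "PE-1", "PE-2", "CR-K1", "CR-K2", "RA-K1", "RA-K2"]

# One run, and hence one summary, per (paper, LLM, prompt) combination.
RUNS = list(product(PAPERS, LLMS, PROMPTS))
print(len(RUNS))  # 336 = 8 × 6 × 7
```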

Statistical methods: Statistical analysis was performed using bias-corrected and accelerated (BCa) bootstrap confidence intervals and Wilcoxon signed-rank tests with Bonferroni-Holm correction.
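The three procedures can be sketched with the Python standard library, assuming paired per-summary scores for a prompt method and the baseline. This is not the study's code. The Wilcoxon test uses a normal approximation without tie correction, whereas library versions such as `scipy.stats.wilcoxon` can compute exact p-values. The BCa interval is for the mean paired difference, and the resample count and seed are arbitrary choices. `scipy.stats.bootstrap` with `method='BCa'` is a maintained alternative for the interval.

```python
import math
import random
from statistics import NormalDist, fmean

_NORM = NormalDist()  # standard normal distribution


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank p-value (normal approximation).

    Zero differences are dropped and tied |d| get average ranks; the variance
    is not tie-corrected, so small or heavily tied samples differ from exact tests.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign the average rank to each run of tied |d|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 2 * (1 - _NORM.cdf(abs(w_plus - mean) / sd))


def holm_adjust(pvalues):
    """Bonferroni-Holm step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(sorted(range(m), key=lambda k: pvalues[k])):
        # The rank-th smallest p-value is scaled by (m - rank); keep monotone, cap at 1.
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted


def bca_interval(data, stat=fmean, n_resamples=2000, level=0.95, seed=0):
    """BCa bootstrap CI for stat(data); data must be a list of length >= 2."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Bias correction z0 from the share of resamples below the point estimate,
    # clamped so inv_cdf stays finite.
    share = sum(b < theta_hat for b in boots) / n_resamples
    share = min(max(share, 1 / n_resamples), 1 - 1 / n_resamples)
    z0 = _NORM.inv_cdf(share)
    # Acceleration a from jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = fmean(jack)
    num = sum((jbar - t) ** 3 for t in jack)
    den = 6 * sum((jbar - t) ** 2 for t in jack) ** 1.5
    a = num / den if den else 0.0
    bounds = []
    for alpha in ((1 - level) / 2, (1 + level) / 2):
        z = _NORM.inv_cdf(alpha)
        adj = _NORM.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        bounds.append(boots[min(n_resamples - 1, int(adj * n_resamples))])
    return tuple(bounds)
```

For example, pairing each prompt method's scores with the baseline scores for the same paper and LLM, one would pass the paired differences to `bca_interval`, the paired scores to `wilcoxon_signed_rank`, and the resulting family of p-values to `holm_adjust`.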

Limitations: The dataset of eight abstracts limits how far these findings generalise to broader scientific domains. Automatic metrics are used without human validation. Zero-shot prompting was used, so results may differ for fine-tuned LLMs.

Future research directions: Future work could examine how the semantic relevance of repeated sentences affects summary alignment, and how the performance effects of context repetition and random addition relate to prompt length and to the number K of key sentences.

Key findings: Context repetition and random addition significantly improved the lexical alignment of LLM-generated summaries. Increased instruction complexity did not improve semantic alignment, and PE-1 even reduced it.

Implications: This work presents prompt engineering as a practical tool for reducing context inconsistency hallucinations in LLM-based scientific summarisation.

Generated on 12/8/2025