Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
ConflictScore: Measuring How Language Models Handle Conflicting Evidence
Published:
TL;DR: Existing “factuality/faithfulness” metrics usually ask: is the answer supported by the evidence?
ConflictScore asks a sharper question: what if the evidence set disagrees with itself, and the model acts overconfident anyway?
We introduce a claim-level metric (CS-C, CS-R) and a benchmark (ConflictBench), and show that conflict-aware regeneration improves truthfulness on TruthfulQA.
Portfolio
Open Domain QA with Conflicting Contexts
25% of unambiguous open-domain questions can lead to conflicting contexts when retrieved using Google Search.
Information Pollution & Multi-Perspective Search
Perspectives-oriented search engine and multi-perspective news editorial corpus.
News Framing
Detecting frames in news headlines and analyzing framing trends surrounding US gun violence.
Publications
Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding US Gun Violence
Published in CoNLL 2019
Detecting frames in news headlines and analyzing framing trends surrounding US gun violence.
Recommended citation: Siyi Liu, Lei Guo, Kate Mays, Margrit Betke, Derry Tanti Wijaya. "Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding US Gun Violence." CoNLL 2019.
Download Paper
Learning to Mirror Speaking Styles Incrementally
Published on arXiv, 2020
Learning to mirror speaking styles incrementally using neural approaches.
Recommended citation: Siyi Liu*, Ziang Leng*, Derry Wijaya. "Learning to Mirror Speaking Styles Incrementally." arXiv.
Download Paper
MultiOpEd: A Corpus of Multi-Perspective News Editorials
Published in NAACL 2021
A corpus of multi-perspective news editorials for studying opinion diversity.
Recommended citation: Siyi Liu, Sihao Chen, Xander Uyttendaele, Dan Roth. "MultiOpEd: A Corpus of Multi-Perspective News Editorials." NAACL 2021.
Download Paper | Download Slides
Design Challenges for a Multi-Perspective Search Engine
Published in NAACL 2022 Findings
Designing a search engine that presents multiple perspectives on controversial topics.
Recommended citation: Sihao Chen*, Siyi Liu*, Xander Uyttendaele, Yi Zhang, William Bruno, Dan Roth. "Design Challenges for a Multi-Perspective Search Engine." NAACL 2022 Findings.
Download Paper
Open-Domain Event Graph Induction for Mitigating Framing Bias
Published on arXiv, 2023
Open-domain event graph induction for mitigating framing bias in news.
Recommended citation: Siyi Liu, Hongming Zhang, Hongwei Wang, Kaiqiang Song, Dan Roth, Dong Yu. "Open-Domain Event Graph Induction for Mitigating Framing Bias." arXiv.
Download Paper
Using LLM for Improving Key Event Discovery: Temporal-Guided News Stream Clustering with Event Summaries
Published in EMNLP 2023
Temporal-guided news stream clustering with event summaries using LLMs.
Recommended citation: Nishanth Nakshatri, Siyi Liu, Sihao Chen, Daniel Hopkins, Dan Roth, Dan Goldwasser. "Using LLM for Improving Key Event Discovery: Temporal-Guided News Stream Clustering with Event Summaries." EMNLP 2023.
Download Paper
Towards Long Context Hallucination Detection
Published in NAACL 2025 Findings
Automatic hallucination detection for long context documents.
Recommended citation: Siyi Liu, Kishaloy Halder, et al. "Towards Long Context Hallucination Detection." NAACL 2025 Findings.
Download Paper
Open Domain Question Answering with Conflicting Contexts
Published in NAACL 2025 Findings
25% of unambiguous open-domain questions can lead to conflicting contexts when retrieved using Google Search.
Recommended citation: Siyi Liu, Qiang Ning, et al. "Open Domain Question Answering with Conflicting Contexts." NAACL 2025 Findings.
Download Paper
DeeptraceReward: Learning Human-Perceived Fakeness in Generated Videos with Multimodal LLMs
Published in NeurIPS GenProCC Workshop 2025
Learning human-perceived fakeness in generated videos using multimodal LLMs.
Recommended citation: Xingyu Fu, Siyi Liu, et al. "DeeptraceReward: Learning Human-Perceived Fakeness in Generated Videos with Multimodal LLMs." NeurIPS GenProCC Workshop 2025.
Download Paper
ConflictScore: Measuring How Language Models Handle Conflicting Evidence
In submission, 2026
We propose ConflictScore, a metric for measuring how language models handle conflicting evidence.
Recommended citation: Siyi Liu, Patrick Xia, et al. "ConflictScore: Measuring How Language Models Handle Conflicting Evidence." In submission.
Download Paper
