
docs: Add preprints on prompt-level incentive restructuring and hallucination mitigation #766

Open · mahashu wants to merge 4 commits into dair-ai:main from mahashu:patch-1

mahashu commented May 20, 2026

Description

This pull request updates the literature catalog with two companion open-science preprints that evaluate prompt-level constraint architectures on frontier models (ChatGPT and Gemini).

Papers Added & Core Findings

  • Standing on a Trapdoor: AI Hallucination and Prompt-Level Cost Restructuring (Kowalski et al., 2026; DOI: 10.5281/zenodo.20019087)
    • Focus: Introduces the baseline IDK+COMP constraint framework, drawing on the cumulative 410-trial dataset. Documents how default corporate alignment layers trade factual precision for conversational fluency, and shows how the constraint makes generation terminate cleanly at the factual boundary.
  • A Puma in a Teacup: Signal Quality and Hallucination Suppression Through Prompt-Level Incentive Restructuring (Kowalski et al., 2026; DOI: 10.5281/zenodo.19502460)
    • Focus: Analyzes context-saturation behavior, unhedged refusal bounds, and creative signal optimization across 362 trials, using test strings that have no ground truth. Documents the "Brake-and-Slide" failure mode: applying the compression mandate in isolation (COMP alone) paradoxically drives Gemini's fabrication rate up to 70%, while removing the refusal permission (IDK) pushes it to 100%.

This contribution gives developers tracking the Applications index direct access to both the empirical performance metrics and the underlying behavioral failure modes of inference-time safety layers.

vercel (bot) commented May 20, 2026

@mahashu is attempting to deploy a commit to the DAIR-AI Team on Vercel.

A member of the Team first needs to authorize it.

mahashu (author) commented May 20, 2026

Clarification on methodology: the 410 trials are the cumulative project total. "A Puma in a Teacup" reports the first 362 trials, which map the core evaluation matrix across several governance conditions (including baseline, OGS, OGS-IDK, and COMP). "Standing on a Trapdoor" adds the newest tracking block, trials 363-410 (48 trials), to isolate the IDK+COMP constraint configuration. Interpreting that block requires direct cross-comparison against the previously established governance conditions in order to map the structural failure and suppression mechanics.

mahashu added 3 commits May 20, 2026 21:00
updated placeholders to Zenodo papers' URLs
corrected papers' URLs - again. This is terrible!