Literature

Background on LLMs, GPTs, and transformers

Our original inspiration: “Revealing the mystery behind chain of thought”

Connections between LLMs and biology

Tackling simple arithmetic using GPT architectures

The rest of this page is under construction; we are trying to use semi-automated tools to extract links from the Teams channels. In the meantime, everyone is encouraged to extract links from their own activity log and post them here.

From Aziz

📄 1. Emergent Response Planning in LLMs

Link: https://arxiv.org/html/2502.06258v1
Summary: This paper shows that large language models (LLMs) trained only to predict the next token nonetheless encode representations that reveal future planning behavior across their entire output, suggesting latent capabilities for anticipating structure, content, and overall response attributes beyond the next token.
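The probing idea behind this result can be sketched roughly as follows (a hypothetical illustration on synthetic data, not the authors' code): fit a linear probe on a hidden state taken before generation begins, and check whether it predicts an attribute of the *future* response, such as its eventual length.

```python
import numpy as np

# Hypothetical illustration of "probing for planned attributes":
# given one hidden state per prompt (taken before generation begins),
# fit a linear probe to predict an attribute of the future response,
# e.g. its eventual length in tokens. All data here is synthetic.
rng = np.random.default_rng(0)

d_model, n_prompts = 64, 500
true_direction = rng.normal(size=d_model)          # pretend the model encodes length along this direction
hidden = rng.normal(size=(n_prompts, d_model))     # stand-in for real hidden states
lengths = hidden @ true_direction + rng.normal(scale=0.1, size=n_prompts)

# Linear probe via least squares: w ~ argmin ||hidden @ w - lengths||^2
w, *_ = np.linalg.lstsq(hidden, lengths, rcond=None)

pred = hidden @ w
r2 = 1 - np.sum((lengths - pred) ** 2) / np.sum((lengths - lengths.mean()) ** 2)
print(f"probe R^2 on synthetic data: {r2:.3f}")
```

A high R^2 on real hidden states would indicate the attribute is linearly decodable before the model has emitted a single token, which is the paper's notion of emergent planning.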


📄 2. Prompt Repetition Improves Non-Reasoning LLMs

Link: https://arxiv.org/pdf/2512.14982
Summary: The authors demonstrate that simply repeating an input prompt (e.g., duplicating the text) can improve the performance of popular language models on non-reasoning benchmarks without increasing the number of generated tokens or inference latency.
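The trick itself is a one-liner; the sketch below shows the idea (the separator is our own choice, not something the paper prescribes):

```python
def repeat_prompt(prompt: str, n: int = 2, sep: str = "\n\n") -> str:
    """Duplicate the prompt before sending it to the model.

    The repetition trick amounts to feeding the model `prompt` n times;
    only the input grows, and the model still generates the same number
    of output tokens.
    """
    return sep.join([prompt] * n)

doubled = repeat_prompt("List three prime numbers.")
# The doubled prompt is then passed to any chat/completions API as usual.
print(doubled)
```

Because only the prompt is lengthened, generated-token count and decoding latency are unchanged; the cost is the extra prefill compute for the longer input.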


📄 3. Memory Transformer

Link: https://arxiv.org/pdf/2006.11527
Summary: This paper introduces transformer architectures augmented with memory tokens that help capture both local and global sequence information, leading to improved performance on tasks like machine translation and language modeling by effectively storing and attending to non-local representations.
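A minimal shape-level sketch of the memory-token idea (our own toy single-head attention, not the paper's architecture): prepend a few trainable "memory" embeddings to the input so self-attention can read from and write to them as global storage.

```python
import numpy as np

# Sketch of memory tokens: concatenate learnable memory embeddings in
# front of the embedded input, then run ordinary self-attention over
# the augmented sequence. Shapes only; no training loop.
rng = np.random.default_rng(0)
d_model, seq_len, n_mem = 32, 10, 4

tokens = rng.normal(size=(seq_len, d_model))   # embedded input tokens
memory = rng.normal(size=(n_mem, d_model))     # learnable memory tokens (parameters)

x = np.concatenate([memory, tokens], axis=0)   # [mem_0..mem_3, tok_0..tok_9]

# Plain (single-head, unmasked) self-attention over the augmented sequence:
scores = x @ x.T / np.sqrt(d_model)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ x

print(out.shape)  # (n_mem + seq_len, d_model)
```

Every position now attends to the memory slots, so non-local information can be accumulated there instead of being squeezed through per-token representations.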