Literature
Background on LLMs, GPTs, and transformers
- J. MacCormick (2026). Thinking AI: How Artificial Intelligence Emulates Human Understanding. Princeton University Press. chapter 10 (link is restricted to members of the COMP560 MS Team), PUP, Amazon.
- This is a chapter from a general-audience book about modern AI systems. It explains how a GPT works, using simple descriptions that require no computer science background.
Our original inspiration: “Revealing the mystery behind chain of thought”
- Feng, G., Zhang, B., Gu, Y., Ye, H., He, D., & Wang, L. (2023). Towards revealing the mystery behind chain of thought: a theoretical perspective. Advances in Neural Information Processing Systems, 36, 70757-70798. pdf at neurips.cc. Teams discussion channel.
- This is the original motivation for our research project. It demonstrates how certain simple tasks, such as arithmetic or solving linear equations, can be tackled with chain-of-thought reasoning, and how and why that is beneficial. We aim to replicate or extend these results in the specific domain of very small, simple tasks and small transformer models that can be trained on consumer laptops in only a few minutes. (The paper contains some very interesting theoretical results, but we focus more on the practical experiments.)
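For replication experiments of this kind, one needs chain-of-thought training traces for tiny arithmetic tasks. The sketch below generates step-by-step traces for multi-operand addition; the trace format (`a + b = c ; ... ; answer n`) is our own illustrative choice, not the format used in the Feng et al. paper.

```python
# Sketch: generate chain-of-thought training examples for multi-operand
# addition, the kind of tiny task a laptop-scale transformer can learn.
# The exact trace format here is a hypothetical choice for illustration.
import random

def cot_addition_example(operands):
    """Return a (prompt, chain-of-thought target) pair for summing `operands`."""
    prompt = " + ".join(str(x) for x in operands) + " ="
    steps = []
    running = operands[0]
    for x in operands[1:]:
        steps.append(f"{running} + {x} = {running + x}")
        running += x
    target = " ; ".join(steps) + f" ; answer {running}"
    return prompt, target

if __name__ == "__main__":
    random.seed(0)
    ops = [random.randint(0, 99) for _ in range(4)]
    prompt, target = cot_addition_example(ops)
    print(prompt)
    print(target)
```

Emitting one pairwise addition per step mirrors the intuition from the paper: the model never has to perform more than one elementary operation per generated step.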
Connections between LLMs and biology
- Lindsey et al., “On the Biology of a Large Language Model”, Transformer Circuits, 2025. html at Anthropic
Tackling simple arithmetic using GPT architectures
- Baeumel, Tanja, Josef van Genabith, and Simon Ostermann. “The lookahead limitation: Why multi-operand addition is hard for LLMs.” arXiv preprint arXiv:2502.19981 (2025). pdf at arXiv.
The rest of this page is under construction; we are trying to use semi-automated tools to extract links from the Teams channels. Perhaps everyone can extract and post all links from their own activity log?
From Aziz
📄 1. Emergent Response Planning in LLMs
Link: https://arxiv.org/html/2502.06258v1
Summary: This paper shows that large language models (LLMs) trained only on next-token prediction nonetheless encode representations of their entire upcoming output, suggesting a latent ability to anticipate the structure, content, and overall attributes of a response beyond the next token.
📄 2. Prompt Repetition Improves Non-Reasoning LLMs
Link: https://arxiv.org/pdf/2512.14982
Summary: The authors demonstrate that simply repeating an input prompt (e.g., duplicating the text) can improve the performance of popular language models on non-reasoning benchmarks without increasing the number of generated tokens or inference latency.
📄 3. Memory Transformer
Link: https://arxiv.org/pdf/2006.11527
Summary: This paper introduces transformer architectures augmented with memory tokens that help capture both local and global sequence information, leading to improved performance on tasks like machine translation and language modeling by effectively storing and attending to non-local representations.
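The core idea of memory tokens can be illustrated with a toy single-head attention pass: a few extra embedding slots are concatenated in front of the input sequence, so every token can attend to (and write into) these non-local memory slots. This is our own minimal numpy sketch of the mechanism, not the paper's implementation; a real model would use learned projection matrices and train the memory embeddings.

```python
# Toy illustration of memory tokens: self-attention over [memory; tokens].
# Identity Q/K/V projections are used for simplicity; in a real transformer
# the memory embeddings and projections W_q, W_k, W_v are learned.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_memory(tokens, memory):
    """Single-head self-attention over the concatenation [memory; tokens].

    tokens: (n, d) input embeddings; memory: (m, d) memory-token embeddings
    (fixed arrays here, trainable in the actual architecture). Returns the
    updated (n, d) representations of the input tokens only.
    """
    x = np.concatenate([memory, tokens], axis=0)   # (m + n, d)
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                  # (m + n, m + n)
    out = softmax(scores, axis=-1) @ x             # each token reads memory
    return out[memory.shape[0]:]                   # drop the memory slots

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 8))   # 5 input tokens, dimension 8
    memory = rng.normal(size=(2, 8))   # 2 memory tokens
    print(attention_with_memory(tokens, memory).shape)
```

Because the memory slots participate in every attention step but are discarded from the output, they act as scratch space for global sequence information.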