Must-Read Papers on Large Language Model Reasoning
Author: AINLP | Source: AINLP
🎁 Resources
Surveys
- A Survey of Deep Learning for Mathematical Reasoning, ACL 2023 [paper]
- Reasoning with Language Model Prompting: A Survey, ACL 2023 [paper]
- A Survey for In-context Learning, arXiv.2301.00234 [paper]
- A Survey of Large Language Models, arXiv.2303.18223 [paper]
- Natural Language Reasoning, A Survey, arXiv.2303.14725 [paper]
Blogs
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, Dec 2022, Yao Fu’s Notion [blog]
- Towards Complex Reasoning: the Polaris of Large Language Models, May 2023, Yao Fu’s Notion [blog]
💯 Benchmarks
Mathematical Reasoning
- Learning to Solve Arithmetic Word Problems with Verb Categorization, EMNLP 2014 [paper]
- Parsing Algebraic Word Problems into Equations, TACL 2015 [paper]
- Solving General Arithmetic Word Problems, EMNLP 2015 [paper]
- MAWPS: A Math Word Problem Repository, NAACL 2016 [paper]
- Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]
- A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers, ACL 2020 [paper]
- Are NLP Models really able to Solve Simple Math Word Problems?, NAACL 2021 [paper]
- Training Verifiers to Solve Math Word Problems, arXiv.2110.14168 [paper]
- PAL: Program-aided Language Models, ICML 2023 [paper]
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms, NAACL 2019 [paper]
- DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, NAACL 2019 [paper]
- TheoremQA: A Theorem-driven Question Answering dataset, arXiv.2305.12524 [paper]
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, ACL 2021 [paper]
- FinQA: A Dataset of Numerical Reasoning over Financial Data, EMNLP 2021 [paper]
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering, EMNLP 2022 [paper]
- Measuring Mathematical Problem Solving With the MATH Dataset, NeurIPS 2021 [paper]
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks, ACL 2022 [paper]
- LILA: A Unified Benchmark for Mathematical Reasoning, EMNLP 2022 [paper]
Commonsense Reasoning
- Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge, arXiv.2102.03315 [paper]
- Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, EMNLP 2018 [paper]
- PIQA: Reasoning about Physical Commonsense in Natural Language, AAAI 2020 [paper]
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, NAACL 2019 [paper]
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification, NeurIPS 2021 [paper]
- Event2Mind: Commonsense Inference on Events, Intents, and Reactions, ACL 2018 [paper]
- "Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding, EMNLP 2019 [paper]
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning, EMNLP 2019 [paper]
- Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation, ACL 2019 [paper]
- Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies, TACL 2021 [paper]
Symbolic Reasoning
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper]
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, arXiv.2206.04615 [paper]
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, ACL 2023 [paper]
Logical Reasoning
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning, ICLR 2020 [paper]
- LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning, IJCAI 2020 [paper]
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language, ACL 2021 [paper]
- FOLIO: Natural Language Reasoning with First-Order Logic, arXiv.2209.00840 [paper]
- Language Models as Inductive Reasoners, arXiv.2212.10923 [paper]
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought, ICLR 2023 [paper]
Multi-modal Reasoning
Visual-Language (Image)
- From Recognition to Cognition: Visual Commonsense Reasoning, CVPR 2019 [paper]
- VisualCOMET: Reasoning About the Dynamic Context of a Still Image, ECCV 2020 [paper]
- Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues, ACL 2022 [paper]
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering, NeurIPS 2022 [paper]
Video-Language
- What is More Likely to Happen Next? Video-and-Language Future Event Prediction, EMNLP 2020 [paper]
- CLEVRER: Collision Events for Video Representation and Reasoning, ICLR 2020 [paper]
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions, CVPR 2021 [paper]
- STaR: Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [paper]
- From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering, CVPR 2022 [paper]
- NewsKVQA: Knowledge-Aware News Video Question Answering, PAKDD 2022 [paper]
🚀 Advances
XoT Construction
Manual Construction
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper] (see the prompting sketch after this list)
- PAL: Program-aided Language Models, ICML 2023 [paper]
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper]
- MathPrompter: Mathematical Reasoning using Large Language Models, ACL 2023 [paper]
- Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]
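The common thread of the manual-construction papers above is that chain-of-thought demonstrations are written by hand and simply prepended to the prompt. The sketch below is a minimal illustration of that idea, not any paper's released code: `call_llm` is a hypothetical completion function, and the exemplar is the familiar tennis-ball example from the chain-of-thought paper.

```python
from typing import Callable, List, Tuple

# Hand-written demonstrations: (question, chain-of-thought rationale ending in the answer).
COT_EXEMPLARS: List[Tuple[str, str]] = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]

def few_shot_cot_prompt(question: str) -> str:
    """Prepend the manually written (question, rationale) demonstrations to the test question."""
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in COT_EXEMPLARS)
    return f"{demos}\n\nQ: {question}\nA:"

def solve(question: str, call_llm: Callable[[str], str]) -> str:
    # call_llm is a hypothetical stand-in for any text-completion API (prompt -> completion).
    return call_llm(few_shot_cot_prompt(question))
```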
Automatic Construction
- Large Language Models are Zero-Shot Reasoners, NeurIPS 2022 [paper]
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper]
- Automatic Chain of Thought Prompting in Large Language Models, ICLR 2023 [paper]
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling, arXiv.2305.09993 [paper]
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, ACL 2023 [paper]
Semi-Automatic Construction
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, ICLR 2023 [paper]
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models, arXiv.2302.00618 [paper]
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data, arXiv.2302.12822 [paper]
- Explanation Selection Using Unlabeled Data for In-Context Learning, arXiv.2302.04813 [paper]
- Boosted Prompt Ensembles for Large Language Models, arXiv.2304.05970 [paper]
XoT Structural Variants
Chain Structure
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper] (see the execution sketch after this list)
- PAL: Program-aided Language Models, ICML 2023 [paper]
- Chain-of-Symbol Prompting Elicits Planning in Large Language Models, arXiv.2305.10276 [paper]
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, arXiv.2308.10379 [paper]
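Program-of-Thoughts and PAL (listed above) keep the chain structure but make the intermediate steps executable: the model writes a short program, and an interpreter, not the model, produces the final answer. Below is a minimal sketch under the assumption that the model is asked to leave its result in a variable named `answer`; `call_llm` is again a hypothetical completion function, and a real deployment would execute the generated code in a sandbox.

```python
from typing import Any, Callable

def solve_with_program(question: str, call_llm: Callable[[str], str]) -> Any:
    """Program-aided reasoning sketch: the LLM writes Python, the interpreter computes the answer."""
    prompt = (
        "Write Python code that computes the answer to the question below and "
        "stores it in a variable named `answer`.\n"
        f"Question: {question}\nCode:\n"
    )
    code = call_llm(prompt)
    namespace: dict = {}
    # Executing model-generated code is unsafe; a real system would sandbox this call.
    exec(code, namespace)
    return namespace.get("answer")
```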
Tree Structure
- Large Language Model Guided Tree-of-Thought, arXiv.2305.08291 [paper]
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arXiv.2305.10601 [paper] (see the search sketch after this list)
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arXiv.2307.15337 [paper]
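Tree-structured variants such as Tree of Thoughts expand several candidate "thoughts" at each step, score the resulting partial solutions, and keep only the most promising ones. The sketch below is a simplified breadth-first version of that search; `propose` and `score` are hypothetical LLM-backed helpers (generate candidate next thoughts, rate a partial solution), not APIs from the papers above.

```python
from typing import Callable, List

def tree_of_thoughts_bfs(
    problem: str,
    propose: Callable[[str, str], List[str]],  # hypothetical: (problem, partial solution) -> candidate next thoughts
    score: Callable[[str, str], float],        # hypothetical: (problem, partial solution) -> value estimate
    depth: int = 3,
    beam: int = 2,
) -> str:
    """Breadth-first search over partial 'thoughts', keeping the `beam` best states per level."""
    frontier = [""]  # each state is the text of the partial solution so far
    for _ in range(depth):
        candidates = [
            state + thought + "\n"
            for state in frontier
            for thought in propose(problem, state)
        ]
        if not candidates:
            break
        frontier = sorted(candidates, key=lambda s: score(problem, s), reverse=True)[:beam]
    return frontier[0]
```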
Graph Structure
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arXiv.2308.09687 [paper]
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arXiv.2308.08614 [paper]
XoT Enhancement Methods
Verify and Refine
- Making Language Models Better Reasoners with Step-Aware Verifier, ACL 2023 [paper]
- Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]
- Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]
- Large Language Models are Reasoners with Self-Verification, arXiv.2212.09561 [paper]
- Reflexion: Language Agents with Verbal Reinforcement Learning, arXiv.2303.11366 [paper]
- Self-Refine: Iterative Refinement with Self-Feedback, arXiv.2303.17651 [paper] (see the loop sketch after this list)
- REFINER: Reasoning Feedback on Intermediate Representations, arXiv.2304.01940 [paper]
- RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought, arXiv.2305.11499 [paper]
- Deductive Verification of Chain-of-Thought Reasoning, arXiv.2306.03872 [paper]
- Forward-Backward Reasoning in Large Language Models for Verification, arXiv.2308.07758 [paper]
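Most of the verify-and-refine methods above share the same control flow: draft an answer, obtain feedback (from the model itself, a verifier, or external knowledge), and revise until the feedback is clean. Here is a minimal sketch of that loop in the spirit of Self-Refine; `call_llm`, the prompt wording, and the "LOOKS GOOD" stop phrase are illustrative assumptions rather than any paper's exact prompts.

```python
from typing import Callable

def self_refine(task: str, call_llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Draft -> feedback -> revise loop, stopping early when the critique finds nothing to fix."""
    answer = call_llm(f"Task: {task}\nAnswer:")
    for _ in range(max_rounds):
        feedback = call_llm(
            f"Task: {task}\nCandidate answer: {answer}\n"
            "List any mistakes, or reply exactly 'LOOKS GOOD' if there are none."
        )
        if "LOOKS GOOD" in feedback:
            break
        answer = call_llm(
            f"Task: {task}\nPrevious answer: {answer}\nFeedback: {feedback}\n"
            "Rewrite the answer so that it addresses the feedback.\nImproved answer:"
        )
    return answer
```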
Question Decomposition
- Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]
- Iteratively Prompt Pre-trained Language Models for Chain of Thought, EMNLP 2022 [paper]
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, ICLR 2023 [paper] (see the decomposition sketch after this list)
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks, ICLR 2023 [paper]
- Binding Language Models in Symbolic Languages, ICLR 2023 [paper]
- Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning, SIGIR 2023 [paper]
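Decomposition methods such as Least-to-Most Prompting first ask the model to split a hard question into easier sub-questions, then answer them in order, carrying each intermediate answer into the next step. A hedged sketch follows; the prompt templates and the numbered-list parsing are assumptions made purely for illustration.

```python
from typing import Callable, List

def least_to_most(question: str, call_llm: Callable[[str], str]) -> str:
    """Decompose a question, then answer the sub-questions sequentially with accumulated context."""
    decomposition = call_llm(
        "Break the question below into a numbered list of simpler sub-questions, "
        f"ending with the original question:\n{question}"
    )
    sub_questions: List[str] = [
        line.split(".", 1)[1].strip()
        for line in decomposition.splitlines()
        if line.strip() and line.strip()[0].isdigit() and "." in line
    ]
    context, answer = "", ""
    for sub_q in sub_questions:
        answer = call_llm(f"{context}Q: {sub_q}\nA:")
        context += f"Q: {sub_q}\nA: {answer}\n"
    return answer  # answer to the last sub-question, i.e. the original question
```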
External Knowledge
- Chain-of-Dictionary Prompting Elicits Translation in Large Language Models, arXiv.2305.06575 [paper]
- MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts, arXiv.2305.05181 [paper]
- Chain of Knowledge: A Framework for Grounding Large Language Models with Structured Knowledge Bases, arXiv.2305.13269 [paper]
- Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arXiv.2306.06427 [paper]
- Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering, arXiv.2308.13259 [paper]
Vote and Rank
- Training Verifiers to Solve Math Word Problems, arXiv.2110.14168 [paper]
- Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023 [paper] (see the voting sketch after this list)
- Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought, arXiv.2304.13007 [paper]
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning, arXiv.2308.00436 [paper]
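Self-Consistency (listed above) is the simplest vote-and-rank scheme: sample several reasoning chains at non-zero temperature, extract the final answer from each, and return the most frequent one. A minimal sketch in which `sample_llm` (one sampled completion per call) and `extract_answer` (pulls the final answer out of a chain) are hypothetical helpers:

```python
from collections import Counter
from typing import Callable

def self_consistency(
    prompt: str,
    sample_llm: Callable[[str], str],      # hypothetical: one temperature > 0 completion per call
    extract_answer: Callable[[str], str],  # hypothetical: parse the final answer from a reasoning chain
    n_samples: int = 10,
) -> str:
    """Majority vote over the final answers of independently sampled reasoning chains."""
    answers = [extract_answer(sample_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```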
Efficiency
- Active Prompting with Chain-of-Thought for Large Language Models, arXiv.2302.12246 [paper]
- Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs, arXiv.2305.11860 [paper]
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arXiv.2307.15337 [paper]
🛸 Frontier Applications
Tool Using
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning, arXiv.2205.00445 [paper]
- TALM: Tool Augmented Language Models, arXiv.2205.12255 [paper]
- ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023 [paper] (see the agent-loop sketch after this list)
- Toolformer: Language Models Can Teach Themselves to Use Tools, arXiv.2302.04761 [paper]
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, arXiv.2303.17580 [paper]
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, arXiv.2303.11381 [paper]
- API-Bank: A Benchmark for Tool-Augmented LLMs, arXiv.2304.08244 [paper]
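Tool-using agents in the ReAct family interleave free-form reasoning with structured actions: the controller parses each action, calls the named tool, and feeds the observation back into the prompt until the model produces a final answer. The sketch below is a minimal loop with an assumed `Action: tool[input]` / `Final Answer:` format and a hypothetical `call_llm`; it is not the prompt or parser used by any of the papers above.

```python
import re
from typing import Callable, Dict

def react_loop(
    question: str,
    call_llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],  # e.g. {"search": ..., "calculator": ...}
    max_steps: int = 5,
) -> str:
    """Reason-act-observe loop: parse the model's action, run the tool, append the observation."""
    transcript = (
        f"Question: {question}\n"
        "At each step, think, then emit either `Action: tool[input]` or `Final Answer: ...`.\n"
    )
    for _ in range(max_steps):
        step = call_llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action and action.group(1) in tools:
            observation = tools[action.group(1)](action.group(2))
            transcript += f"Observation: {observation}\n"
    return transcript  # no final answer within the step budget; return the trace for inspection
```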
Planning
- Reflexion: Language Agents with Verbal Reinforcement Learning, arXiv.2303.11366 [paper]
- Self-Refine: Iterative Refinement with Self-Feedback, arXiv.2303.17651 [paper]
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency, arXiv.2304.11477 [paper]
- Large Language Model Guided Tree-of-Thought, arXiv.2305.08291 [paper]
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arXiv.2305.10601 [paper]
- Reasoning with Language Model is Planning with World Model, arXiv.2305.14992 [paper]
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arXiv.2308.09687 [paper]
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arXiv.2308.08614 [paper]
- Dynamic Planning with a LLM, arXiv.2308.06391 [paper]
Distillation
- STaR: Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [paper]
- Large Language Models Can Self-Improve, arXiv.2210.11610 [paper]
- Teaching Small Language Models to Reason, ACL 2023 [paper]
- Large Language Models Are Reasoning Teachers, ACL 2023 [paper]
- Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step, ACL 2023 [paper](https://doi.org/10.18653/v1/2023.acl-long.150)
- SCOTT: Self-Consistent Chain-of-Thought Distillation, ACL 2023 [paper]
- Specializing Smaller Language Models towards Multi-Step Reasoning, ICML 2023 [paper]
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, arXiv.2305.02301 [paper]
- Contrastive Decoding: Open-ended Text Generation as Optimization, ACL 2023 [paper]
- Contrastive Decoding Improves Reasoning in Large Language Models, arXiv.2309.09117 [paper]
🔭 Future Prospect
Multi-modal XoT
- Multimodal Chain-of-Thought Reasoning in Language Models, arXiv.2302.00923 [paper]
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models, arXiv.2305.16582 [paper]
- T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering, arXiv.2305.03453 [paper]
- Thinking Like an Expert: Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals, arXiv.2308.06207 [paper]
- Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning, arXiv.2308.0965 [paper]
Faithful XoT
- Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]
- Rethinking with Retrieval: Faithful Large Language Model Inference, arXiv.2301.00303 [paper]
- Faithful Chain-of-Thought Reasoning, arXiv.2301.13379 [paper]
- Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arXiv.2306.06427 [paper]
- Question Decomposition Improves the Faithfulness of Model-Generated Reasoning, arXiv.2307.11768 [paper]
- Measuring Faithfulness in Chain-of-Thought Reasoning, arXiv.2307.13702 [paper]
CoT Theory
- Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango, arXiv.2209.07686 [paper]
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters, ACL 2023 [paper]
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners, arXiv.2305.14825 [paper]
- Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs, arXiv.2305.18869 [paper]
- Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective, arXiv.2305.15408 [paper]
- Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions, arXiv.2307.13339 [paper]
🚢 Other works
- The Unreliability of Explanations in Few-Shot In-Context Learning, arXiv.2205.03401 [paper]
- A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams, arXiv.2206.05442 [paper]
- Rationale-Augmented Ensembles in Language Models, arXiv.2207.00747 [paper]
- Can language models learn from explanations in context?, EMNLP 2022 [paper]
- Inferring Implicit Relations in Complex Questions with Language Models, EMNLP 2022 [paper]
- Language Models of Code are Few-Shot Commonsense Learners, EMNLP 2022 [paper]
- Solving Quantitative Reasoning Problems with Language Models, NeurIPS 2022 [paper]
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding, SIGKDD 2022 [paper]
- Large Language Models are few(1)-shot Table Reasoners, EACL 2023 [paper]
- Reasoning Implicit Sentiment with Chain-of-Thought Prompting, ACL 2023 [paper]
- Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method, ACL 2023 [paper]
- Tab-CoT: Zero-shot Tabular Chain of Thought, ACL 2023 [paper]
- Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models, ACL 2023 [paper]
- Language models are multilingual chain-of-thought reasoners, ICLR 2023 [paper]
- Ask Me Anything: A simple strategy for prompting language models, ICLR 2023 [paper]
- Large Language Models Can Be Easily Distracted by Irrelevant Context, ICLR 2023 [paper]