Must-Read Papers on Large Language Model Reasoning
Author: AINLP | Source: AINLP
🎁 Resources
Surveys
- A Survey of Deep Learning for Mathematical Reasoning, ACL 2023 [paper]
- Reasoning with Language Model Prompting: A Survey, ACL 2023 [paper]
- A Survey for In-context Learning, arXiv.2301.00234 [paper]
- A Survey of Large Language Models, arXiv.2303.18223 [paper]
- Natural Language Reasoning, A Survey, arXiv.2303.14725 [paper]
Blogs
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, Dec 2022, Yao Fu’s Notion [blog]
- Towards Complex Reasoning: the Polaris of Large Language Models, May 2023, Yao Fu’s Notion [blog]
💯 Benchmarks
Mathematical Reasoning
- Learning to Solve Arithmetic Word Problems with Verb Categorization, EMNLP 2014 [paper]
- Parsing Algebraic Word Problems into Equations, TACL 2015 [paper]
- Solving General Arithmetic Word Problems, EMNLP 2015 [paper]
- MAWPS: A Math Word Problem Repository, NAACL 2016 [paper]
- Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]
- A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers, ACL 2020 [paper]
- Are NLP Models really able to Solve Simple Math Word Problems?, NAACL 2021 [paper]
- Training Verifiers to Solve Math Word Problems, arXiv.2110.14168 [paper]
- PAL: Program-aided Language Models, ICML 2023 [paper]
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms, NAACL 2019 [paper]
- DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, NAACL 2019 [paper]
- TheoremQA: A Theorem-driven Question Answering dataset, arXiv.2305.12524 [paper]
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, ACL 2021 [paper]
- FinQA: A Dataset of Numerical Reasoning over Financial Data, EMNLP 2021 [paper]
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering, EMNLP 2022 [paper]
- Measuring Mathematical Problem Solving With the MATH Dataset, NeurIPS 2021 [paper]
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks, ACL 2022 [paper]
- LILA: A Unified Benchmark for Mathematical Reasoning, EMNLP 2022 [paper]
Commonsense Reasoning
- Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge, arXiv.2102.03315 [paper]
- Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, EMNLP 2018 [paper]
- PIQA: Reasoning about Physical Commonsense in Natural Language, AAAI 2020 [paper]
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, NAACL 2019 [paper]
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification, NeurIPS 2021 [paper]
- Event2Mind: Commonsense Inference on Events, Intents, and Reactions, ACL 2018 [paper]
- "Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding, EMNLP 2019 [paper]
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning, EMNLP 2019 [paper]
- Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation, ACL 2019 [paper]
- Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies, TACL 2021 [paper]
Symbolic Reasoning
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper]
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, arXiv.2206.04615 [paper]
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, ACL 2023 [paper]
Logical Reasoning
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning, ICLR 2020 [paper]
- LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning, IJCAI 2020 [paper]
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language, ACL 2021 [paper]
- FOLIO: Natural Language Reasoning with First-Order Logic, arXiv.2209.00840 [paper]
- Language Models as Inductive Reasoners, arXiv.2212.10923 [paper]
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought, ICLR 2023 [paper]
Multi-modal Reasoning
Visual-Language (Image)
- From Recognition to Cognition: Visual Commonsense Reasoning, CVPR 2019 [paper]
- VisualCOMET: Reasoning About the Dynamic Context of a Still Image, ECCV 2020 [paper]
- Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues, ACL 2022 [paper]
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering, NeurIPS 2022 [paper]
Video-Language
- What is More Likely to Happen Next? Video-and-Language Future Event Prediction, EMNLP 2020 [paper]
- CLEVRER: Collision Events for Video Representation and Reasoning, ICLR 2020 [paper]
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions, CVPR 2021 [paper]
- STaR: Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [paper]
- From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering, CVPR 2022 [paper]
- NewsKVQA: Knowledge-Aware News Video Question Answering, PAKDD 2022 [paper]
🚀 Advances
XoT Construction
Manual Construction
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper] (see the prompting sketch after this list)
- PAL: Program-aided Language Models, ICML 2023 [paper]
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper]
- MathPrompter: Mathematical Reasoning using Large Language Models, ACL 2023 [paper]
- Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]
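The common thread of the manual-construction papers above is that chain-of-thought demonstrations are written by hand and simply prepended to the prompt. The sketch below is a minimal illustration of that idea, not any paper's released code: `call_llm` is a hypothetical completion function, and the exemplar is the familiar tennis-ball example from the chain-of-thought paper.

```python
from typing import Callable, List, Tuple

# Hand-written demonstrations: (question, chain-of-thought rationale ending in the answer).
COT_EXEMPLARS: List[Tuple[str, str]] = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]

def few_shot_cot_prompt(question: str) -> str:
    """Prepend the manually written (question, rationale) demonstrations to the test question."""
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in COT_EXEMPLARS)
    return f"{demos}\n\nQ: {question}\nA:"

def solve(question: str, call_llm: Callable[[str], str]) -> str:
    # call_llm is a hypothetical stand-in for any text-completion API (prompt -> completion).
    return call_llm(few_shot_cot_prompt(question))
```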
Automatic Construction
- Large Language Models are Zero-Shot Reasoners, NeurIPS 2022 [paper]
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper]
- Automatic Chain of Thought Prompting in Large Language Models, ICLR 2023 [paper]
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling, arXiv.2305.09993 [paper]
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, ACL 2023 [paper]
Semi-Automatic Construction
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, ICLR 2023 [paper]
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models, arXiv.2302.00618 [paper]
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data, arXiv.2302.12822 [paper]
- Explanation Selection Using Unlabeled Data for In-Context Learning, arXiv.2302.04813 [paper]
- Boosted Prompt Ensembles for Large Language Models, arXiv.2304.05970 [paper]
XoT Structural Variants
Chain Structure
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper] (see the execution sketch after this list)
- PAL: Program-aided Language Models, ICML 2023 [paper]
- Chain-of-Symbol Prompting Elicits Planning in Large Language Models, arXiv.2305.10276 [paper]
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, arXiv.2308.10379 [paper]
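Program-of-Thoughts and PAL (listed above) keep the chain structure but make the intermediate steps executable: the model writes a short program, and an interpreter, not the model, produces the final answer. Below is a minimal sketch under the assumption that the model is asked to leave its result in a variable named `answer`; `call_llm` is again a hypothetical completion function, and a real deployment would execute the generated code in a sandbox.

```python
from typing import Any, Callable

def solve_with_program(question: str, call_llm: Callable[[str], str]) -> Any:
    """Program-aided reasoning sketch: the LLM writes Python, the interpreter computes the answer."""
    prompt = (
        "Write Python code that computes the answer to the question below and "
        "stores it in a variable named `answer`.\n"
        f"Question: {question}\nCode:\n"
    )
    code = call_llm(prompt)
    namespace: dict = {}
    # Executing model-generated code is unsafe; a real system would sandbox this call.
    exec(code, namespace)
    return namespace.get("answer")
```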
Tree Structure
- Large Language Model Guided Tree-of-Thought, arXiv.2305.08291 [paper]
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arXiv.2305.10601 [paper] (see the search sketch after this list)
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arXiv.2307.15337 [paper]
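Tree-structured variants such as Tree of Thoughts expand several candidate "thoughts" at each step, score the resulting partial solutions, and keep only the most promising ones. The sketch below is a simplified breadth-first version of that search; `propose` and `score` are hypothetical LLM-backed helpers (generate candidate next thoughts, rate a partial solution), not APIs from the papers above.

```python
from typing import Callable, List

def tree_of_thoughts_bfs(
    problem: str,
    propose: Callable[[str, str], List[str]],  # hypothetical: (problem, partial solution) -> candidate next thoughts
    score: Callable[[str, str], float],        # hypothetical: (problem, partial solution) -> value estimate
    depth: int = 3,
    beam: int = 2,
) -> str:
    """Breadth-first search over partial 'thoughts', keeping the `beam` best states per level."""
    frontier = [""]  # each state is the text of the partial solution so far
    for _ in range(depth):
        candidates = [
            state + thought + "\n"
            for state in frontier
            for thought in propose(problem, state)
        ]
        if not candidates:
            break
        frontier = sorted(candidates, key=lambda s: score(problem, s), reverse=True)[:beam]
    return frontier[0]
```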
Graph Structure
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arXiv.2308.09687 [paper]
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arXiv.2308.08614 [paper]
XoT Enhancement Methods
Verify and Refine
- Making Language Models Better Reasoners with Step-Aware Verifier, ACL 2023 [paper]
- Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]
- Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]
- Large Language Models are Reasoners with Self-Verification, arXiv.2212.09561 [paper]
- Reflexion: Language Agents with Verbal Reinforcement Learning, arXiv.2303.11366 [paper]
- Self-Refine: Iterative Refinement with Self-Feedback, arXiv.2303.17651 [paper] (see the loop sketch after this list)
- REFINER: Reasoning Feedback on Intermediate Representations, arXiv.2304.01940 [paper]
- RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought, arXiv.2305.11499 [paper]
- Deductive Verification of Chain-of-Thought Reasoning, arXiv.2306.03872 [paper]
- Forward-Backward Reasoning in Large Language Models for Verification, arXiv.2308.07758 [paper]
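Most of the verify-and-refine methods above share the same control flow: draft an answer, obtain feedback (from the model itself, a verifier, or external knowledge), and revise until the feedback is clean. Here is a minimal sketch of that loop in the spirit of Self-Refine; `call_llm`, the prompt wording, and the "LOOKS GOOD" stop phrase are illustrative assumptions rather than any paper's exact prompts.

```python
from typing import Callable

def self_refine(task: str, call_llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Draft -> feedback -> revise loop, stopping early when the critique finds nothing to fix."""
    answer = call_llm(f"Task: {task}\nAnswer:")
    for _ in range(max_rounds):
        feedback = call_llm(
            f"Task: {task}\nCandidate answer: {answer}\n"
            "List any mistakes, or reply exactly 'LOOKS GOOD' if there are none."
        )
        if "LOOKS GOOD" in feedback:
            break
        answer = call_llm(
            f"Task: {task}\nPrevious answer: {answer}\nFeedback: {feedback}\n"
            "Rewrite the answer so that it addresses the feedback.\nImproved answer:"
        )
    return answer
```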
Question Decomposition
- Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]
- Iteratively Prompt Pre-trained Language Models for Chain of Thought, EMNLP 2022 [paper]
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, ICLR 2023 [paper] (see the decomposition sketch after this list)
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks, ICLR 2023 [paper]
- Binding Language Models in Symbolic Languages, ICLR 2023 [paper]
- Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning, SIGIR 2023 [paper]
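Decomposition methods such as Least-to-Most Prompting first ask the model to split a hard question into easier sub-questions, then answer them in order, carrying each intermediate answer into the next step. A hedged sketch follows; the prompt templates and the numbered-list parsing are assumptions made purely for illustration.

```python
from typing import Callable, List

def least_to_most(question: str, call_llm: Callable[[str], str]) -> str:
    """Decompose a question, then answer the sub-questions sequentially with accumulated context."""
    decomposition = call_llm(
        "Break the question below into a numbered list of simpler sub-questions, "
        f"ending with the original question:\n{question}"
    )
    sub_questions: List[str] = [
        line.split(".", 1)[1].strip()
        for line in decomposition.splitlines()
        if line.strip() and line.strip()[0].isdigit() and "." in line
    ]
    context, answer = "", ""
    for sub_q in sub_questions:
        answer = call_llm(f"{context}Q: {sub_q}\nA:")
        context += f"Q: {sub_q}\nA: {answer}\n"
    return answer  # answer to the last sub-question, i.e. the original question
```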
External Knowledge
- Chain-of-Dictionary Prompting Elicits Translation in Large Language Models, arXiv.2305.06575 [paper]
- MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts, arXiv.2305.05181 [paper]
- Chain of Knowledge: A Framework for Grounding Large Language Models with Structured Knowledge Bases, arXiv.2305.13269 [paper]
- Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arXiv.2306.06427 [paper]
- Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering, arXiv.2308.13259 [paper]
Vote and Rank
- Training Verifiers to Solve Math Word Problems, arXiv.2110.14168 [paper]
- Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023 [paper] (see the voting sketch after this list)
- Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought, arXiv.2304.13007 [paper]
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning, arXiv.2308.00436 [paper]
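Self-Consistency (listed above) is the simplest vote-and-rank scheme: sample several reasoning chains at non-zero temperature, extract the final answer from each, and return the most frequent one. A minimal sketch in which `sample_llm` (one sampled completion per call) and `extract_answer` (pulls the final answer out of a chain) are hypothetical helpers:

```python
from collections import Counter
from typing import Callable

def self_consistency(
    prompt: str,
    sample_llm: Callable[[str], str],      # hypothetical: one temperature > 0 completion per call
    extract_answer: Callable[[str], str],  # hypothetical: parse the final answer from a reasoning chain
    n_samples: int = 10,
) -> str:
    """Majority vote over the final answers of independently sampled reasoning chains."""
    answers = [extract_answer(sample_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```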
Efficiency
- Active Prompting with Chain-of-Thought for Large Language Models, arXiv.2302.12246 [paper]
- Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs, arXiv.2305.11860 [paper]
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arXiv.2307.15337 [paper]
🛸 Frontier Applications
Tool Using
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning, arXiv.2205.00445 [paper]
- TALM: Tool Augmented Language Models, arXiv.2205.12255 [paper]
- ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023 [paper] (see the agent-loop sketch after this list)
- Toolformer: Language Models Can Teach Themselves to Use Tools, arXiv.2302.04761 [paper]
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, arXiv.2303.17580 [paper]
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, arXiv.2303.11381 [paper]
- API-Bank: A Benchmark for Tool-Augmented LLMs, arXiv.2304.08244 [paper]
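Tool-using agents in the ReAct family interleave free-form reasoning with structured actions: the controller parses each action, calls the named tool, and feeds the observation back into the prompt until the model produces a final answer. The sketch below is a minimal loop with an assumed `Action: tool[input]` / `Final Answer:` format and a hypothetical `call_llm`; it is not the prompt or parser used by any of the papers above.

```python
import re
from typing import Callable, Dict

def react_loop(
    question: str,
    call_llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],  # e.g. {"search": ..., "calculator": ...}
    max_steps: int = 5,
) -> str:
    """Reason-act-observe loop: parse the model's action, run the tool, append the observation."""
    transcript = (
        f"Question: {question}\n"
        "At each step, think, then emit either `Action: tool[input]` or `Final Answer: ...`.\n"
    )
    for _ in range(max_steps):
        step = call_llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action and action.group(1) in tools:
            observation = tools[action.group(1)](action.group(2))
            transcript += f"Observation: {observation}\n"
    return transcript  # no final answer within the step budget; return the trace for inspection
```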
Planning
- Reflexion: Language Agents with Verbal Reinforcement Learning, arXiv.2303.11366 [paper]
- Self-Refine: Iterative Refinement with Self-Feedback, arXiv.2303.17651 [paper]
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency, arXiv.2304.11477 [paper]
- Large Language Model Guided Tree-of-Thought, arXiv.2305.08291 [paper]
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arXiv.2305.10601 [paper]
- Reasoning with Language Model is Planning with World Model, arXiv.2305.14992 [paper]
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arXiv.2308.09687 [paper]
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arXiv.2308.08614 [paper]
- Dynamic Planning with a LLM, arXiv.2308.06391 [paper]
Distillation
- STaR: Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [paper]
- Large Language Models Can Self-Improve, arXiv.2210.11610 [paper]
- Teaching Small Language Models to Reason, ACL 2023 [paper]
- Large Language Models Are Reasoning Teachers, ACL 2023 [paper]
- Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step, ACL 2023 [paper](https://doi.org/10.18653/v1/2023.acl-long.150)
- SCOTT: Self-Consistent Chain-of-Thought Distillation, ACL 2023 [paper]
- Specializing Smaller Language Models towards Multi-Step Reasoning, ICML 2023 [paper]
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, arXiv.2305.02301 [paper]
- Contrastive Decoding: Open-ended Text Generation as Optimization, ACL 2023 [paper]
- Contrastive Decoding Improves Reasoning in Large Language Models, arXiv.2309.09117 [paper]
🔭 Future Prospect
Multi-modal XoT
- Multimodal Chain-of-Thought Reasoning in Language Models, arXiv.2302.00923 [paper]
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models, arXiv.2305.16582 [paper]
- T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering, arXiv.2305.03453 [paper]
- Thinking Like an Expert: Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals, arXiv.2308.06207 [paper]
- Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning, arXiv.2308.0965 [paper]
Faithful XoT
- Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]
- Rethinking with Retrieval: Faithful Large Language Model Inference, arXiv.2301.00303 [paper]
- Faithful Chain-of-Thought Reasoning, arXiv.2301.13379 [paper]
- Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arXiv.2306.06427 [paper]
- Question Decomposition Improves the Faithfulness of Model-Generated Reasoning, arXiv.2307.11768 [paper]
- Measuring Faithfulness in Chain-of-Thought Reasoning, arXiv.2307.13702 [paper]
CoT Theory
- Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango, arXiv.2209.07686 [paper]
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters, ACL 2023 [paper]
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners, arXiv.2305.14825 [paper]
- Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs, arXiv.2305.18869 [paper]
- Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective, arXiv.2305.15408 [paper]
- Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions, arXiv.2307.13339 [paper]
🚢 Other works
- The Unreliability of Explanations in Few-Shot In-Context Learning, arXiv.2205.03401 [paper]
- A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams, arXiv.2206.05442 [paper]
- Rationale-Augmented Ensembles in Language Models, arXiv.2207.00747 [paper]
- Can language models learn from explanations in context?, EMNLP 2022 [paper]
- Inferring Implicit Relations in Complex Questions with Language Models, EMNLP 2022 [paper]
- Language Models of Code are Few-Shot Commonsense Learners, EMNLP 2022 [paper]
- Solving Quantitative Reasoning Problems with Language Models, NeurIPS 2022 [paper]
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding, SIGKDD 2022 [paper]
- Large Language Models are few(1)-shot Table Reasoners, EACL 2023 [paper]
- Reasoning Implicit Sentiment with Chain-of-Thought Prompting, ACL 2023 [paper]
- Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method, ACL 2023 [paper]
- Tab-CoT: Zero-shot Tabular Chain of Thought, ACL 2023 [paper]
- Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models, ACL 2023 [paper]
- Language models are multilingual chain-of-thought reasoners, ICLR 2023 [paper]
- Ask Me Anything: A simple strategy for prompting language models, ICLR 2023 [paper]
- Large Language Models Can Be Easily Distracted by Irrelevant Context, ICLR 2023 [paper]