Quickly access the latest research papers on large language models from arXiv.
Papers and research related to Large Language Models
This research introduces a novel approach for embodied agents to interpret ambiguous human instructions by fine-tuning multimodal large language models (MLLMs) as vision-language-action policies using reinforcement learning, significantly improving task execution.
This research investigates the ability of large language models (LLMs) to diagnose students' cognitive skills in math problem-solving, revealing significant challenges and limitations in their performance.
The paper introduces a cost-efficient strategy for enhancing Large Language Model (LLM) performance by utilizing a repeated-sampling-then-voting framework with multiple models, optimizing test-time compute.
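The core of a repeated-sampling-then-voting framework can be sketched in a few lines: draw several answers from each model, pool them, and return the plurality answer. This is a minimal illustration, not the paper's actual method; the function names, the per-model sample allocation, and the toy model stand-ins are all assumptions for the sketch.

```python
from collections import Counter
from typing import Callable, Sequence

def vote_across_models(models: Sequence[Callable[[], str]],
                       samples_per_model: int) -> str:
    """Repeated-sampling-then-voting: draw several answers from each
    model, then return the plurality answer over the pooled samples."""
    votes = Counter()
    for sample in models:
        for _ in range(samples_per_model):
            votes[sample()] += 1
    return votes.most_common(1)[0][0]

# Toy stand-ins for two LLMs repeatedly answering the same question.
model_a = iter(["12", "12", "13"])
model_b = iter(["12", "14", "12"])
best = vote_across_models([lambda: next(model_a), lambda: next(model_b)],
                          samples_per_model=3)
# "12" wins the pooled vote (4 of 6 samples agree).
```

Pooling samples across models is what distinguishes this from single-model self-consistency: an answer only needs plurality support across the combined draws.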
This research explores the integration of personality traits into LLM-based autonomous agents, demonstrating how these traits influence decision-making and task selection processes in cyber defense applications.
The paper presents a novel approach using Large Language Models (LLMs) to enhance the search for deletion-correcting codes, integrating AI techniques with information theory principles.
This paper presents a collaborative framework for numerical reasoning that enhances local model performance while ensuring data protection, addressing the challenges of computation-constrained devices.
This research paper evaluates the numerical reasoning abilities of Large Language Models (LLMs) through a test called "Numberland," revealing significant limitations in their basic mathematical skills despite their advanced capabilities.
This survey explores the role of Large Language Models (LLMs) in enhancing Explainable AI (XAI) by transforming complex outputs into understandable narratives, addressing transparency issues in AI models.
This paper evaluates the efficiency of Self-Consistency (SC) versus Generative Reward Models (GenRM) in enhancing reasoning capabilities of large language models (LLMs) under fixed inference budgets, revealing SC's superior compute efficiency.
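The compute trade-off behind such a comparison can be made concrete with a deliberately simplified cost model (the accounting below is an assumption for illustration, not the paper's actual budget formula): under a fixed budget of model calls, Self-Consistency spends every call on a candidate solution, while a generative verifier must split the budget between generating candidates and verifying them.

```python
def sc_candidates(budget: int) -> int:
    """Self-Consistency: every call in the budget yields one candidate
    solution, and all candidates enter the majority vote."""
    return budget

def genrm_candidates(budget: int, verifications_per_solution: int) -> int:
    """Generative reward model: each candidate costs one generation call
    plus several verifier calls, so fewer candidates fit the budget."""
    return budget // (1 + verifications_per_solution)

# Under a fixed budget of 64 calls with 3 verifier passes per candidate,
# SC votes over 64 candidates while GenRM scores only 16.
n_sc = sc_candidates(64)          # 64
n_genrm = genrm_candidates(64, 3) # 16
```

This gap in candidates-per-budget is one way to see why majority voting can be the more compute-efficient choice at fixed inference cost.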
This paper investigates the structure of token embeddings in large language models (LLMs), revealing that they often violate the manifold hypothesis, which can lead to flawed interpretations of LLM behavior.
The paper introduces Zero-shot Benchmarking (ZSB), a framework for creating scalable and flexible benchmarks for evaluating language models across various tasks and languages, enhancing automatic evaluation methods.
MedReason introduces a large-scale dataset aimed at enhancing medical reasoning in LLMs through structured knowledge graphs, providing detailed, verifiable reasoning paths for medical problem-solving.