arXiv - Artificial Intelligence · 6h ago
This research investigates the application of Large Language Models (LLMs) in clinical triage, focusing on their robustness and intersectional biases related to sex and race.
- The study evaluates LLMs' performance in emergency department triage, highlighting their robustness to distribution shifts and missing data.
- It identifies significant intersectional biases: the models' triage decisions shift with patient sex and race, with the strongest effects concentrated in particular sex-race combinations.
- The research suggests that while LLMs show promise in clinical decision support, they also encode demographic biases that could impact their effectiveness in diverse clinical contexts; a minimal version of such an audit is sketched below.
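The audit recipe such a study implies is easy to sketch: hold the clinical vignette fixed, vary only the demographic tokens, and compare the acuity levels the model assigns. The snippet below is a minimal counterfactual probe in that spirit, not the paper's protocol; `query_llm` is a hypothetical stand-in for the model under test, and the vignette and ESI framing are illustrative assumptions.

```python
# Minimal sketch of an intersectional triage-bias probe; query_llm() is a
# hypothetical stand-in for whichever model is being audited.
from itertools import product

SEXES = ["female", "male"]
RACES = ["Black", "White", "Asian", "Hispanic"]

# Illustrative vignette: the clinical facts stay fixed; only demographics vary.
VIGNETTE = (
    "A {race} {sex} patient, 54 years old, presents to the ED with chest "
    "pain radiating to the left arm. Assign an ESI triage level from 1 "
    "(most urgent) to 5 (least urgent). Answer with the number only."
)

def query_llm(prompt: str) -> str:
    """Placeholder for the model call; replace with a real client."""
    return "2"

def probe_intersections() -> dict:
    # Any spread in assigned acuity across combinations is a bias signal,
    # since the underlying presentation is identical.
    return {
        (sex, race): int(query_llm(VIGNETTE.format(sex=sex, race=race)).strip())
        for sex, race in product(SEXES, RACES)
    }

if __name__ == "__main__":
    for (sex, race), level in probe_intersections().items():
        print(f"{sex:>6} / {race:<8} -> ESI {level}")
```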
arXiv - Artificial Intelligence · 6h ago
OptimAI is a framework that utilizes LLM-powered AI agents to convert natural language optimization problems into mathematical formulations, significantly enhancing problem-solving efficiency and accuracy in scientific research.
- The framework comprises four roles: a formulator that translates the natural-language problem into a mathematical formulation, a planner that constructs a solution strategy, and a coder and a code critic that iterate to write and refine solver code.
- Removing key roles like the planner or code critic leads to significant drops in productivity, highlighting their importance in the optimization process.
- OptimAI achieves high accuracy on benchmark datasets, outperforming previous methods with a 58% reduction in error rate; the four-role loop is sketched below.
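A rough sketch of that four-role loop, assuming a generic `chat()` wrapper over any LLM API; the role prompts and the `LGTM` stopping condition are illustrative stand-ins, not OptimAI's actual instructions.

```python
# Sketch of a formulator -> planner -> coder <-> critic pipeline; chat() is
# a placeholder, and all role prompts here are illustrative assumptions.

def chat(role_prompt: str, content: str) -> str:
    """Placeholder single-turn LLM call; replace with a real client."""
    return f"[{role_prompt.split()[1]}] draft for: {content[:40]}..."

def solve(problem_nl: str, max_rounds: int = 3) -> str:
    # 1. Formulator: natural-language problem -> mathematical formulation.
    formulation = chat("The formulator translates problems into math.", problem_nl)
    # 2. Planner: choose a solution strategy before any code is written.
    plan = chat("The planner proposes a solution strategy.", formulation)
    # 3/4. Coder and code critic alternate until the critic approves.
    code = chat("The coder writes solver code for the plan.", plan)
    for _ in range(max_rounds):
        critique = chat("The critic reviews solver code for errors.", code)
        if "LGTM" in critique:  # illustrative approval signal
            break
        code = chat("The coder revises code per the critique.", code + critique)
    return code

print(solve("Maximize profit given limited machine hours and materials."))
```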
arXiv - Artificial Intelligence · 6h ago
This paper introduces COT Fine-tuned, a framework for detecting AI-generated text and identifying the specific language model responsible, utilizing Chain-of-Thought reasoning for enhanced interpretability.
- COT Fine-tuned employs a dual-task approach: classifying text as AI-generated or human-written and identifying the specific LLM behind the text.
- The use of Chain-of-Thought reasoning allows the model to provide explanations for its predictions, improving transparency and interpretability in AI text detection.
- Experiments show that COT Fine-tuned achieves high accuracy in both classification and LLM identification, demonstrating the effectiveness of the CoT reasoning process; an assumed version of the output format is sketched below.
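The dual-task setup is easy to picture as a fine-tuning target: a reasoning chain, then the human/AI label, then, for AI text, the suspected source model. The field names and wording below are assumptions for illustration, not the paper's actual schema.

```python
# Sketch of a dual-task (detection + attribution) training example with a
# chain-of-thought prefix; the format is an illustrative assumption.
from typing import Optional

def make_training_example(text: str, is_ai: bool,
                          source_model: Optional[str],
                          reasoning: str) -> dict:
    """Build one (prompt, completion) pair in the assumed format."""
    completion = (
        f"Reasoning: {reasoning}\n"
        f"Label: {'AI-generated' if is_ai else 'human-written'}\n"
    )
    if is_ai and source_model:
        completion += f"Source model: {source_model}\n"
    return {"prompt": f"Classify the following text:\n{text}\n",
            "completion": completion}

example = make_training_example(
    text="The mitochondria is the powerhouse of the cell...",
    is_ai=True,
    source_model="GPT-4o",
    reasoning="Uniform sentence length and low lexical burstiness suggest machine text.",
)
print(example["completion"])
```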
arXiv - Artificial Intelligence · 6h ago
This study investigates the effectiveness of generative large language models (LLMs) in requirements classification, revealing insights on prompt design and model architecture's impact on performance across various datasets.
- The research evaluates three generative LLMs—Bloom, Gemma, and Llama—across binary and multi-class requirements classification tasks, conducting over 400 experiments.
- Findings indicate that while prompt design and LLM architecture are crucial, performance also varies substantially from dataset to dataset as task complexity changes.
- The study emphasizes the need to optimize prompt structures and align model architectures with specific classification tasks to enhance effectiveness in requirements engineering; illustrative prompt framings follow below.
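To make the prompt-design variable concrete, here are two hedged template sketches using the common functional vs. non-functional framing; the wording and label sets are illustrative assumptions, not the study's actual prompts.

```python
# Two illustrative prompt framings for requirements classification; the
# templates and label sets are assumptions, not the study's actual prompts.

BINARY_TEMPLATE = """You are a requirements engineer.
Classify the following requirement as Functional or Non-functional.
Requirement: {req}
Answer with exactly one word."""

MULTICLASS_TEMPLATE = """You are a requirements engineer.
Classify the following requirement into one of:
Performance, Security, Usability, Other.
Requirement: {req}
Answer with exactly one label."""

req = "The system shall encrypt all stored passwords with bcrypt."
print(BINARY_TEMPLATE.format(req=req))
print()
print(MULTICLASS_TEMPLATE.format(req=req))
```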
arXiv - Artificial Intelligence · 6h ago
The paper presents HEMA, a dual-memory architecture designed to enhance coherence in long-context AI conversations, significantly improving factual recall and dialogue continuity beyond 300 turns.
- HEMA integrates Compact Memory for narrative coherence and Vector Memory for episodic storage, achieving a factual recall accuracy increase from 41% to 87%.
- The architecture maintains dialogue coherence with a 6B-parameter transformer, allowing conversations to exceed 1,000 turns while keeping prompt lengths manageable.
- Key findings include a 34% reduction in retrieval latency from age-weighted pruning, plus a two-level summary hierarchy that suppresses errors in ultra-long dialogues; a toy version of the two stores is sketched below.
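A toy rendering of the two stores, under assumptions the summary leaves open: Compact Memory as a rolling summary string, Vector Memory as per-turn episodes, and an exponential recency decay standing in for age-weighted pruning. The bag-of-words "embedding" is a placeholder for a real encoder.

```python
# Toy sketch of a HEMA-like dual memory; the data layout, decay rule, and
# bag-of-words embedding are illustrative assumptions, not the paper's design.
import math

def embed(text: str) -> dict:
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def cosine(a: dict, b: dict) -> float:
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class DualMemory:
    def __init__(self, capacity: int = 1000, half_life: float = 200.0):
        self.compact = ""      # Compact Memory: rolling narrative summary
        self.episodes = []     # Vector Memory: (turn, embedding, text)
        self.capacity = capacity
        self.half_life = half_life

    def add_turn(self, turn: int, text: str, summary_line: str):
        self.compact += summary_line + " "
        self.episodes.append((turn, embed(text), text))
        if len(self.episodes) > self.capacity:
            self._prune(now=turn)

    def _prune(self, now: int):
        # Age-weighted pruning: keep episodes with the highest
        # recency-decayed weight, so old low-value turns fall out first.
        def weight(ep):
            return 0.5 ** ((now - ep[0]) / self.half_life)
        self.episodes.sort(key=weight, reverse=True)
        del self.episodes[self.capacity:]

    def recall(self, query: str, k: int = 3) -> list:
        q = embed(query)
        ranked = sorted(self.episodes, key=lambda ep: cosine(q, ep[1]),
                        reverse=True)
        return [text for _, _, text in ranked[:k]]

mem = DualMemory(capacity=500)
mem.add_turn(1, "User books a flight to Oslo for May 3rd.",
             "User booked Oslo flight, May 3.")
print(mem.recall("When is the flight to Oslo?"))
```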
arXiv - Artificial Intelligence · 6h ago
The paper introduces V2R-Bench, a benchmark framework assessing the robustness of Large Vision Language Models (LVLMs) to visual variations, revealing significant vulnerabilities in their performance on simple tasks despite their advanced capabilities.
- V2R-Bench evaluates LVLMs against visual variations in position, scale, orientation, and context, highlighting their unexpected weaknesses in basic object recognition tasks.
- The study uncovers a visual position bias in LVLMs that contradicts existing theories, indicating a need for improved multimodal alignment and architectural innovations.
- Systematic analysis reveals that vulnerabilities arise from error accumulation in the architecture, emphasizing the importance of addressing these deficiencies in future LVLM designs; a simple variant generator in the benchmark's spirit is sketched below.
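The variation axes suggest a simple generator: render the same object at different positions, scales, and orientations on a fixed canvas and query the LVLM on each variant. The transforms and parameters below are illustrative guesses, not the benchmark's actual configuration; requires Pillow.

```python
# Sketch of a position/scale/orientation variant generator; all offsets,
# factors, and angles are illustrative assumptions.
from PIL import Image

def make_variants(obj: Image.Image, canvas=(512, 512)) -> dict:
    """Return {variant_name: image} for a single object crop."""
    variants = {}

    def on_canvas(img):
        bg = Image.new("RGB", canvas, "white")
        bg.paste(img, ((canvas[0] - img.width) // 2,
                       (canvas[1] - img.height) // 2))
        return bg

    # Position: same object pasted at center vs. offset toward a corner.
    for name, (dx, dy) in {"center": (0, 0), "corner": (150, 150)}.items():
        bg = Image.new("RGB", canvas, "white")
        bg.paste(obj, ((canvas[0] - obj.width) // 2 + dx,
                       (canvas[1] - obj.height) // 2 + dy))
        variants[f"pos_{name}"] = bg

    # Scale: shrink and enlarge around the canvas center.
    for s in (0.5, 2.0):
        scaled = obj.resize((max(1, int(obj.width * s)),
                             max(1, int(obj.height * s))))
        variants[f"scale_{s}"] = on_canvas(scaled)

    # Orientation: rotate with the bounding box expanded to fit.
    for deg in (90, 180):
        rot = obj.rotate(deg, expand=True, fillcolor="white")
        variants[f"rot_{deg}"] = on_canvas(rot)

    return variants
```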
arXiv - Artificial Intelligence · 6h ago
This study investigates the dynamics of cross-lingual transfer (CLT) in large language models fine-tuned on multilingual data, revealing complex interactions that influence multilingual performance across various tasks.
- The research examines two model families with up to 35B parameters, focusing on their performance in summarization, instruction following, and mathematical reasoning tasks.
- Findings indicate that CLT dynamics are influenced by the combination of post-training settings rather than isolated variables, highlighting the complexity of multilingual training.
- The study identifies specific conditions that enhance effective cross-lingual transfer, providing insights for optimizing multilingual model training in practical applications.
arXiv - Artificial Intelligence · 6h ago
This research paper investigates the effectiveness of Large Language Models (LLMs) in generating counterspeech to combat conspiracy theories, revealing significant limitations in their output quality and reliability.
- The study highlights the lack of datasets pairing conspiracy theory comments with expert counterspeech, which hampers effective LLM training for this purpose.
- Evaluating models like GPT-4o, Llama 3, and Mistral, the research finds that generated responses are often generic, repetitive, and superficial, failing to engage effectively with conspiracy theories.
- The models tend to over-acknowledge fear and frequently produce hallucinated facts, raising concerns about their practical application in counterspeech strategies against harmful online content.
arXiv - Artificial Intelligence · 6h ago
This pilot study compares the translation effectiveness of large language models (LLMs) and traditional machine translation (MT) tools for medical consultation summaries in multiple languages, revealing strengths and weaknesses in both approaches.
- Traditional MT tools outperformed LLMs in translating complex medical texts, while LLMs showed better results for simpler summaries, particularly in Vietnamese and Chinese.
- Arabic translations improved with text complexity due to the language's unique morphology, indicating that LLMs may require more domain-specific training.
- The study emphasizes the need for improved evaluation metrics that capture clinical relevance and suggests incorporating human oversight in medical translations to enhance accuracy.
arXiv - Artificial Intelligence · 6h ago
This case study investigates the fine-tuning of Small Language Models (SLMs) for detecting Common Weakness Enumerations (CWEs) in Python code, demonstrating their effectiveness as a privacy-preserving alternative to Large Language Models (LLMs).
- The study focuses on a 350-million parameter pre-trained code model, codegen-mono, which was fine-tuned to detect the MITRE Top 25 CWEs in Python, achieving approximately 99% accuracy.
- A semi-supervised approach was used to create a targeted dataset of 500 examples, combining LLM-driven synthetic data generation with human review to enhance training quality.
- The fine-tuned SLM demonstrated exceptional performance metrics, including 98.08% precision, 100% recall, and a 99.04% F1-score, indicating its potential as a practical tool for secure code analysis without compromising privacy; the confusion-matrix arithmetic behind such figures is sketched below.
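For context on those numbers, precision, recall, and F1 follow from simple confusion-matrix arithmetic on a binary vulnerable/safe decision. The sketch below shows one hypothetical arrangement of counts (51 true positives, 1 false positive, 0 false negatives) that reproduces the reported precision and recall and lands within rounding of the reported F1; the counts are assumptions, not the paper's actual confusion matrix.

```python
# Confusion-matrix arithmetic for a binary vulnerable/safe decision; the
# counts below are hypothetical and chosen only to illustrate the formulas.

def prf1(y_true, y_pred, positive=1):
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 51 vulnerable snippets all caught (recall 100%) with one false alarm:
y_true = [1] * 51 + [0] * 49
y_pred = [1] * 52 + [0] * 48
p, r, f = prf1(y_true, y_pred)
print(f"precision={p:.2%} recall={r:.2%} f1={f:.2%}")  # 98.08% 100.00% 99.03%
```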
arXiv - Artificial Intelligence · 6h ago
This research investigates how large language model (LLM)-powered conversational UIs in shared autonomous vehicles (SAVs) influence user perceptions and experiences through psychological ownership and anthropomorphism.
- The study designed four SAV UIs with varying anthropomorphic features to assess their impact on users' psychological ownership and acceptance of the technology.
- Results showed that more anthropomorphic UIs enhanced users' perceptions of human-like qualities and improved sentiment towards SAV responses compared to less anthropomorphic designs.
- The findings offer practical insights for developing LLM-based conversational UIs that can effectively boost user experience and encourage the adoption of SAVs.
arXiv - Artificial Intelligence · 6h ago
This paper investigates the vulnerabilities of Multi-Agent Debate (MAD) frameworks using Large Language Models (LLMs), revealing significant susceptibility to jailbreak attacks that can elicit harmful content.
- The study focuses on four prominent MAD frameworks built on leading LLMs, highlighting their increased vulnerability compared to single-agent systems.
- A novel structured prompt-rewriting framework is introduced, exploiting MAD dynamics to amplify harmful content generation, achieving success rates of up to 80% in certain scenarios.
- The findings emphasize the urgent need for robust defenses against these vulnerabilities before deploying MAD systems in real-world applications.
arXiv - Machine Learning · 6h ago
This research paper investigates how Large Language Models (LLMs) capture and represent domain-specific knowledge, revealing their ability to distinguish between queries from different domains and the robustness of these representations.
- The study examines LLMs' sensitivity to domain-specific nuances by analyzing hidden states during the prefill phase, uncovering latent domain-related trajectories.
- Findings indicate that LLMs can effectively differentiate queries across related domains, challenging the assumption that fine-tuned models are always the most accurate.
- The research also explores the impact of prompt styles and sources on domain representation, providing insights for model selection based on input query traces; a minimal hidden-state probe is sketched below.
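A hedged sketch of the probing setup this describes: run the prefill pass with hidden states exposed, pool one layer into a single vector per query, and compare queries from different domains. The model (gpt2 as a small stand-in), the mean pooling, and the layer choice are all illustrative assumptions; requires torch and transformers.

```python
# Sketch of a prefill hidden-state probe for domain separation; model,
# pooling, and layer index are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def prefill_state(query: str, layer: int = -1) -> torch.Tensor:
    inputs = tok(query, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Mean-pool the chosen layer's token states into one query vector.
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

# Queries from different domains should land in separable regions.
legal = prefill_state("What is the statute of limitations for fraud?")
medical = prefill_state("What is the first-line treatment for hypertension?")
print(torch.cosine_similarity(legal, medical, dim=0).item())
```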
arXiv - Machine Learning · 6h ago
The paper 'ParetoHqD' presents a novel approach for aligning large language models with diverse human preferences using offline multiobjective alignment algorithms, enhancing performance through high-quality data representation.
- ParetoHqD improves alignment by representing human preferences as directions in the objective space, addressing limitations of previous algorithms.
- The method employs a two-stage supervised fine-tuning process, utilizing Pareto high-quality training sets tailored to specific preference directions.
- Experimental results show ParetoHqD outperforms five baseline methods in two multiobjective alignment tasks, demonstrating its effectiveness in aligning LLMs with user expectations; a toy version of the Pareto selection step is sketched below.
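A toy sketch of the selection idea under stated assumptions: score each candidate example on two reward objectives, keep the Pareto-optimal ones, then rank them by alignment with a preference direction. The scores, the two-objective setting, and the cosine-alignment rule are illustrative, not the paper's exact procedure.

```python
# Toy Pareto selection plus preference-direction ranking; scores and the
# alignment rule are illustrative assumptions.
import math

def pareto_front(points):
    """Keep points not dominated on both objectives by any other point."""
    return [
        p for p in points
        if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
    ]

def rank_by_direction(front, direction):
    """Order front points by cosine alignment with the preference direction."""
    def align(p):
        dot = p[0] * direction[0] + p[1] * direction[1]
        norm = math.hypot(*p) * math.hypot(*direction)
        return dot / norm if norm else 0.0
    return sorted(front, key=align, reverse=True)

# (helpfulness, harmlessness) reward scores for candidate training examples:
scores = [(0.9, 0.2), (0.6, 0.6), (0.3, 0.9), (0.4, 0.4), (0.8, 0.1)]
front = pareto_front(scores)  # -> [(0.9, 0.2), (0.6, 0.6), (0.3, 0.9)]
# Preference direction weighting helpfulness 70/30 over harmlessness:
for p in rank_by_direction(front, (0.7, 0.3)):
    print(p)
```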